Claw Code 쿼리 엔진: AI 에이전트의 대화 오케스트레이션 방식

QueryEngineConfig

Claw Code 쿼리 엔진은 a QueryEngineConfig 데이터클래스를 통해 설정되며, 모든 대화 세션의 핵심 동작 제한을 제어합니다. 기본 설정 값은 실용적인 코딩 워크플로에 맞게 조정되어 있습니다:

class QueryEngineConfig:
    max_turns: int = 8
    max_budget_tokens: int = 2000
    compact_after_turns: int = 12
    structured_output: bool = False
    structured_retry_limit: int = 2
      

max_turns caps the number of agent turns per submit_message call at 8, preventing runaway tool-use loops. max_budget_tokens sets a hard ceiling of 2,000 tokens on total output generation per submission. compact_after_turns triggers automatic message compaction after 12 accumulated turns, keeping the context window manageable. structured_output controls whether the engine expects a structured JSON response, and structured_retry_limit allows up to 2 retries when structured output parsing fails.

TurnResult

Claw Code 쿼리 엔진의 각 턴은 TurnResult를 생성하여 해당 턴에서 발생한 모든 것을 캡처합니다. 구조에는 7개의 필드가 포함됩니다:

prompt — 턴을 시작한 입력 메시지
output — 에이전트가 생성한 응답 텍스트
matched_commands — 감지되어 실행된 슬래시 명령어 (like /compact or /clear) that were detected and executed
matched_tools — 이 턴에서 에이전트가 호출한 도구
permission_denials — 권한 시스템에 의해 차단된 도구 호출
usage — a UsageSummary containing input and output token counts
stop_reason — 턴이 종료된 이유: completed, max_turns_reached, or max_budget_reached

The stop_reason field is critical for the caller to understand whether the agent finished naturally or was cut short. When max_turns_reached fires, the agent may have been mid-task and the caller can decide whether to continue with another submission. When max_budget_reached fires, the session has consumed its allocated token budget.

QueryEnginePort

QueryEnginePort는 완전한 대화 세션을 관리하는 상태 기반 엔진 인스턴스입니다. It maintains several pieces of mutable state:

session_id — a UUID hex string that uniquely identifies this session across restarts
mutable_messages — the live message history that grows with each turn
permission_denials — accumulated list of tools the agent tried to use but was blocked from
total_usage — running token usage totals across all turns in this session
transcript_store — a TranscriptStore instance that persists the conversation for replay and audit

The engine is created through two factory methods. from_workspace() creates a fresh session bootstrapped from the current workspace context (discovering CLAUDE.md files, scanning for project structure, assembling the tool pool). from_saved_session(session_id) restores a previously persisted session, rehydrating the message history, usage counters, and transcript from the .port_sessions/ directory.

세션 식별

session_id는 UUID 16진 문자열(32자, 대시 없음)입니다. 이 컴팩트한 형식은 파일 경로와 API 헤더에 사용되어, 경로 구분자 문제 없이 세션을 쉽게 참조하고 저장할 수 있습니다.

submit_message와 스트리밍

The primary entry point is submit_message, which takes a user message and runs the agent through a multi-turn loop. On each iteration, the engine sends the accumulated messages to the model, processes the response (executing any tool calls), appends the results to mutable_messages, and checks the stopping conditions. The loop terminates when one of three conditions is met:

completed — the agent produced a final response without requesting any tool calls
max_turns_reached — the turn counter hit max_turns (default 8)
max_budget_reached — cumulative output tokens exceeded max_budget_tokens (default 2,000)

For real-time output, stream_submit_message wraps the same logic in a generator that yields incremental dictionaries. The stream events follow a defined protocol:

# Stream event types
"message_start"       # New assistant message beginning
"command_match"       # Slash command detected and executed
"tool_match"          # Tool invocation started
"permission_denial"  # Tool blocked by permission system
"message_delta"       # Incremental text chunk
"message_stop"        # Turn completed with stop_reason
      

This streaming protocol enables responsive UIs that show tool invocations, permission denials, and text generation in real time as they occur, rather than waiting for the entire turn to complete.

메시지 압축

Long-running sessions accumulate messages that eventually threaten to overflow the model's context window. The claw code query engine handles this through automatic compaction via the compact_messages_if_needed method.

When the accumulated message count exceeds compact_after_turns (default 12), the engine trims the history to retain only the last compact_after_turns messages. Older messages are summarized into a compact system message that preserves key context — file paths, decisions made, and ongoing work — without the full verbatim history.

The compaction process strips analysis tags, collects key file references, and generates a concise summary that fits within the model's context budget. This automatic trimming means sessions can run for dozens or hundreds of turns without manual intervention.

세션 영속성

The persist_session method flushes the current session state to disk. It writes the transcript to the TranscriptStore and saves session metadata (message history, usage counters, session ID) to the .port_sessions/ directory. This enables two key workflows:

Session resume — restoring a session after a crash, network interruption, or intentional pause using from_saved_session(session_id)
Audit trail — reviewing what the agent did, what tools it invoked, and what permissions were denied during a completed session

런타임 연동

The query engine does not operate in isolation — it is integrated with the broader claw code runtime through two key components:

PortRuntime.route_prompt tokenizes the incoming prompt and scores it against routing rules, determining how the message should be handled (direct response, tool-augmented response, or delegation to a sub-agent). The bootstrap_session method assembles the full context — system prompt, CLAUDE.md instructions, tool definitions, and workspace metadata — before the first turn begins.

Rust ConversationRuntime<C,T> provides a native alternative for high-performance scenarios. It is generic over two traits: ApiClient (the model interface) and ToolExecutor (the tool dispatch layer). The Rust runtime enforces a max_iterations limit of 16 and runs a tight tool-use loop: send messages to the model, parse tool-use blocks from the response, execute tools, append results, repeat until the model produces a final answer or hits the iteration cap.

TranscriptStore

The TranscriptStore is the in-memory persistence layer for conversation history. It provides four operations:

append — adds a new message (user, assistant, or tool result) to the transcript
compact(keep_last=10) — trims the transcript to the most recent 10 entries, discarding older history
replay — returns the full transcript as a tuple for session restoration
flush — persists the in-memory transcript to disk, clearing the write buffer

The compact method with its keep_last=10 default provides a secondary compaction layer independent of the query engine's compact_after_turns setting. This two-tier approach ensures that both the model's context window (managed by the engine) and the persistent storage (managed by the transcript store) remain bounded.

아키텍처 연결

쿼리 엔진은 Claw Code 아키텍처의 중심에 위치하며, 사용자 대면 CLI, 모델 API, 도구 시스템, 세션 영속성 레이어를 연결합니다. 쿼리 엔진을 이해하는 것은 전체 에이전트 하네스가 어떻게 작동하는지 이해하는 데 필수적입니다.