Claw Code 查詢引擎：AI 代理如何協作對話

QueryEngineConfig

The claw code query engine is configured through a QueryEngineConfig dataclass that controls the core behavioral limits of every conversation session. The default configuration values are tuned for practical coding workflows:

class QueryEngineConfig:
    max_turns: int = 8
    max_budget_tokens: int = 2000
    compact_after_turns: int = 12
    structured_output: bool = False
    structured_retry_limit: int = 2
      

max_turns caps the number of agent turns per submit_message call at 8, preventing runaway tool-use loops. max_budget_tokens sets a hard ceiling of 2,000 tokens on total output generation per submission. compact_after_turns triggers automatic message compaction after 12 accumulated turns, keeping the context window manageable. structured_output controls whether the engine expects a structured JSON response, and structured_retry_limit allows up to 2 retries when structured output parsing fails.

TurnResult

Each turn in the claw code query engine produces a TurnResult that captures everything that happened during that turn. The structure contains seven fields:

prompt — the input message that initiated the turn
output — the agent's generated response text
matched_commands — any slash commands (like /compact or /clear) that were detected and executed
matched_tools — the tools that the agent invoked during this turn
permission_denials — any tool invocations that were blocked by the permission system
usage — a UsageSummary containing input and output token counts
stop_reason — why the turn ended: completed, max_turns_reached, or max_budget_reached

The stop_reason field is critical for the caller to understand whether the agent finished naturally or was cut short. When max_turns_reached fires, the agent may have been mid-task and the caller can decide whether to continue with another submission. When max_budget_reached fires, the session has consumed its allocated token budget.

QueryEnginePort

The QueryEnginePort is the stateful engine instance that manages a complete conversation session. It maintains several pieces of mutable state:

session_id — a UUID hex string that uniquely identifies this session across restarts
mutable_messages — the live message history that grows with each turn
permission_denials — accumulated list of tools the agent tried to use but was blocked from
total_usage — running token usage totals across all turns in this session
transcript_store — a TranscriptStore instance that persists the conversation for replay and audit

The engine is created through two factory methods. from_workspace() creates a fresh session bootstrapped from the current workspace context (discovering CLAUDE.md files, scanning for project structure, assembling the tool pool). from_saved_session(session_id) restores a previously persisted session, rehydrating the message history, usage counters, and transcript from the .port_sessions/ directory.

工作階段身份

The session_id is a UUID hex string (32 characters, no dashes). This compact format is used in file paths and API headers, making sessions easy to reference and store without path-separator issues.

submit_message and Streaming

The primary entry point is submit_message, which takes a user message and runs the agent through a multi-turn loop. On each iteration, the engine sends the accumulated messages to the model, processes the response (executing any tool calls), appends the results to mutable_messages, and checks the stopping conditions. The loop terminates when one of three conditions is met:

completed — the agent produced a final response without requesting any tool calls
max_turns_reached — the turn counter hit max_turns (default 8)
max_budget_reached — cumulative output tokens exceeded max_budget_tokens (default 2,000)

For real-time output, stream_submit_message wraps the same logic in a generator that yields incremental dictionaries. The stream events follow a defined protocol:

# Stream event types
"message_start"       # New assistant message beginning
"command_match"       # Slash command detected and executed
"tool_match"          # Tool invocation started
"permission_denial"  # Tool blocked by permission system
"message_delta"       # Incremental text chunk
"message_stop"        # Turn completed with stop_reason
      

This streaming protocol enables responsive UIs that show tool invocations, permission denials, and text generation in real time as they occur, rather than waiting for the entire turn to complete.

訊息壓縮

Long-running sessions accumulate messages that eventually threaten to overflow the model's context window. The claw code query engine handles this through automatic compaction via the compact_messages_if_needed method.

When the accumulated message count exceeds compact_after_turns (default 12), the engine trims the history to retain only the last compact_after_turns messages. Older messages are summarized into a compact system message that preserves key context — file paths, decisions made, and ongoing work — without the full verbatim history.

The compaction process strips analysis tags, collects key file references, and generates a concise summary that fits within the model's context budget. This automatic trimming means sessions can run for dozens or hundreds of turns without manual intervention.

工作階段持久化

The persist_session method flushes the current session state to disk. It writes the transcript to the TranscriptStore and saves session metadata (message history, usage counters, session ID) to the .port_sessions/ directory. This enables two key workflows:

Session resume — restoring a session after a crash, network interruption, or intentional pause using from_saved_session(session_id)
Audit trail — reviewing what the agent did, what tools it invoked, and what permissions were denied during a completed session

執行環境整合

The query engine does not operate in isolation — it is integrated with the broader claw code runtime through two key components:

PortRuntime.route_prompt tokenizes the incoming prompt and scores it against routing rules, determining how the message should be handled (direct response, tool-augmented response, or delegation to a sub-agent). The bootstrap_session method assembles the full context — system prompt, CLAUDE.md instructions, tool definitions, and workspace metadata — before the first turn begins.

Rust ConversationRuntime<C,T> provides a native alternative for high-performance scenarios. It is generic over two traits: ApiClient (the model interface) and ToolExecutor (the tool dispatch layer). The Rust runtime enforces a max_iterations limit of 16 and runs a tight tool-use loop: send messages to the model, parse tool-use blocks from the response, execute tools, append results, repeat until the model produces a final answer or hits the iteration cap.

TranscriptStore

The TranscriptStore is the in-memory persistence layer for conversation history. It provides four operations:

append — adds a new message (user, assistant, or tool result) to the transcript
compact(keep_last=10) — trims the transcript to the most recent 10 entries, discarding older history
replay — returns the full transcript as a tuple for session restoration
flush — persists the in-memory transcript to disk, clearing the write buffer

The compact method with its keep_last=10 default provides a secondary compaction layer independent of the query engine's compact_after_turns setting. This two-tier approach ensures that both the model's context window (managed by the engine) and the persistent storage (managed by the transcript store) remain bounded.

架構連接

The query engine sits at the center of the claw code architecture, bridging the user-facing CLI, the model API, the tool system, and the session persistence layer. Understanding the query engine is essential for understanding how the entire agent harness operates.