Architecture Overview

The claw-code architecture follows a dual-layer design: a Python orchestration layer that manages agent sessions, command routing, and LLM interaction, paired with a Rust performance layer that handles API communication, tool execution, terminal rendering, and security. This separation keeps the high-level agent logic expressive and easy to modify while pushing latency-sensitive operations into compiled, memory-safe code.

At the highest level, every user interaction flows through three stages: bootstrap (environment discovery, configuration loading, mode routing), query engine execution (turn loops with tool-calling and message compaction), and response rendering (streaming markdown with syntax highlighting in the terminal). Each stage is backed by well-defined modules in both layers.

 User Terminal (REPL / stdin)
       |
       v
 +------------------------------------------+
 |          Rust CLI Binary                  |
 |  rusty-claude-cli  (crossterm, syntect,   |
 |  pulldown_cmark, braille spinner)         |
 +------------------------------------------+
       |
       v
 +------------------------------------------+
 |         Python Orchestration Layer        |
 |  bootstrap_graph.py  (7 stages)          |
 |  runtime.py  (route_prompt, run_turn)    |
 |  query_engine.py  (max_turns=8, stream)  |
 |  commands.py  |  tools.py  |  models.py  |
 |  context.py   |  session_store.py        |
 |  transcript.py | execution_registry.py   |
 |  tool_pool.py  | parity_audit.py         |
 |  + 50 more modules                       |
 +------------------------------------------+
       |              |              |
       v              v              v
 +-----------+  +-----------+  +-----------+
 | Rust api  |  | Rust      |  | Rust      |
 | crate     |  | runtime   |  | tools     |
 | Anthropic |  | 16 modules|  | 19 specs  |
 | client,   |  | bash,file |  | JSON      |
 | SSE,OAuth |  | ops,mcp,  |  | schemas   |
 | retry     |  | oauth,    |  |           |
 |           |  | session,  |  |           |
 |           |  | prompt,   |  |           |
 |           |  | usage     |  |           |
 +-----------+  +-----------+  +-----------+
       |              |
       v              v
 +-----------+  +---------------+
 | Rust      |  | Rust          |
 | commands  |  | compat-       |
 | 15 slash  |  | harness       |
 | commands  |  | (bridge       |
 |           |  |  layer)       |
 +-----------+  +---------------+
       |
       v
   Anthropic API  (api.anthropic.com)

Python Orchestration Layer (src/)

The Python workspace contains 60+ modules organized into tightly scoped subsystems. Each module owns a single responsibility, making the codebase easy to navigate and test in isolation. Below is a detailed breakdown of every major subsystem.

Bootstrap Graph — bootstrap_graph.py

The bootstrap sequence is the entry point for every claw-code session. It runs through 7 sequential stages, each one gated on the previous stage completing successfully:

  1. Prefetch — Pre-loads reference data and warms caches before anything else runs.
  2. Warning handler — Attaches global warning filters so noisy deprecation messages from third-party libraries do not leak into the user terminal.
  3. CLI parser — Parses command-line arguments and flags, establishing the execution mode early.
  4. Setup + Commands parallel load — Runs environment setup and command snapshot loading concurrently for faster startup.
  5. Deferred init — Initializes components that depend on the parsed CLI state and loaded configuration.
  6. Mode routing — Routes execution to one of six modes: local, remote, ssh, teleport, direct-connect, or deep-link.
  7. Query engine submit loop — Hands control to the query engine for the interactive turn loop.
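The seven gated stages above can be sketched as a simple sequential runner: each stage must succeed before the next one runs. This is an illustrative sketch only; the `run_stages` helper and the toy stage bodies are hypothetical, not the actual `bootstrap_graph.py` API.

```python
# Illustrative sketch of a gated, sequential bootstrap graph.
# Stage names mirror the list above; the runner itself is hypothetical.
from typing import Callable

Stage = Callable[[dict], bool]  # returns True on success

def run_stages(stages: list[tuple[str, Stage]], ctx: dict) -> str:
    """Run each stage in order; halt at the first failure."""
    for name, stage in stages:
        if not stage(ctx):
            return f"bootstrap halted at stage: {name}"
    return "bootstrap complete"

# Toy stages standing in for the real seven.
stages = [
    ("prefetch", lambda ctx: ctx.setdefault("cache", {}) is not None),
    ("cli_parser", lambda ctx: bool(ctx.setdefault("args", ["--local"]))),
    ("mode_routing", lambda ctx: ctx.setdefault("mode", "local") in
        {"local", "remote", "ssh", "teleport", "direct-connect", "deep-link"}),
]

print(run_stages(stages, {}))  # -> bootstrap complete
```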

Query Engine — query_engine.py

The query engine is the central orchestration hub. It manages the conversation loop between the user, the LLM, and the tool system. Key configuration values are defined in the QueryEngineConfig dataclass:

Each turn produces a TurnResult, and the QueryEnginePort class exposes session management, message compaction, and streaming. The streaming interface yields six distinct event types: message_start, command_match, tool_match, permission_denial, message_delta, and message_stop. This event-driven design allows the terminal renderer to paint output incrementally as tokens arrive.
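A consumer of this event stream might look like the sketch below. The six event type names come from the text above; the event dictionary shape (`type`/`text` keys) is an assumption for illustration.

```python
# Sketch of consuming the six streaming event types named above.
# The event payload shape is assumed, not taken from query_engine.py.
def render(events):
    out = []
    for event in events:
        if event["type"] == "message_delta":
            out.append(event["text"])  # paint incrementally
        elif event["type"] == "message_stop":
            break                      # turn finished
        # message_start, command_match, tool_match, and permission_denial
        # would update status lines rather than the transcript body.
    return "".join(out)

stream = [
    {"type": "message_start"},
    {"type": "message_delta", "text": "Hello, "},
    {"type": "message_delta", "text": "world."},
    {"type": "message_stop"},
]
print(render(stream))  # -> Hello, world.
```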

Runtime — runtime.py

The runtime module bridges raw user input and the query engine. It provides PortRuntime, which exposes two critical methods:

Commands — commands.py

The command inventory loads from reference_data/commands_snapshot.json at startup. It exposes the CommandExecution dataclass with five fields: name, source_hint, prompt, handled, and message. Four public functions provide the command API:
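The five fields named above can be sketched as a dataclass. Field names come from the text; the exact types and defaults are assumptions.

```python
# The five CommandExecution fields listed above, sketched as a dataclass.
# Types and the Optional default are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommandExecution:
    name: str                      # e.g. "/status"
    source_hint: str               # where the command definition came from
    prompt: str                    # expanded prompt text, if any
    handled: bool                  # True if handled fully on the local side
    message: Optional[str] = None  # user-facing result message

result = CommandExecution(
    name="/status", source_hint="builtin", prompt="", handled=True,
    message="session active",
)
print(result.handled)  # -> True
```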

Tools — tools.py

The tool inventory mirrors the command system but for tool-calling. It loads from reference_data/tools_snapshot.json and defines a ToolExecution dataclass. Key functions include:

Core Data Models — models.py

The models.py module defines the shared data types that flow through every layer of the system:

  Dataclass         Fields                                          Purpose
  Subsystem         name, path, file_count, notes                   Represents a discovered subsystem in the workspace
  PortingModule     name, responsibility, source_hint, status       Tracks porting progress for a single module
  PermissionDenial  —                                               Records when a tool call is blocked by permissions
  UsageSummary      input_tokens, output_tokens (word-count proxy)  Token usage tracking for cost management
  PortingBacklog    —                                               Aggregates all pending porting work
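Two of the shared types above can be sketched as dataclasses with the listed fields. The field names come from the table; the types and the `total` helper are assumptions.

```python
# Sketch of two shared dataclasses from models.py, using the listed
# field names; types and the total() helper are assumptions.
from dataclasses import dataclass

@dataclass
class Subsystem:
    name: str
    path: str
    file_count: int
    notes: str

@dataclass
class UsageSummary:
    input_tokens: int   # word-count proxy, per the table above
    output_tokens: int

    def total(self) -> int:  # hypothetical convenience helper
        return self.input_tokens + self.output_tokens

usage = UsageSummary(input_tokens=120, output_tokens=80)
print(usage.total())  # -> 200
```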

Supporting Python Modules

Beyond the core subsystems, the Python layer includes dozens of focused modules:

Rust Performance Layer (rust/)

The Rust workspace is organized as a 6-crate Cargo workspace. Each crate is compiled independently, enabling incremental builds and clear dependency boundaries. Together, they provide the high-performance foundation that the Python layer calls into.

api Crate — Anthropic API Client

The api crate encapsulates all communication with the Anthropic API. Its AnthropicClient handles authentication, retries, and streaming:

API Constants

  DEFAULT_BASE_URL = "https://api.anthropic.com"
  ANTHROPIC_VERSION = "2023-06-01"
  DEFAULT_MAX_RETRIES = 2
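The retry constant above implies an initial attempt plus up to two retries. The sketch below shows one way such a loop could work; the `request` callable, the backoff policy, and the error type are hypothetical stand-ins, not the actual api crate behavior.

```python
# Hedged sketch of a retry loop driven by DEFAULT_MAX_RETRIES.
# The request() callable and backoff policy are hypothetical.
import time

DEFAULT_MAX_RETRIES = 2

def send_with_retries(request, max_retries: int = DEFAULT_MAX_RETRIES):
    last_error = None
    for attempt in range(max_retries + 1):  # initial try + retries
        try:
            return request()
        except ConnectionError as exc:
            last_error = exc
            time.sleep(2 ** attempt * 0.01)  # simple exponential backoff
    raise last_error

# Stub that fails once, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient")
    return {"ok": True}

print(send_with_retries(flaky))  # -> {'ok': True}
```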

runtime Crate — 16 Modules

The runtime crate is the largest in the Rust workspace with 16 modules covering everything from bash execution to OAuth flows:

bootstrap — Bootstrap (12 phases): from CliEntry to MainRuntime through 12 sequential phases that progressively build the execution environment.

conversation — ConversationRuntime: core turn loop built on the ApiClient and ToolExecutor traits. Max iterations capped at 16 per conversation round.

compact — CompactionConfig: preserve_recent = 4 messages, max_estimated_tokens = 10000. Keeps the context window lean without losing critical state.
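The compaction rule above (keep the 4 most recent messages, cap estimated tokens at 10,000) can be sketched like this. The word-count token estimate and the newest-first backfill of older messages are assumptions; the actual compaction heuristic is not specified here.

```python
# Sketch of the compaction rule: always keep the 4 newest messages,
# then backfill older ones while under the estimated-token budget.
# Word-count token estimation is an assumption.
PRESERVE_RECENT = 4
MAX_ESTIMATED_TOKENS = 10_000

def estimate_tokens(message: str) -> int:
    return len(message.split())

def compact(messages: list[str]) -> list[str]:
    kept = messages[-PRESERVE_RECENT:]
    budget = MAX_ESTIMATED_TOKENS - sum(estimate_tokens(m) for m in kept)
    older = []
    for m in reversed(messages[:-PRESERVE_RECENT]):
        cost = estimate_tokens(m)
        if cost > budget:
            break
        older.append(m)
        budget -= cost
    return list(reversed(older)) + kept

history = [f"msg {i} " + "word " * 2000 for i in range(6)]
print(len(compact(history)))  # -> 4 (older messages exceed the budget)
```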

config — Configuration (3 sources): ConfigSources are User, Project, and Local. Discovers settings.json files at each level with cascading priority.
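Cascading priority means later sources override earlier ones key by key. A minimal sketch, assuming Local beats Project beats User; the file locations shown are hypothetical, not the crate's actual discovery paths.

```python
# Sketch of three-source cascading config: Local overrides Project,
# which overrides User. The example paths are assumptions.
import json
from pathlib import Path

def load_settings(paths: list[Path]) -> dict:
    """Merge settings.json files; later sources win key-by-key."""
    merged: dict = {}
    for path in paths:  # ordered User -> Project -> Local
        if path.is_file():
            merged.update(json.loads(path.read_text()))
    return merged

# Hypothetical locations for the three sources.
sources = [
    Path.home() / ".claw" / "settings.json",  # User
    Path(".claw") / "settings.json",          # Project
    Path(".claw") / "settings.local.json",    # Local
]
settings = load_settings(sources)
```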

file_ops — File Operations: read, write, edit, glob, and grep, with a head_limit of 250 for search results and a glob truncation cap of 100.

permissions — Permission System: PermissionMode is Allow, Deny, or Prompt; PermissionPolicy supports per-tool permission modes.
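Per-tool permission modes could be modeled as below. The three mode names come from the text; the `PermissionPolicy` shape (a default mode plus per-tool overrides) is an assumption about how the Rust types fit together.

```python
# Sketch of per-tool permission modes. Enum values mirror the text;
# the policy class shape is an assumption.
from enum import Enum

class PermissionMode(Enum):
    ALLOW = "allow"
    DENY = "deny"
    PROMPT = "prompt"

class PermissionPolicy:
    def __init__(self, default: PermissionMode, per_tool=None):
        self.default = default
        self.per_tool = per_tool or {}

    def mode_for(self, tool: str) -> PermissionMode:
        return self.per_tool.get(tool, self.default)

policy = PermissionPolicy(
    default=PermissionMode.PROMPT,
    per_tool={"read_file": PermissionMode.ALLOW, "bash": PermissionMode.DENY},
)
print(policy.mode_for("bash").value)         # -> deny
print(policy.mode_for("glob_search").value)  # -> prompt
```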

prompt — System Prompt Builder: MAX_INSTRUCTION_FILE_CHARS = 4000, MAX_TOTAL_INSTRUCTION_CHARS = 12000. Discovers and assembles CLAUDE.md files.
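The two caps above (per-file and total) could be applied as in this sketch. The header format and the order-of-discovery assumption are illustrative, not the prompt module's actual output format.

```python
# Sketch of assembling instruction files under the two caps above:
# each file truncated to 4,000 chars, total capped at 12,000.
# The "# path" header format is an assumption.
MAX_INSTRUCTION_FILE_CHARS = 4_000
MAX_TOTAL_INSTRUCTION_CHARS = 12_000

def assemble_instructions(files: list[tuple[str, str]]) -> str:
    parts, total = [], 0
    for path, text in files:  # e.g. discovered CLAUDE.md files
        snippet = text[:MAX_INSTRUCTION_FILE_CHARS]
        if total + len(snippet) > MAX_TOTAL_INSTRUCTION_CHARS:
            snippet = snippet[:MAX_TOTAL_INSTRUCTION_CHARS - total]
        if not snippet:
            break
        parts.append(f"# {path}\n{snippet}")
        total += len(snippet)
    return "\n\n".join(parts)

docs = [("CLAUDE.md", "x" * 5000),
        ("src/CLAUDE.md", "y" * 5000),
        ("docs/CLAUDE.md", "z" * 5000)]
out = assemble_instructions(docs)
print(out.count("x"), out.count("y"), out.count("z"))  # -> 4000 4000 4000
```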

mcp — MCP Integration: name normalization via the mcp__{server}__{tool} convention. Six MCP transport types: Stdio, SSE, HTTP, WebSocket, SDK, and ClaudeAiProxy.
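The mcp__{server}__{tool} convention is easy to sketch directly; the parsing helper below is illustrative and assumes server names never contain a double underscore.

```python
# The mcp__{server}__{tool} naming convention from the text, sketched.
# Assumes server names contain no "__" separator themselves.
def mcp_tool_name(server: str, tool: str) -> str:
    return f"mcp__{server}__{tool}"

def parse_mcp_name(name: str):
    """Return (server, tool) if the name follows the convention, else None."""
    if not name.startswith("mcp__"):
        return None
    server, sep, tool = name[len("mcp__"):].partition("__")
    return (server, tool) if sep else None

print(mcp_tool_name("github", "create_issue"))
# -> mcp__github__create_issue
print(parse_mcp_name("mcp__github__create_issue"))
# -> ('github', 'create_issue')
```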

Additional runtime modules include:

tools Crate — 19 Tool Specifications

The tools crate defines 19 tool specifications, each with a full JSON Schema for parameter validation. These schemas are what the LLM sees when deciding which tool to call:

  Tool              Category       Description
  bash              Execution      Run shell commands in a sandboxed environment
  read_file         File I/O       Read file contents with offset/limit support
  write_file        File I/O       Write or overwrite file contents
  edit_file         File I/O       Targeted string replacements within files
  glob_search       Search         Pattern-based file discovery
  grep_search       Search         Regex content search powered by ripgrep
  WebFetch          Network        HTTP requests to external URLs
  WebSearch         Network        Web search queries
  TodoWrite         Planning       Structured task list management
  Skill             Extension      Invoke registered skill modules
  Agent             Multi-agent    Spawn sub-agent for parallel work
  ToolSearch        Discovery      Search for deferred tools by keyword
  NotebookEdit      Notebook       Edit Jupyter notebook cells
  Sleep             Utility        Pause execution for a specified duration
  SendUserMessage   Communication  Send messages back to the user
  Config            Settings       Read/write configuration values
  StructuredOutput  Output         Return structured JSON responses
  REPL              Execution      Interactive language REPL sessions
  PowerShell        Execution      Windows PowerShell command execution

commands Crate — 15 Slash Commands

The commands crate implements the slash command system with 15 built-in commands. Each command is classified by a CommandSource enum: Builtin, InternalOnly, or FeatureGated.

Available slash commands:

  /help        — Show help and available commands
  /status      — Display session status and token usage
  /compact     — Trigger manual transcript compaction
  /model       — Switch the active model
  /permissions — View or modify tool permissions
  /clear       — Clear the conversation transcript
  /cost        — Show accumulated session costs
  /resume      — Resume a previous session
  /config      — View or edit configuration
  /memory      — Manage CLAUDE.md memory files
  /init        — Initialize a new project workspace
  /exit        — Exit the session
  /diff        — Show git diff of session changes
  /version     — Print version information
  /export      — Export conversation transcript
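Slash command routing reduces to a prefix check and a dispatch table, as in this sketch. The handler functions and the version string are hypothetical; the command names come from the list above.

```python
# Minimal dispatch sketch for routing slash commands. Handlers and
# the version string are hypothetical stand-ins.
def handle_help(args):
    return "help text"

def handle_version(args):
    return "claw-code 0.x"

SLASH_COMMANDS = {"/help": handle_help, "/version": handle_version}

def route(line: str):
    if not line.startswith("/"):
        return None  # falls through to the LLM turn loop
    name, _, rest = line.partition(" ")
    handler = SLASH_COMMANDS.get(name)
    return handler(rest) if handler else f"unknown command: {name}"

print(route("/version"))  # -> claw-code 0.x
```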

rusty-claude-cli — CLI Binary

The rusty-claude-cli crate is the user-facing binary. It defaults to claude-sonnet-4-20250514 as the model and provides a rich terminal experience:

compat-harness Crate

The compat-harness crate serves as the compatibility bridge between the Python orchestration layer and the Rust performance layer. It ensures that data structures, function signatures, and calling conventions remain stable across the language boundary.

Data Flow: From Prompt to Response

Understanding the claw-code architecture means tracing how a single user prompt flows through the entire system:

  1. The user types a prompt in the rusty-claude-cli REPL.
  2. The CLI passes input to the Python layer, where runtime.py tokenizes it and calls route_prompt to determine if it matches a slash command or should go to the LLM.
  3. If it is a slash command, the commands crate handles it directly. Otherwise, the query engine takes over.
  4. The query engine constructs the API request — including system prompt (built by the Rust prompt module from discovered CLAUDE.md files), conversation history (managed by transcript.py), and available tools (filtered by tool_pool.py, max 15).
  5. The Rust api crate sends the request to Anthropic's API with streaming enabled, yielding SSE events back through the Python layer.
  6. If the LLM response includes a tool call, the Rust runtime crate executes it (bash commands, file operations, etc.), checking permissions via the PermissionPolicy.
  7. Tool results feed back into the next turn. This loop continues for up to 8 turns (Python query engine) or 16 iterations (Rust conversation runtime).
  8. When the conversation accumulates more than 12 turns, the transcript is compacted — the Rust compaction module preserves the 4 most recent messages and limits estimated tokens to 10,000.
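Steps 4 through 7 above boil down to a tool-calling turn loop capped at 8 turns on the Python side. This sketch stubs out the API and tool calls; the message shapes and the stub replies are assumptions for illustration.

```python
# Condensed sketch of the tool-calling turn loop with the Python-side
# cap of 8 turns. API and tool execution are stubbed; message shapes
# are assumptions.
MAX_TURNS = 8

def run_conversation(call_llm, run_tool, prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(MAX_TURNS):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if "tool_call" not in reply:
            return reply["text"]  # final answer ends the loop
        result = run_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": result})
    return "[max turns reached]"

# Stubs: the first reply requests a tool, the second answers.
replies = iter([
    {"tool_call": {"name": "read_file", "path": "README"}},
    {"text": "done"},
])
print(run_conversation(lambda m: next(replies),
                       lambda call: "file contents",
                       "summarize README"))  # -> done
```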

Session & Transcript Management

Claw-code maintains session state through two complementary mechanisms:

On the Rust side, the session module defines the message schema with MessageRole (System, User, Assistant, Tool) and ContentBlock (Text, ToolUse, ToolResult). This shared schema ensures both layers interpret conversation history identically.
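A Python-side mirror of that schema might look like the sketch below. The role and content-block names come from the text; the exact dataclass shapes are illustrative, not the session module's actual serialization.

```python
# Python-side mirror of the shared message schema described above;
# the enum/dataclass shapes are illustrative.
from dataclasses import dataclass
from enum import Enum
from typing import Union

class MessageRole(Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"

@dataclass
class Text:
    text: str

@dataclass
class ToolUse:
    name: str
    input: dict

@dataclass
class ToolResult:
    tool_use_id: str
    content: str

ContentBlock = Union[Text, ToolUse, ToolResult]

@dataclass
class Message:
    role: MessageRole
    content: list  # list[ContentBlock]

msg = Message(MessageRole.ASSISTANT,
              [Text("Hi"), ToolUse("bash", {"cmd": "ls"})])
print(msg.role.value)  # -> assistant
```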

Python-TypeScript Parity Audit

Since claw-code is a clean-room rewrite of the Claude Code architecture, the parity_audit.py module continuously tracks implementation completeness. It maintains 18 root-file mappings and 31 directory mappings from Python modules to their TypeScript counterparts, making it straightforward to identify gaps and prioritize porting work.