Why Claude Agent SDK is our agent runtime
What an agent runtime is supposed to do
If you lay out all the engineering tasks that go into “let an LLM run a tool loop,” you get a boring but long list:
- Send messages to the model, handle streaming responses
- Parse tool calls, invoke matching tools, push results back into conversation history
- Handle tool-call hooks (pre / post), errors, timeouts, cancellation
- Maintain turn state, compress the context window, manage multi-turn dialog
- Integrate with MCP servers
- Surface model / tool / usage data to the observability layer
- Abstract over different model providers
That’s what an “agent runtime” does. Every team building an agent product has to solve this set of problems. The real question is whether you should solve them yourself.
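The core of that list — call the model, dispatch tool calls, feed results back into history — can be pictured as one small loop. This is a hedged sketch with a stubbed model; none of the type or function names here come from the SDK, they exist only to make the shape visible:

```typescript
// Minimal tool-loop sketch. `callModel` is a stub standing in for a real
// provider call; tool and message shapes are illustrative, not the SDK's.
type ToolCall = { name: string; input: unknown };
type ModelReply =
  | { type: "text"; text: string }
  | { type: "tool_call"; call: ToolCall };

type Tool = (input: unknown) => Promise<string>;

async function runToolLoop(
  callModel: (history: string[]) => Promise<ModelReply>,
  tools: Record<string, Tool>,
  userMessage: string,
  maxTurns = 8,
): Promise<string> {
  const history: string[] = [`user: ${userMessage}`];
  for (let i = 0; i < maxTurns; i++) {
    const reply = await callModel(history);
    if (reply.type === "text") return reply.text; // final answer ends the loop
    const tool = tools[reply.call.name];
    const result = tool
      ? await tool(reply.call.input)
      : `error: unknown tool ${reply.call.name}`;
    // The tool result goes back into history; the model sees it next iteration.
    history.push(`tool(${reply.call.name}): ${result}`);
  }
  throw new Error("tool loop exceeded maxTurns");
}
```

Everything else on the list — streaming, hooks, timeouts, context compression — is refinement layered onto this loop.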
We used to: OpenClaw
The first runtime we shipped was called OpenClaw. A hand-written gateway — Node inside the container, HTTP for messages, calling Anthropic’s API, handling the tool-call loop, streaming, and retries ourselves.
It worked. But there was a maintenance tax every week:
- Anthropic changed the streaming protocol (partial messages → typed events). One week of catch-up.
- Tool-call input schema migrated from JSON Schema to a Pydantic-like shape. Another week.
- Adding vision input, thinking blocks, prompt cache markers — each one a chase.
The deeper problem: we were maintaining something that duplicated Anthropic’s internal SDK. Anthropic obviously has a full-fidelity client library internally; publishing it was only a matter of time.
2026-03-31: we deleted it
That day’s commit 086a7e91:
refactor(infra): remove OpenClaw, use Claude Agent SDK directly
Net delete: 359 lines of OpenClaw gateway code. Replaced with `@anthropic-ai/claude-agent-sdk`, imported directly into the agent-engine process.
The argument was simple: by then the Claude Agent SDK had stabilized — streaming, tool hooks (PreToolUse / PostToolUse), MCP integration, usage stats are all officially maintained. When any of those protocols change, we upgrade the SDK. It’s no longer our bug.
agent-engine: the thin wrapper on top
After the switch, agent-engine is no longer “the runtime.” It’s “a thin wrapper on top of the runtime.” Around 430 lines of TypeScript total; the entry point is a polling loop:
```typescript
// packages/agent-engine/src/index.ts (simplified)
async function processMessages() {
  const messages = getNewMessages(1); // FIFO from SQLite
  if (messages.length === 0) return;

  const msg = messages[0];
  const composed = await composeTurn({
    userMessage: msg.content,
    userImId: recipientId || undefined,
    runtime: { mode, triggerReason, lastUserMessageAgeMs, lastUserSnippet },
  });

  const agentResult = await Promise.race([
    runAgent(chatId, promptText, {
      cwd: CWD,
      mode,
      speakable,
      channelContext: { chatId, recipientId, groupId, sessionType, turnId, traceId },
      systemPromptAppend: composed.staticSystemPrompt,
      onToolEvent: (event) => {
        broadcastToolEvent(event);
      },
    }),
    new Promise((_, reject) =>
      setTimeout(
        () => reject(new Error(`Agent timeout after ${AGENT_TIMEOUT_MS / 1000}s`)),
        AGENT_TIMEOUT_MS,
      ),
    ),
  ]);
}
```

`runAgent` calls into the SDK:

```typescript
const q = query({
  prompt: promptText,
  options: {
    cwd,
    resume: sessionId,
    systemPrompt: {
      type: "preset",
      preset: "claude_code",
      excludeDynamicSections: true, // see below
      append: systemPromptAppend,
    },
    permissionMode: "bypassPermissions",
    tools: [...BUILTIN_TOOLS],
    model: process.env.AGENT_MODEL || undefined,
    mcpServers,
    includePartialMessages: true,
    hooks: { /* PreToolUse, PostToolUse */ },
  },
});
```

What agent-engine adds on top of the SDK falls into four buckets:
1. Input side: polling + context assembly
- A 2-second SQLite polling loop, FIFO message pickup.
- `composeTurn()` assembles the substrate (static prefix) + runtime context (dynamic) + user message.
- `excludeDynamicSections: true` is critical: the SDK’s `preset: "claude_code"` automatically injects current working dir / auto-memory / `git status` — these change every turn and break the cache prefix before our substrate append. Setting it to `true` keeps the preset static so prefix caching can hit.
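The cache-friendly ordering matters more than it looks: anything that varies per turn must land *after* the static substrate, or the cached prefix is invalidated every turn. A hypothetical sketch of that split (field names are illustrative, not the real `composeTurn` interface):

```typescript
// Hypothetical sketch of the composeTurn idea: everything that changes per
// turn goes AFTER the static substrate, so the system-prompt prefix stays
// bytewise identical across turns and prefix caching can hit.
interface TurnInput {
  substrate: string;               // static: persona, tool docs, house rules
  runtime: Record<string, string>; // dynamic: mode, trigger reason, recency
  userMessage: string;
}

function composeTurnSketch({ substrate, runtime, userMessage }: TurnInput) {
  const dynamic = Object.entries(runtime)
    .map(([k, v]) => `${k}: ${v}`)
    .join("\n");
  return {
    // Reused verbatim every turn -> cacheable prefix.
    staticSystemPrompt: substrate,
    // Dynamic context rides along with the user turn instead.
    prompt: `<runtime>\n${dynamic}\n</runtime>\n\n${userMessage}`,
  };
}
```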
2. Tool side: MCP server
- 6 tools exposed to the SDK through a hand-rolled MCP server: `write_to_board`, `openApp`, `showViz`, `localBash`, `submit_job`, `compact_session`.
- The SDK ships its own Read/Write/Edit/Bash/Glob/Grep — we use those directly.
- PreToolUse emits a `tool_use` event before each call; PostToolUse emits the result.
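The pre/post pairing amounts to wrapping each tool invocation with an event on either side. A sketch of that shape — the SDK’s actual hook signatures differ; this only illustrates the ordering guarantee the events give us:

```typescript
// Illustrative pre/post event wrapper around a tool invocation. The real
// events come from the SDK's PreToolUse/PostToolUse hooks; this just shows
// the ordering: "pre" fires before the tool runs, "post" carries the result.
type ToolEvent =
  | { phase: "pre"; tool: string; input: unknown }
  | { phase: "post"; tool: string; result: string };

async function withToolEvents(
  tool: string,
  input: unknown,
  run: (input: unknown) => Promise<string>,
  emit: (e: ToolEvent) => void,
): Promise<string> {
  emit({ phase: "pre", tool, input });   // PreToolUse-style event
  const result = await run(input);
  emit({ phase: "post", tool, result }); // PostToolUse-style event
  return result;
}
```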
3. Output side: streaming + lip-sync + learning panel
- `text_delta` events are sliced by sentence → fed to a TTS pipeline that produces voice segments.
- `write_to_board` tool calls render to the learner’s right-side blackboard (the learning panel), synced with the voice stream.
- Each turn writes an `agent_turn_events` row in Postgres for admin replay.
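The sentence slicing is a small stateful buffer: deltas accumulate until a sentence terminator arrives, then the complete sentence is handed to TTS. A minimal sketch, assuming a naive `.!?` boundary rule (the real splitter has to handle abbreviations, decimals, and non-Latin punctuation):

```typescript
// Illustrative sentence slicer for a TTS pipeline: buffers streaming
// text deltas and emits complete sentences as soon as one terminates.
// Naive boundary rule -- "3.14" would split wrongly; this is a sketch.
function makeSentenceSlicer(onSentence: (s: string) => void) {
  let buffer = "";
  return {
    push(delta: string) {
      buffer += delta;
      // Cut on ., !, ? followed by whitespace or end-of-buffer.
      const re = /[^.!?]*[.!?]+(?:\s+|$)/g;
      let consumed = 0;
      let m: RegExpExecArray | null;
      while ((m = re.exec(buffer)) !== null) {
        const sentence = m[0].trim();
        if (sentence) onSentence(sentence);
        consumed = re.lastIndex;
      }
      buffer = buffer.slice(consumed); // keep the unfinished tail
    },
    flush() {
      const rest = buffer.trim();
      if (rest) onSentence(rest); // emit whatever is left at end of stream
      buffer = "";
    },
  };
}
```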
4. Observability: hand-rolled Langfuse instrumentation
- We don’t use `@arizeai/openinference-instrumentation-claude-agent-sdk` — OTel v1/v2 incompatibility (detailed in III-2).
- Three observation types: `agent-turn` (span) / `claude-agent-llm` (generation) / `tool/<name>` (tool).
- PreToolUse → `startToolObservation`, PostToolUse → `end()`.
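Pairing a PreToolUse hook with its PostToolUse counterpart means holding open observations in a map keyed by tool-use id. A sketch of that lifecycle with plain records standing in for Langfuse observation objects (the Langfuse client calls are omitted; class and field names are ours, not an SDK or Langfuse API):

```typescript
// Sketch of pairing PreToolUse/PostToolUse into start/end observations.
// The real version creates Langfuse tool observations; here the observation
// is a plain record so the lifecycle itself is testable.
interface Observation {
  name: string;
  startedAt: number;
  endedAt?: number;
}

class ToolObservationTracker {
  private open = new Map<string, Observation>();
  readonly closed: Observation[] = [];

  // PreToolUse -> open an observation keyed by the tool-use id.
  start(toolUseId: string, toolName: string, now = Date.now()) {
    this.open.set(toolUseId, { name: `tool/${toolName}`, startedAt: now });
  }

  // PostToolUse -> close the matching observation, if any.
  end(toolUseId: string, now = Date.now()) {
    const obs = this.open.get(toolUseId);
    if (!obs) return; // PostToolUse without a matching PreToolUse: ignore
    obs.endedAt = now;
    this.open.delete(toolUseId);
    this.closed.push(obs);
  }
}
```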
That’s all of it. No LangChain, no LangGraph, no agent framework. The whole runtime fits in one head.
Current trade-offs
The good:
- Streaming protocol, tool schema, provider abstraction — all maintained by Anthropic. They ship, we upgrade. Zero engineering on our side.
- In-process import. No HTTP / IPC / gRPC layer.
- A new engineer can read the agent-engine source in a day.
The bad:
- We’re bound to the Anthropic API shape. Our primary model is MiniMax-M2.7 (more in III-2), served through MiniMax’s `https://api.minimaxi.com/anthropic` compatibility shim. That shim is a single point of risk — if MiniMax changes the protocol, or Anthropic ships a breaking SDK update, we break.
- The SDK is closed source. We can’t fork it. Bugs are either worked around or waited on.
- Performance ceiling moves with the SDK. They haven’t pushed streaming below 50ms yet, so neither have we.
Still unresolved:
- Truly long tool calls. SDK tool execution is cooperative (PreToolUse → run → PostToolUse). A >1-minute call blocks the turn. We bolted bg-worker onto the side, but the SDK protocol itself should support fire-and-forget tools.
- Multi-agent. The SDK only knows about a single agent. When we want to spawn a sub-agent inside a turn to run a skill (see I-3), we shell out to a `claude -p` CLI process instead of spawning inside the SDK. Two interfaces, two abstractions.
- Local-model fallback. `ANTHROPIC_BASE_URL` is a single choice — MiniMax or Anthropic. No runtime fallback. Resilience is entirely a deploy-time decision.
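The bg-worker workaround for long tool calls reduces to one trick: the tool returns a job id immediately and the work continues off to the side. A sketch under those assumptions — names are ours, and this is the bolt-on pattern, not an SDK feature:

```typescript
// Fire-and-forget sketch: the tool call returns a job id instantly, so the
// agent turn is not blocked; the long-running work completes in the
// background and a later turn can poll for the result.
type JobStatus = "running" | "done";

class BgWorker {
  private jobs = new Map<string, { status: JobStatus; result?: string }>();
  private nextId = 0;

  submit(task: () => Promise<string>): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { status: "running" });
    // Deliberately not awaited: the tool result is just the id.
    task().then((result) => this.jobs.set(id, { status: "done", result }));
    return id;
  }

  poll(id: string) {
    return this.jobs.get(id);
  }
}
```

A fire-and-forget tool in the protocol itself would make this class unnecessary, which is exactly the kind of deletion the next section argues for.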
The question we ask each generation
Every time the SDK ships an update, we ask the same question: can we delete more code from agent-engine?
If the answer is no, the SDK isn’t growing. If the answer is yes, the SDK has absorbed something we previously had to backstop — and that’s what we want.
The ideal end state: agent-engine collapses to 50 lines of glue. The whole “agent runtime” concept disappears behind the SDK. The thinner the harness, the better.
We’re far from that day. But every line of OpenClaw-era residue we delete is a step in that direction.
Related:
- Where does the runtime run? → I-2: Outside vs. inside the container
- Tool calls > 1 minute? → I-3: bg-worker
- Why doesn’t prompt cache break? → III-2: Langfuse trace