
Why Claude Agent SDK is our agent runtime

If you lay out all the engineering tasks that go into “let an LLM run a tool loop,” you get a boring but long list:

  • Send messages to the model, handle streaming responses
  • Parse tool calls, invoke matching tools, push results back into conversation history
  • Handle tool-call hooks (pre / post), errors, timeouts, cancellation
  • Maintain turn state, compress the context window, manage multi-turn dialog
  • Integrate with MCP servers
  • Surface model / tool / usage data to the observability layer
  • Abstract over different model providers

That’s what “agent runtime” does. Every team building an agent product has to solve this set of problems. The real question is whether you should solve them yourself.
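The loop at the core of that list is small enough to sketch before arguing you shouldn't own it. Below is an illustrative version with the model behind an injected stub; every name here (`runToolLoop`, `ToolCall`, `callModel`) is ours for illustration, not any SDK's API. The hard part is everything else on the list.

```typescript
// A minimal tool loop, with the model behind an injected function so the
// plumbing is visible. All names are illustrative, not SDK API.
type ToolCall = { name: string; input: unknown };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Tool = (input: unknown) => Promise<string>;

async function runToolLoop(
  callModel: (history: string[]) => Promise<ModelReply>,
  tools: Record<string, Tool>,
  userMessage: string,
  maxTurns = 10,
): Promise<string> {
  const history = [`user: ${userMessage}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(history);
    history.push(`assistant: ${reply.text}`);
    if (reply.toolCalls.length === 0) return reply.text; // plain answer: done
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool
        ? await tool(call.input)
        : `error: unknown tool ${call.name}`;
      history.push(`tool(${call.name}): ${result}`); // feed result back in
    }
  }
  throw new Error("max turns exceeded");
}
```

Everything the list adds (streaming, hooks, timeouts, context compression, MCP, observability, provider abstraction) wraps around this dozen-line core.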

The first runtime we shipped was called OpenClaw. A hand-written gateway — Node inside the container, HTTP for messages, calling Anthropic’s API, handling the tool-call loop, streaming, and retries ourselves.

It worked. But there was a maintenance tax every week:

  • Anthropic changed the streaming protocol (partial messages → typed events). One week of catch-up.
  • Tool-call input schema migrated from JSON Schema to a Pydantic-like shape. Another week.
  • Adding vision input, thinking blocks, prompt cache markers — each one a chase.

The deeper problem: we were maintaining something that duplicated Anthropic’s internal SDK. Anthropic clearly had a full-fidelity client library internally; it was only a matter of time before they published it.

That day’s commit 086a7e91:

refactor(infra): remove OpenClaw, use Claude Agent SDK directly

Net delete: 359 lines of OpenClaw gateway code. Replaced with @anthropic-ai/claude-agent-sdk, imported directly into the agent-engine process.

The argument was simple: by then the Claude Agent SDK had stabilized. Streaming, tool hooks (PreToolUse / PostToolUse), MCP integration, and usage stats were all officially maintained. When any of those protocols change, we upgrade the SDK. It’s no longer our bug.

After the switch, agent-engine is no longer “the runtime.” It’s “a thin wrapper on top of the runtime.” Around 430 lines of TypeScript total; the entry point is a polling loop:

```typescript
// packages/agent-engine/src/index.ts (simplified)
async function processMessages() {
  const messages = getNewMessages(1); // FIFO from SQLite
  if (messages.length === 0) return;
  const msg = messages[0];
  const composed = await composeTurn({
    userMessage: msg.content,
    userImId: recipientId || undefined,
    runtime: { mode, triggerReason, lastUserMessageAgeMs, lastUserSnippet },
  });
  const agentResult = await Promise.race([
    runAgent(chatId, promptText, {
      cwd: CWD,
      mode,
      speakable,
      channelContext: { chatId, recipientId, groupId, sessionType, turnId, traceId },
      systemPromptAppend: composed.staticSystemPrompt,
      onToolEvent: (event) => {
        broadcastToolEvent(event);
      },
    }),
    new Promise((_, reject) =>
      setTimeout(
        () => reject(new Error(`Agent timeout after ${AGENT_TIMEOUT_MS / 1000}s`)),
        AGENT_TIMEOUT_MS,
      ),
    ),
  ]);
}
```

runAgent calls into the SDK:

```typescript
const q = query({
  prompt: promptText,
  options: {
    cwd,
    resume: sessionId,
    systemPrompt: {
      type: "preset",
      preset: "claude_code",
      excludeDynamicSections: true, // see below
      append: systemPromptAppend,
    },
    permissionMode: "bypassPermissions",
    tools: [...BUILTIN_TOOLS],
    model: process.env.AGENT_MODEL || undefined,
    mcpServers,
    includePartialMessages: true,
    hooks: { /* PreToolUse, PostToolUse */ },
  },
});
```

What agent-engine adds on top of the SDK falls into four buckets:

1. Input side: polling + context assembly

  • A 2-second SQLite polling loop, FIFO message pickup.
  • composeTurn() assembles the substrate (static prefix) + runtime context (dynamic) + user message.
  • excludeDynamicSections: true is critical: the SDK’s preset: "claude_code" automatically injects current working dir / auto-memory / git status — these change every turn and break the cache prefix before our substrate append. Setting it to true keeps the preset static so prefix caching can hit.
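The static/dynamic split that composeTurn() enforces can be sketched in a few lines (the names and contents here are illustrative, not the real implementation): the cacheable substrate is byte-identical across turns, and anything volatile goes after it, so the prefix cache keeps hitting.

```typescript
// Illustrative sketch of the composeTurn() split. Everything cacheable lives
// in a static prefix that never changes between turns; per-turn state is
// appended after it so it can't invalidate the cached prefix.
const STATIC_SUBSTRATE = [
  "You are the tutoring agent.", // stands in for the real substrate
  "House rules: ...",
].join("\n");

function composeTurnSketch(
  runtime: { mode: string; triggerReason: string },
  userMessage: string,
) {
  const dynamic = `mode=${runtime.mode} trigger=${runtime.triggerReason}`;
  return {
    staticSystemPrompt: STATIC_SUBSTRATE, // stable bytes → prefix cache hits
    promptText: `${dynamic}\n\n${userMessage}`, // volatile → after the prefix
  };
}
```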

2. Tool side: MCP server

  • Six tools are exposed to the SDK through a hand-rolled MCP server: write_to_board, openApp, showViz, localBash, submit_job, compact_session.
  • The SDK ships its own Read/Write/Edit/Bash/Glob/Grep — we use those directly.
  • PreToolUse emits a tool_use event before each call; PostToolUse emits the result.

3. Output side: streaming + lip-sync + learning panel

  • text_delta events are sliced by sentence → fed to a TTS pipeline that produces voice segments.
  • write_to_board tool calls render to the learner’s right-side blackboard (the learning panel), synced with the voice stream.
  • Each turn writes an agent_turn_events row in Postgres for admin replay.
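The sentence slicing in front of the TTS pipeline amounts to buffering deltas and emitting only on sentence boundaries. A deliberately naive sketch (the class and its boundary rule are ours; a real slicer also has to handle abbreviations, decimals, ellipses):

```typescript
// Buffers streaming text_delta chunks and emits complete sentences for TTS.
// Boundary detection is naive on purpose: one or more of . ! ? followed by
// whitespace or end of buffer.
class SentenceSlicer {
  private buf = "";

  push(delta: string): string[] {
    this.buf += delta;
    const out: string[] = [];
    const re = /[^.!?]*[.!?]+(?:\s+|$)/g;
    let consumed = 0;
    let m: RegExpExecArray | null;
    while ((m = re.exec(this.buf)) !== null) {
      out.push(m[0].trim()); // a complete sentence → hand to TTS
      consumed = re.lastIndex;
    }
    this.buf = this.buf.slice(consumed); // keep the unfinished tail
    return out;
  }

  flush(): string | null {
    const rest = this.buf.trim();
    this.buf = "";
    return rest.length ? rest : null; // end of stream: emit whatever is left
  }
}
```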

4. Observability: hand-rolled Langfuse instrumentation

  • We don’t use @arizeai/openinference-instrumentation-claude-agent-sdk — OTel v1/v2 incompatibility (detailed in III-2).
  • Three observation types: agent-turn (span) / claude-agent-llm (generation) / tool/<name> (tool).
  • PreToolUse → startToolObservation, PostToolUse → end().
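That hook-to-observation mapping is a small lifecycle tracker. A sketch with the Langfuse client stubbed behind an interface (the `Tracer` and `Observation` names are ours, not Langfuse's API): PreToolUse opens an observation keyed by the tool-use id, PostToolUse closes it with the result.

```typescript
// Sketch of the hand-rolled hook → observation lifecycle. The tracing client
// is stubbed behind an interface; only the bookkeeping is shown.
interface Observation {
  end(output: unknown): void;
}
interface Tracer {
  startToolObservation(name: string): Observation;
}

function makeToolHooks(tracer: Tracer) {
  const open = new Map<string, Observation>(); // tool-use id → open observation
  return {
    onPreToolUse(id: string, toolName: string) {
      open.set(id, tracer.startToolObservation(`tool/${toolName}`));
    },
    onPostToolUse(id: string, result: unknown) {
      open.get(id)?.end(result); // close with the tool result
      open.delete(id);
    },
  };
}
```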

That’s all of it. No LangChain, no LangGraph, no agent framework. The whole runtime fits in one person’s head.

The good:

  • Streaming protocol, tool schema, provider abstraction — all maintained by Anthropic. They ship, we upgrade. Zero engineering on our side.
  • In-process import. No HTTP / IPC / gRPC layer.
  • A new engineer can read the agent-engine source in a day.

The bad:

  • We’re bound to the Anthropic API shape. Our primary model is MiniMax-M2.7 (more in III-2), served through MiniMax’s https://api.minimaxi.com/anthropic compatibility shim. That shim is a single point of risk — if MiniMax changes the protocol, or Anthropic ships a breaking SDK update, we break.
  • The SDK is closed source. We can’t fork it. Bugs are either worked around or waited on.
  • Performance ceiling moves with the SDK. They haven’t pushed streaming below 50ms yet, so neither have we.

Still unresolved:

  • Truly long tool calls. SDK tool execution is cooperative (PreToolUse → run → PostToolUse). A >1-minute call blocks the turn. We bolted bg-worker onto the side, but the SDK protocol itself should support fire-and-forget tools.
  • Multi-agent. The SDK only knows about a single agent. When we want to spawn a sub-agent inside a turn to run a skill (see I-3), we shell out to a claude -p CLI process instead of spawning inside the SDK. Two interfaces, two abstractions.
  • Local-model fallback. ANTHROPIC_BASE_URL is a single choice — MiniMax or Anthropic. No runtime fallback. Resilience is entirely a deploy-time decision.
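The bg-worker workaround for the first item looks like this in miniature (illustrative, not the actual bg-worker code): the tool handler returns a job id immediately, so the cooperative PreToolUse → run → PostToolUse cycle stays fast, and the real work finishes out-of-band.

```typescript
// Miniature of the fire-and-forget pattern for long tool calls. submit_job's
// handler returns right away with a job id; the work continues outside the
// turn and is polled later. Error handling elided.
type Job = { id: string; status: "running" | "done"; result?: string };
const jobs = new Map<string, Job>();
let nextId = 0;

function submitJob(work: () => Promise<string>): string {
  const id = `job-${++nextId}`;
  const job: Job = { id, status: "running" };
  jobs.set(id, job);
  work().then((result) => {
    // deliberately not awaited: the tool call has already returned
    job.status = "done";
    job.result = result;
  });
  return id; // the tool result the model sees is just the id
}

function pollJob(id: string): Job | undefined {
  return jobs.get(id);
}
```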

Every time the SDK ships an update, we ask the same question: can we delete more code from agent-engine?

If the answer is no, the SDK isn’t growing. If the answer is yes, the SDK has absorbed something we previously had to backstop — and that’s what we want.

The ideal end state: agent-engine collapses to 50 lines of glue. The whole “agent runtime” concept disappears behind the SDK. The thinner the harness, the better.

We’re far from that day. But every line of OpenClaw-era residue we delete is a step in that direction.

