Outside vs. inside the container
“Where the agent runs” isn’t a simple question
You built an agent. Where does it run? The most intuitive answer: in your backend.
But in the K12 setting, we’re shipping “one agent per child” — each agent has its own filesystem, its own browser, its own code projects, its own installable tools. Isolation has to be at the container level. Not process-level. Not namespace-level.
So the agent runs in a container. Next question: who manages the container’s lifecycle?
- Creating: when a user first logs in, we auto-provision a workspace.
- Sleeping: at midnight when nobody’s chatting, stop the container. Save cost.
- Waking: in the morning when the user sends a message, the container must come back before they notice.
- Recovering: OOM, image update, node migration — must be resilient.
This is classic stateful workload orchestration. Our answer is to peel it completely off the agent itself.
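The lifecycle above can be modeled as a small state machine. A minimal sketch as a pure transition function; the state names come from the workflow described later, but the event names and this function are illustrative, not the production code:

```typescript
// Simplified model of the workspace lifecycle: running → hibernating ⇄ waking.
// This is an illustration of the transitions, not the real Temporal workflow.
type State = "running" | "hibernating" | "waking";
type Event =
  | "idleTimeout"    // nobody chatted for the hibernate window
  | "agentOffline"   // agent process went away
  | "wakeSignal"     // external signal (e.g. an IM message arrived)
  | "wakeSucceeded"  // container came back and is ready
  | "wakeFailed";    // build failed; wait for the next signal

function next(state: State, event: Event): State {
  switch (state) {
    case "running":
      // Timeout or the agent going offline puts the workspace to sleep.
      return event === "idleTimeout" || event === "agentOffline" ? "hibernating" : state;
    case "hibernating":
      // Only an external wake signal leaves hibernation.
      return event === "wakeSignal" ? "waking" : state;
    case "waking":
      if (event === "wakeSucceeded") return "running";
      if (event === "wakeFailed") return "hibernating"; // stay asleep, retry on next signal
      return state;
  }
}
```

Everything else in this post is about which layer owns which transition.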
The three-layer split
```
┌─────────────────────────────────────────────────────────┐
│ Temporal (outside the container, platform layer)        │
│   workspaceLifecycleWorkflow — state machine            │
│   running → hibernating ⇄ waking → running              │
└─────────────────────────────────────────────────────────┘
                 ↓ (calls Coder API)
┌─────────────────────────────────────────────────────────┐
│ Coder (container provider)                              │
│   POST /api/v2/workspaces                               │
│   POST /api/v2/workspaces/{id}/builds (start/stop)      │
└─────────────────────────────────────────────────────────┘
                 ↓ (provides container)
┌─────────────────────────────────────────────────────────┐
│ Inside the container: Ubuntu + agent-engine + SDK       │
│   Process IM messages, call tools, write code, teach    │
└─────────────────────────────────────────────────────────┘
```

Each layer does one thing:
- Coder: provides the container. Doesn’t know whether what’s inside is an agent or a web server.
- Temporal: manages the container lifecycle. Durable workflow + signal + activity — doesn’t run AI, doesn’t call LLMs.
- Claude Agent SDK + agent-engine: runs the agent behavior inside the container. Doesn’t manage its own container.
The boundaries are hard: Temporal doesn’t know what the agent is doing; the agent doesn’t know it’s running inside Coder. That’s deliberate.
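Concretely, the Temporal side talks to Coder only through thin activity wrappers over the two endpoints in the diagram. A sketch, with the HTTP client injected so the Coder dependency stays at the edge; the endpoint paths are from the diagram above, everything else (names, error handling) is illustrative:

```typescript
// Sketch: start/stop activities as thin wrappers over the Coder builds API.
// `http` is injected so the wrapper can be exercised without a Coder server.
type Http = (url: string, init: { method: string; body?: string }) => Promise<{ ok: boolean }>;

function makeWorkspaceActivities(coderUrl: string, http: Http) {
  const build = async (workspaceId: string, transition: "start" | "stop") => {
    const res = await http(`${coderUrl}/api/v2/workspaces/${workspaceId}/builds`, {
      method: "POST",
      body: JSON.stringify({ transition }),
    });
    // Throwing lets Temporal's declarative activity retry take over.
    if (!res.ok) throw new Error(`Coder build "${transition}" failed for ${workspaceId}`);
  };
  return {
    startCoderWorkspace: (workspaceId: string) => build(workspaceId, "start"),
    stopCoderWorkspace: (workspaceId: string) => build(workspaceId, "stop"),
  };
}
```

The point of the wrapper is the boundary: only these two functions know Coder's URL scheme; the workflow above them only knows activity names.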
The lifecycle state machine
The Temporal layer is one workflow. Simplified state machine:
```ts
export async function workspaceLifecycleWorkflow(input) {
  while (true) {
    // === Running ===
    state = "running";
    const timedOut = !(await condition(
      () => activityDetected || agentWentOffline,
      input.hibernateTimeoutMs, // default 30 min
    ));

    // === Entering hibernation ===
    // Before sleeping, if agent didn't set an alarm, seed a [dream] alarm +1h
    if (!agentWentOffline) {
      const hasFuture = await act.hasFutureWakeAlarm({ agentId });
      if (!hasFuture) {
        await act.scheduleDreamAlarm({ agentId, delayMs: DREAM_DELAY_MS });
      }
    }
    await act.stopCoderWorkspace({ workspaceId });

    // === Hibernating + Waking loop ===
    while (true) {
      state = "hibernating";
      await condition(() => wakeUpPayload !== null); // Wait for signal

      state = "waking";
      try {
        await act.startCoderWorkspace({ workspaceId });
        await longAct.waitCoderWorkspaceReady(workspaceId);
        break; // → back to running
      } catch {
        // Wake failed; stay in hibernating, wait for next signal
      }
    }
  }
}
```

Three states: running / hibernating / waking. All transitions are explicit — hibernate fires on timeout, wake fires on signal.
wakeUpSignal is sent by the TeachClaw API when it receives an IM message and discovers the workspace is offline.
A full wake flow
User has been inactive for 6 hours; workspace is stopped. Morning comes. They send a message:
```
[User] sends IM message
  ↓
[OpenIM webhook → TeachClaw API]
  ↓ checks agent.workspace_status, sees stopped
[TeachClaw API → Temporal] wakeUpSignal(workflowId, payload)
  ↓
[Temporal workflow] wakeUpPayload = payload  // unblocks condition()
                    state = "waking"
  ↓
[Activity: startCoderWorkspace]
  POST /api/v2/workspaces/{id}/builds { transition: "start" }
  ↓
[Activity: waitCoderWorkspaceReady]
  Poll status every 2s until ready (10 min timeout)
  ↓
[Container boots → agent-engine starts → SQLite polling]
  getNewMessages() picks up the user's message
  ↓
[Claude Agent SDK] query() processes the message, starts streaming
  ↓
[User sees agent typing / hears voice]
```

What the user perceives is “the agent’s a beat late.” Coder API, Temporal Signal, activity retry — they don’t know any of it.
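The poll-until-ready step is the long pole in that flow. A sketch of what such an activity could look like, with the status check and sleep injected so the loop is testable without Coder or real time; the 2s / 10 min numbers are from the flow above, the rest is illustrative:

```typescript
// Sketch of a waitCoderWorkspaceReady-style activity: poll workspace status
// until it reports running, give up after a deadline. getStatus and sleep are
// injected so the loop can run without a Coder server or real delays.
async function waitReady(
  getStatus: () => Promise<"pending" | "starting" | "running">,
  sleep: (ms: number) => Promise<void>,
  pollMs = 2_000,      // poll every 2s
  timeoutMs = 600_000, // 10 min budget
): Promise<void> {
  let waited = 0;
  while (waited <= timeoutMs) {
    if ((await getStatus()) === "running") return;
    await sleep(pollMs);
    waited += pollMs;
  }
  // Throwing surfaces the failure to the workflow, which stays in hibernating.
  throw new Error("workspace did not become ready in time");
}
```

In production this kind of loop would run as a long activity with heartbeats, so a stuck poll is detected rather than silently hanging.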
Typical wake latency (dev observations):
| Stage | Time |
|---|---|
| IM → TeachClaw API → Temporal signal | < 100ms |
| Coder start build | 5–15s |
| Container boot + agent-engine startup | 3–8s |
| agent-engine polling picks up message | < 2s |
| Claude SDK first streaming token | 1–3s |
Why not mixing pays off
By this point the “doesn’t do” list for each layer matters more than the “does” list.
Temporal doesn’t:
- Run agent behavior. Workflows don’t call LLMs or tools.
- Know who the agent is talking to.
- Store agent state. (State lives inside the container — SQLite + filesystem.)
Coder doesn’t:
- Know what’s running inside the container.
- Decide when a container should sleep or wake (Temporal decides).
- Expose the agent’s tools or messages.
agent-engine + Claude Agent SDK don’t:
- Call the Coder API.
- Know when their container is about to be stopped (they get SIGTERM and do a graceful shutdown).
- Maintain “when will I be woken next?” — alarms are stored in the TeachClaw API; the alarm-scheduler signals Temporal.
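The SIGTERM contract is the whole interface the agent has with its own lifecycle. A sketch of what the in-container side could look like; the cleanup steps and names are illustrative, not the actual agent-engine code:

```typescript
// Sketch of the in-container side of the contract: the agent never calls
// Coder; it only reacts to SIGTERM with a bounded, idempotent shutdown.
// Cleanup steps (flush SQLite, stop pollers, ...) are illustrative.
function makeShutdown(cleanups: Array<() => Promise<void>>) {
  let done = false;
  return async function shutdown(): Promise<string[]> {
    if (done) return []; // idempotent: a second SIGTERM is a no-op
    done = true;
    const failed: string[] = [];
    for (const [i, step] of cleanups.entries()) {
      try {
        await step();
      } catch {
        failed.push(`step ${i}`); // keep going; partial cleanup beats none
      }
    }
    return failed;
  };
}

// Wired up once at startup, e.g.:
// process.on("SIGTERM", () => { void shutdown().then(() => process.exit(0)); });
```

Because the container layer only ever sends SIGTERM, this one handler is the entire lifecycle surface the agent code needs to know about.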
Every “doesn’t” buys clarity. When the container layer breaks (node drift, image update), you don’t touch agent code. When agent behavior breaks (wrong words, missing tools), you don’t touch Temporal.
Counter-examples: when we mixed
This split wasn’t there from day one. Earlier mixings cost us:
- agent-engine calling the Coder API to check “am I alive?” Problem: the agent container’s token got tangled with external TeachClaw permissions. The call chain became unauditable. Fix: the agent doesn’t know Coder exists.
- A 90-second fixed sleep inside the Temporal workflow waiting for “alarm timing.” Problem: the sleep raced with the alarm. Fix: delete the sleep, let alarms come from outside via signal (commit `aacb9186`: `refactor(workflows): Dream goes through alarm channel, delete 90s sleep race`).
Every mix is a “what if I just call one more API” temptation — but cross-layer calls turn incident response into archaeology.
Why Temporal
Someone always asks: K3s ships its own controllers. Why not just use K8s primitives?
Temporal’s load-bearing value here is durable execution:
- A workflow process dies mid-run; on restart, it continues from the same point.
- Signals don’t get lost; they’re always delivered.
- Activity retry is declarative (attempts, backoff, timeout).
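"Declarative" means you state a policy and Temporal derives the retry schedule. A sketch of the schedule such a policy produces; the field names mirror Temporal's retry options, but the function and values here are an illustration, not SDK code:

```typescript
// Sketch: the wait schedule a declarative retry policy implies.
// Field names mirror Temporal's retry options (initialInterval,
// backoffCoefficient, maximumInterval, maximumAttempts); values illustrative.
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number;
}

function retryDelays(p: RetryPolicy): number[] {
  const delays: number[] = [];
  // Attempt 1 runs immediately; delays are waited before attempts 2..N.
  for (let attempt = 2; attempt <= p.maximumAttempts; attempt++) {
    const raw = p.initialIntervalMs * p.backoffCoefficient ** (attempt - 2);
    delays.push(Math.min(raw, p.maximumIntervalMs)); // capped exponential backoff
  }
  return delays;
}
```

The win is that none of this lives in your activity code: `startCoderWorkspace` just throws, and the platform replays it on this schedule.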
A K8s controller could do this — but a stateful controller that knows how to hibernate / wake means 100+ lines of Go plus your own state machine. The Temporal version is the ~50 lines of TypeScript shown earlier.
The cost is:
- Another component to run (Temporal server, 3-node HA).
- Engineers must learn the workflow programming model (determinism, `patched()`, versioning).
- Workflow code changes require `patched()` or you wedge old executions — an iron law that’s easy to forget.
We think the cost is worth it. For high-cardinality stateful workloads like “one agent per workspace,” Temporal is the industry-standard answer.
Still unsolved
- Cold-start latency. 5–15s of Coder build is grudgingly acceptable for K12 chat, but we want it under 3s. Possible directions: pre-warmed pools, image layer optimization, KubeVirt replacing Coder. Under evaluation.
- Cross-node affinity. When a workspace wakes on a different node, its SQLite database follows via the PVC — the data survives, but IO performance drops. No production pain yet.
- Burst wake. At 8AM, hundreds of workspaces wake at once and the Coder build queue saturates. TODO: measure burst-wake p99; consider a warm pool.
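One shape the burst mitigation could take, independent of a warm pool: cap concurrent Coder builds so a morning spike drains as a bounded queue. Purely a sketch of the idea, not anything implemented:

```typescript
// Sketch: drain a burst of wake requests with bounded concurrency, so a
// hundred simultaneous 8AM signals become a queue instead of a thundering
// herd against the Coder build API. Illustrative only; not implemented.
async function drainWakes<T>(
  items: T[],
  wake: (item: T) => Promise<void>,
  maxConcurrent: number,
): Promise<void> {
  let i = 0;
  async function worker(): Promise<void> {
    while (i < items.length) {
      const item = items[i++]; // single-threaded JS: no race on the index
      await wake(item);
    }
  }
  const n = Math.min(maxConcurrent, items.length);
  await Promise.all(Array.from({ length: n }, () => worker()));
}
```

The same cap could equally live server-side as a rate limit in front of the builds endpoint; the client-side version is just easier to prototype.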
A real payoff
This split has one unexpected benefit: when we shipped Substrate + Evaluator (a major agent-behavior layer change), we never touched Temporal or Coder. Prompt changes, mode spec edits, evaluator dimensions — all of it lives inside the agent-engine package. No workflow redeploy.
And when we did the Coder → Talos infrastructure migration, agent behavior code didn’t change a line.
That’s the payoff of boundaries: each layer evolves independently.
Related:
- The runtime choice itself: I-1 Why Claude Agent SDK is our agent runtime
- How the in-container agent handles >1min tasks: I-3 bg-worker: offloading heavy I/O
- How the agent schedules its own wakeups: Series II — Behavior Calibration