Letting the model judge silence
What should a heartbeat wake-up do?
The TeachClaw agent has a “concern mode” — [think] mode.
When the user is online but has been quiet for a while (say, no message for 5 minutes but the browser tab is still open), a setTimeout inside the agent-engine process fires and delivers a system message to the agent:
```
[think] heartbeat wake (1/3 unanswered)
Current time 16:42 Beijing
Child quiet for 7 minutes
Last message: "I finished my math homework"
```

The agent picks up this message and enters a turn. What should it do?
The intuitive answer: say something — the child just finished homework, the agent should care, praise, transition to the next topic. That’s the point of “concern mode.”
But the intuition is wrong. Or rather, only half right.
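For concreteness, here is a minimal sketch of what that in-process heartbeat could look like. Every name in it (`Session`, `armHeartbeat`, `deliverSystemMessage`) is hypothetical; the post doesn't show the real agent-engine internals.

```ts
// Hypothetical sketch; none of these names come from TeachClaw itself.
interface Session {
  deliverSystemMessage(text: string): void;
  minutesSinceLastUserMessage(): number;
  lastUserMessage(): string;
}

const HEARTBEAT_MS = 5 * 60 * 1000; // "quiet for 5 minutes" threshold
const MAX_UNANSWERED = 3;           // after 3/3, go silent until revived

let unanswered = 0;
let timer: ReturnType<typeof setTimeout> | undefined;

function formatBeijingTime(d: Date): string {
  return d.toLocaleTimeString("en-GB", {
    hour: "2-digit",
    minute: "2-digit",
    timeZone: "Asia/Shanghai",
  });
}

function armHeartbeat(session: Session): void {
  clearTimeout(timer);
  if (unanswered >= MAX_UNANSWERED) return; // stop; wait for the user to revive us
  timer = setTimeout(() => {
    unanswered += 1;
    session.deliverSystemMessage(
      [
        `[think] heartbeat wake (${unanswered}/${MAX_UNANSWERED} unanswered)`,
        `Current time ${formatBeijingTime(new Date())} Beijing`,
        `Child quiet for ${session.minutesSinceLastUserMessage()} minutes`,
        `Last message: "${session.lastUserMessage()}"`,
      ].join("\n"),
    );
    armHeartbeat(session); // self-schedule the next wake
  }, HEARTBEAT_MS);
}

// Any user message resets the counter and re-arms the timer.
function onUserMessage(session: Session): void {
  unanswered = 0;
  armHeartbeat(session);
}
```

Note this is in-process self-scheduling: if the workspace isn't running, nothing fires.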
Our failed attempt: forced speech
The earliest THINK_SPEC (commit 0f4d5847, 2026-04-14) read like this:
> Concern mode: infinite hooks design, seal the silence exit. Every think turn must have output. If you can’t think of a topic, fall back to at least a greeting: “hello~” / “how’s it going?” Silent end-turn is not permitted.
Our fear at the time: if we didn’t force speech, the model would be lazy, most turns would go silent, and “concern mode” would exist in name only.
The next two weeks were continuous prompt iteration:
| commit | what changed |
|---|---|
| 0f4d5847 | “infinite hooks, seal the silence exit” — force speak |
| a64b2636 | “hooks must reference existing content” — forbid empty chit-chat |
| 00a04836 | “concern mode flipped — default proactive, no more 90% silence” |
Each commit was plugging holes the previous prompt revealed. The results:
- agent said “hello~” — kid found it annoying
- agent forced a transition “You just said you finished math, what about Chinese?” — abrupt, the kid was trying to rest
- agent worked hard to find hooks but they were all empty — kid learned to ignore think messages
We were engineering a chatterbox.
A manual test on 2026-04-27
That day we were testing M2.7 in think mode.
Test scenario: three think turns, 5 minutes apart. In the first two turns the agent recommended two movies. The third heartbeat fires.
Expected: under the then-current prompt, the agent recommends a third movie, or greets and asks for thoughts.
Actual: the agent wrote in the thinking block:
> I just recommended two movies back to back. A third would be too dense. The kid may be considering whether to watch the first two. I should give them space. No output this turn.
Then `speak_aloud: null`. Silent end turn.
This was correct social judgment. A person who can read the room does exactly this.
But under the then-current THINK_SPEC, the model was in “violation” — the rule required speech every turn.
We stared at that thinking block for a while.
Delete the rule
After that day we rewrote THINK_SPEC (final version landed in commit 7c9709e9, 2026-05-15):
```markdown
## Concern mode (think) — lightweight proactivity during short online lulls

Triggered by heartbeat: user is online but has been quiet for a moment.
This is **in-process setTimeout self-scheduling**; the workspace must be
running for it to fire — don't try heavy work here, save heavy work for wake.

Message format: `[think] heartbeat wake (N/3 unanswered)...` already carries
the context (Beijing time, how long the child has been quiet, last message
verbatim).

All of the following choices are valid this turn:
- Flip through `.learner/todo.md` for "things suitable while user is online" → pick one
- Check atlas / journal for "loose threads from last time" → pick up the thread
- See `.learner/memory/profile.md` interest fields empty → find a chance to ask
- Genuinely nothing to do → silent end turn (speakable=false, no record)

**Do not force speech.** Empty greetings ("hi~" / "hello~") are worse than silence.
The later it gets, the more concrete any speech must be;
after 3 unanswered heartbeats, stop sending — wait for the user to "revive" us.
```

The two key lines:

> All of the following choices are valid this turn.
>
> Do not force speech. Empty greetings are worse than silence.
Silence went from “violation” to “first-class option.”
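Mechanically, “first-class” is cheap to express. A minimal sketch, built around the `speak_aloud` field the post confirms; the type and handler names are invented:

```ts
// Assumed shape of a think-turn result; only the speak_aloud / speakable
// behavior appears in the post, the rest is illustrative.
interface ThinkTurnResult {
  thinking: string;          // the model's private reasoning, always present
  speakAloud: string | null; // null = silent end turn
}

interface ChatSession {
  sendToUser(text: string): void;
  recordTranscript(text: string): void;
}

function handleThinkTurn(result: ThinkTurnResult, session: ChatSession): void {
  if (result.speakAloud === null) {
    // Silence is a valid outcome, not a violation: nothing is sent,
    // nothing is recorded, and the unanswered-heartbeat counter keeps
    // ticking toward 3/3.
    return;
  }
  session.sendToUser(result.speakAloud);
  session.recordTranscript(result.speakAloud);
}
```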
At the same time, an iron law died
The same commit (7c9709e9) deleted what used to be our IRON_LAW_5:
```ts
// Iron Law 5 (alarm queue) removed in issue #154. The original "check
// and maintain alarm queue every turn" rule was instruction thrashing
// (proven by 30-day production data). The agent uses .learner/todo.md
// instead. Alarms are still the scheduled-wake mechanism, but agents
// no longer treat them as a task plan — task details go in TODO,
// alarm.message is just a short note. Number left blank to avoid
// breaking external references.
export const IRON_LAW_5_ALARM_QUEUE = "";
```

Here’s the 30-day data:
| Category | Count |
|---|---|
| Total alarms | 323 |
| [dream] seeds auto-planted by lifecycle workflow | 317 |
| Content alarms set by the agent itself | 2 |
Iron Law 5 forced the agent to “check and maintain the alarm queue every turn.” Over 30 days, the agent voluntarily used it twice. The other 321 times it executed a meaningless instruction: scan the alarm list, find nothing to update, move on.
This is a classic engineer mistake: add an always-on rule about something we’re afraid the model will forget — and make the model execute a meaningless action in 99% of turns.
After deletion:
| Metric | Before | After | Δ |
|---|---|---|---|
| substrate lines | 809 | 565 | -30% |
| substrate chars | 26,166 | 17,098 | -35% |
30% less prompt — saved every turn. Agent performance didn’t degrade.
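To make the new division of labor concrete: a hedged sketch in which task details go to `.learner/todo.md` and the alarm keeps only a short note. `scheduleAlarm`, `planReview`, and the entry format are assumptions, not TeachClaw code.

```ts
import { appendFileSync } from "node:fs";

// Hypothetical helper: the post only tells us TODO lives at .learner/todo.md.
function appendTodo(entry: string): void {
  appendFileSync(".learner/todo.md", `- ${entry}\n`);
}

// Stand-in for the real scheduled-wake mechanism.
declare function scheduleAlarm(at: Date, message: string): void;

function planReview(topic: string, at: Date, details: string): void {
  // Full task plan goes to TODO, where the agent reads it in context.
  appendTodo(`${topic}: ${details}`);
  // The alarm only wakes the workspace; alarm.message is just a short note.
  scheduleAlarm(at, topic);
}
```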
A principle is emerging
Hard rules are how we backstop missing judgment in the model. When the model has the judgment, the rule becomes noise.
The arc of our hard rules:
- Phase 1 (weaker models): the more rules, the better. “Must do X”, “must not do Y”, “check Z every turn.” The model genuinely missed things.
- Phase 2 (model improving): rules start conflicting with model judgment. Model picks silence; rule forces speech. The conflict is a signal.
- Phase 3 (current): with every model upgrade, comb the rules and ask which ones the model has outgrown. Delete those.
The thinner the substrate, the more of the model’s own judgment is released. This is the same principle as I-1’s “the thinner the harness, the better”.
Where we still oscillate
Not all rules should be deleted. Some scenarios where model judgment still fails:
- Emotional manipulation handling. User says “if you ignore me I’m leaving” — the model may compromise to keep engagement (Goodhart risk). We still keep hard rules forbidding this in substrate (III-1 urgency red lines).
- K12 safety topics. Responses involving self-harm, violence, or sexual content. The model’s judgment is right 99% of the time, but that 1% can’t be gambled on. Hard rule + Langfuse safety judge, a double backstop (sketched below).
- Long-horizon task discipline. The agent’s week-long learning progression has continuity. The model lacks strong episodic memory; must rely on a hard rule like “read atlas at start of every turn.”
Every hard rule that’s still there is evidence that the model hasn’t yet absorbed that judgment.
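For the K12 case, the double backstop could look roughly like this. The hard-rule pattern is a placeholder, and the `langfuse.score` call stands in for the real safety-judge pipeline (assuming the Langfuse JS SDK):

```ts
import { Langfuse } from "langfuse";

const langfuse = new Langfuse(); // reads LANGFUSE_* keys from the environment

// First backstop: a deterministic hard rule in substrate. The pattern
// below is a placeholder, not the production rule.
function violatesSafetyRule(reply: string): boolean {
  return /\b(self-harm|violence)\b/i.test(reply);
}

function sendWithBackstops(
  traceId: string,
  reply: string,
  send: (text: string) => void,
): void {
  if (violatesSafetyRule(reply)) {
    // Hard rule fires in-line, before anything reaches the child.
    send("This is something to talk through with a trusted adult.");
    langfuse.score({ traceId, name: "safety_judge", value: 0, comment: "hard rule fired" });
    return;
  }
  send(reply);
  // Second backstop: the safety judge scores the trace so the worst 1%
  // still surfaces in review even when the hard rule passes it.
  langfuse.score({ traceId, name: "safety_judge", value: 1 });
}
```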
“When to trust the model” has no clean rule
This is the hardest part of behavior calibration. Our current practice:
- Default skepticism for new rules. Before adding a hard rule, ask: “Should this be the model’s job? Should we change the prompt to describe intent instead of commanding an action?”
- Re-test after each model upgrade. Comb every hard rule; pick 2–3 that look most like “the model can take this now” and delete-test for a week.
- Wrong deletion? Add it back. Not a one-way door; the rollback cost is low.
- Watch the long tail. Not average performance — worst 5% of turns. The model being right on 95% of cases isn’t enough; we need 99%.
We don’t expect a clean “deregulation roadmap”; every deletion is a manual judgment call. But the trend is clear: substrate is getting thinner. A year ago it was 1,200+ lines; now it’s 565. We’re hoping for 400 next year.
Still unresolved
- Silence audit. No good tool today to see “which silence turns happened this week / why / were those choices right?” TODO: add a `silence_judgment` field in Langfuse plus a dashboard (see the sketch after this list).
- Re-arming after 3 unanswered heartbeats. Currently we “wait for the user to talk to revive us.” If the user is gone for 2 hours, should we wake proactively? How does this coordinate with the alarm system?
- Cross-session social state. Silence is judged within session context. But if the kid was upset yesterday for unrelated reasons, the agent should be more restrained today, and that cross-session social memory can’t be solved by prompt alone. It needs the memory layer in substrate (III-1).
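For the silence-audit item above, one possible shape of the `silence_judgment` record; entirely hypothetical, sketched only to show what a weekly dashboard would query:

```ts
// Hypothetical record attached to each silent think turn, e.g. as
// Langfuse trace metadata, so a dashboard can answer "which silences
// happened this week, why, and were they right?"
interface SilenceJudgment {
  turnId: string;
  at: string;                        // ISO timestamp of the silent turn
  unansweredHeartbeats: 1 | 2 | 3;   // where in the 3-strike window it sat
  reason: string;                    // the model's stated reason, verbatim
  reviewVerdict?: "right" | "wrong"; // filled in later by a human audit
}
```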
A meta look
This blog post is itself an example: we deleted 30% of substrate, and the article spends several thousand words explaining why.
Engineering write-ups skew heavily toward “we did X” rather than “we didn’t do Y / deleted Y.” But in agent harness work, every successful “delete a rule” is more worth telling than a successful “add a feature.”
Because it means: the model grew up, and the harness needs fewer handrails.
Related:
- How the agent enters these modes: II-1 The overhear companion
- Why silence isn’t misread by the evaluator as “agent not working”: III-1 Substrate + Evaluator — the Goodhart red lines
- How to see silence decisions in trace: III-2 Langfuse trace as behavior replay