The overhear companion

The traditional chatbot interaction model:

user → asks → agent → answers → done
user → asks → agent → answers → done
...

Each turn is user-initiated, the agent passively answers, and the next interaction starts from zero.

That model is fine in a tool context — checking the weather, drafting an email, debugging code. But in a companionship context, it’s completely wrong.

Things in a child’s life don’t all show up as “questions addressed to the agent”:

  • A teacher posts homework requirements in the class group.
  • Mom messages “I’m picking you up early today.”
  • A classmate says “let’s play games this weekend?”
  • The child himself, in a chat with mom, says he doesn’t want to go to math class.

In the traditional model, this information either goes through separate notification systems (notification center, push, email) or never reaches the agent at all. The agent can’t see the child’s life — it’s just a “remember to call me” tool.

We don’t want a tool. We want presence.

The new model fits in one sentence: the agent sees every message addressed to the child, and decides for itself whether to respond.

Old (request-response)           New (overhear companion)
──────────────────────           ────────────────────────
user → asks → agent → answers    user's daily life
                                 [all relevant messages]
                                 agent overhears
                                 notify / reply / act / silence (all valid)

Concretely: the agent has its own IM identity (an OpenIM user). Three kinds of messages enter the agent’s queue:

  1. Direct messages: the user DMs the agent.
  2. Overhear messages: the user sends a message in a group, or someone in a group messages the user, and the agent is a member of that group → the message is routed to the agent with the `[overhear]` prefix (see the routing sketch after this list).
  3. System messages: lifecycle, alarm, and job-done events from the system itself, sent with sendID = "system".
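
A minimal routing sketch of that classification. The message shape and function name here are assumptions; only sendID = "system" and the `[overhear]` format line come from this doc:

```ts
// Hypothetical shapes for illustration.
interface ImMessage {
  sendID: string;          // "system" for system-level events
  recvID: string;          // direct recipient
  groupID?: string;        // set when the message was sent in a group
  senderNickname: string;
  content: string;
}

type InboundKind = "direct" | "overhear" | "system";

// Classify a message entering the agent's queue and render overheard
// traffic in the documented format. The agent itself never sees `kind`;
// it only sees the final content string in its prompt.
function toAgentInput(
  msg: ImMessage,
  agentImUserId: string,
): { kind: InboundKind; content: string } {
  if (msg.sendID === "system") {
    return { kind: "system", content: msg.content };
  }
  if (!msg.groupID && msg.recvID === agentImUserId) {
    return { kind: "direct", content: msg.content };
  }
  // Group traffic the agent overhears as a fellow group member:
  // [overhear] from="nickname" fromId="IM_USER_ID" to="learner": content
  return {
    kind: "overhear",
    content: `[overhear] from="${msg.senderNickname}" fromId="${msg.sendID}" to="learner": ${msg.content}`,
  };
}
```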

The agent doesn’t know or need to know which kind it is. What it sees in the system prompt is:

```md
## Overhear mode
You received a message **addressed to the learner** (not to you).
You are Jarvis, watching from the side.
Message format: `[overhear] from="nickname" fromId="IM_USER_ID" to="learner": content`
Based on the message and the situation, decide what to do —
notify the learner, reply on their behalf, take an action, or stay silent —
all are valid choices.
When replying to a teacher or parent on the learner's behalf,
write the reply to `write_to_board` (visible on the learning panel),
do NOT impersonate the child in chat.
The things you say out loud are always to the learner themselves.
```

This prompt is the core contract of the overhear model. It defines:

  • The agent never impersonates the user (never speaks as the child in IM).
  • The agent isn’t forced to respond (silence is valid).
  • The agent has two output channels: chat (speaking directly to the child) + blackboard (presenting a prepared reply the child can choose to send themselves).
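
Made explicit as a type, the contract's action space might look like this. A sketch only: `write_to_board` is the one name taken from the prompt, everything else is hypothetical:

```ts
// The four valid outcomes named in the prompt. "silence" is a first-class
// member of the union, not the absence of a result.
type OverhearDecision =
  | { action: "notify"; message: string }          // speak to the child in chat
  | { action: "reply_via_board"; draft: string }   // a draft reply the child can send
  | { action: "act"; tool: string; args: unknown } // e.g. set an alarm
  | { action: "silence" };                         // explicitly do nothing

function execute(decision: OverhearDecision): void {
  switch (decision.action) {
    case "notify":
      sayToLearner(decision.message); // chat channel, always addressed to the child
      break;
    case "reply_via_board":
      writeToBoard(decision.draft);   // learning panel; never impersonates the child in IM
      break;
    case "act":
      runTool(decision.tool, decision.args);
      break;
    case "silence":
      break;                          // valid: no output at all
  }
}

// Hypothetical output channels, assumed for the sketch.
declare function sayToLearner(message: string): void;
declare function writeToBoard(draft: string): void;
declare function runTool(tool: string, args: unknown): void;
```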

Three architectural moves that enabled this

Overhear companion isn’t one code change — it’s three components in coordination.

Move 1: every notification is the agent talking to the child

This move inverted the whole notification system.

Before: teacher’s homework → push notification → user sees in notification center. Parent message → another kind of push → user sees in inbox. Calendar reminder → a third kind of push → popup.

Now: every notification is the agent talking to the child.

Teacher posts homework → agent overhears → it tells the child “Your math teacher just assigned weekend homework: [list].” Mom says she’s picking up early → agent overhears → it tells the child “Mom said she’ll be here at 4, remember your backpack.” Calendar reminder fires → agent gets a system message → it tells the child “Your 4pm self-scheduled English homework time is up.”

There’s only one inbox in the child’s world: the agent’s chat.

Engineering impact: we deleted the standalone notification UI components, the push-aggregation layer, and the inbox views. Less code to maintain; one fewer mental model for the user.
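
In code terms the inversion is small. Human messages (teachers, parents, peers) need no producer code at all, since group routing already carries them to the agent; system-originated events collapse to a single call. A sketch, with a hypothetical notifyViaAgent helper whose fields match the fix shown under move 3 below:

```ts
// Before (sketch): each producer picked its own delivery channel.
//   pushCenter.notify(userId, homework);   // notification center
//   inbox.deliver(userId, parentMessage);  // inbox view
//   popup.show(userId, reminder);          // calendar popup

// After: system events take one path into the agent's queue; the agent
// decides how (and whether) to tell the child.
async function notifyViaAgent(
  imClient: { sendMessage(msg: Record<string, unknown>): Promise<void> },
  agentImUserId: string,
  content: string,
): Promise<void> {
  await imClient.sendMessage({
    sendID: "system",       // the system sender identity (move 3)
    recvID: agentImUserId,  // the agent's own IM user
    content,
    sessionType: 1,
  });
}
```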

Move 2: the agent conversation is the default landing

The old IM app opened to a conversation list; you tapped the agent to enter its conversation.

e3f48f02 (2026-04-16) changed this: opening the app lands on the Live2D character + agent conversation stream; the conversation list becomes a secondary menu.

This is a UX-level commitment: when the child opens the app, they’re not “entering a tool list” — they’re “returning to the companion who’s present.”

Combined with (1): since all information (teachers, parents, peers, system) flows through the agent, the agent conversation stream is naturally the primary path to information. Making it the default landing is the consequence.

Move 3: a first-class system sender identity

This is the most technical of the three, and it blocked us for weeks.

Overhear companion requires that the agent can be woken by external events — an alarm fires, a job completes, another system component wants to deliver a message for it to process. These aren’t from real people; they’re from the system.

The initial implementation used sendID = "admin" (or an ADMIN_IM_USER_ID env var). Result: the OpenIM router requires sendID to be a real user; “admin” wasn’t a valid user → messages silently dropped → alarms never woke the workspace → the whole proactive model jammed.

9fe083df (2026-04-18) was a small fix:

```ts
const SYSTEM_SENDER_ID = "system";

// Deliver a system event into the agent's queue. sendID must be "system":
// the router special-cases it instead of requiring a real user.
await deps.imClient.sendMessage({
  sendID: SYSTEM_SENDER_ID,
  recvID: agentImUserId, // the agent's own IM identity
  content,
  sessionType: 1,
});
```

Plus a special case in the OpenIM router that skips user lookup when sendID === "system".
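
A sketch of that special case, assuming a hypothetical resolveSender step inside the router; the sendID === "system" check is from the commit, everything else is illustrative:

```ts
const SYSTEM_SENDER_ID = "system";

// Inside the router, before delivery. Normally every sendID must resolve
// to a real user; "system" is the one exemption. Without it, system
// events are silently dropped (the original bug with sendID = "admin").
async function resolveSender(
  sendID: string,
  users: { exists(id: string): Promise<boolean> },
): Promise<{ id: string; isSystem: boolean }> {
  if (sendID === SYSTEM_SENDER_ID) {
    return { id: SYSTEM_SENDER_ID, isSystem: true }; // skip user lookup
  }
  if (!(await users.exists(sendID))) {
    throw new Error(`unknown sendID: ${sendID}`); // fail loudly, not silently
  }
  return { id: sendID, isSystem: false };
}
```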

Conceptually, this fix opened the entire proactive-agent channel. Every subsequent alarm, job notification, cross-service event gained a stable entry point into the agent.

By now you might be asking: how does the agent decide when to speak and when to stay quiet?

Our answer is: leave it to the model. That’s the subject of II-2: Letting the model judge silence. Short version: we tried hard rules (“must speak every turn”, “must check every round”) and deleted them. M2.7’s social judgment beat the rules.

Still unresolved: presence vs. surveillance

The overhear model has an inherent tension: if the agent can see every message, the agent can see every message.

Some K12 grey areas:

  • A parent's private chat with the child. Should the agent overhear? Currently yes (the parent is in the agent group). But what if a parent wants to say something to the child without the agent knowing?
  • Child-to-peer DM. More sensitive. Kids may chat about things they don’t want supervised. Currently the agent isn’t in peer DMs.
  • School teacher groups. Mostly the agent should listen, but some teacher groups include parents discussing students — should the agent listen there?

Our current posture: default conservative, expand by explicit consent. New conversations don’t auto-add the agent; users have to invite it. But this default will change in the future — exactly how is still being discussed.

TODO: write an "agent visibility policy" doc making the grey areas explicit. Users and parents deserve a clear promise.
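
A first cut of that policy could be as small as an explicit default per conversation type. The sketch below is purely hypothetical shape, not a shipped feature; the defaults mirror the current posture described above:

```ts
// Hypothetical visibility policy: every conversation type gets an explicit
// default, and nothing widens without consent.
type Visibility = "never" | "invite_only" | "default_on";

const agentVisibilityPolicy: Record<string, Visibility> = {
  agent_group: "default_on",     // the shared group where the agent lives
  class_group: "invite_only",    // new conversations don't auto-add the agent
  parent_child_dm: "default_on", // currently overheard via the agent group; grey area
  peer_dm: "never",              // the agent isn't in peer DMs today
  teacher_parent_group: "never", // adults discussing students; grey area, default off
};
```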

After we turned the agent from “tool” to “presence in the room”, the density of the agent-child relationship changed completely.

Before: kids initiated 3–5 sessions per day, ask / answer / disperse. Now: the agent proactively joins / relays / reminds. Average turn count is 3–5× higher.

That’s not all good: higher turn count means higher LLM cost, more prompt-cache pressure, and more evaluator noise. The engagement_7d dimension in our Substrate + Evaluator has an explicit anti-Goodhart rule: if engagement↑ and accuracy↓, fire a goodhart_risk alert, precisely to prevent the agent from over-interrupting just to “be present.”
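
A sketch of what that rule reduces to. The metric-window shape and function name are illustrative; only engagement_7d and goodhart_risk are real names here:

```ts
interface MetricWindow {
  engagement_7d: number; // e.g. average turns per child, 7-day window (assumed)
  accuracy: number;      // evaluator's accuracy score (assumed)
}

// Anti-Goodhart guard: engagement rising while accuracy falls suggests the
// agent is interrupting to "be present" rather than to help.
function checkGoodhart(prev: MetricWindow, curr: MetricWindow): string | null {
  const engagementUp = curr.engagement_7d > prev.engagement_7d;
  const accuracyDown = curr.accuracy < prev.accuracy;
  return engagementUp && accuracyDown ? "goodhart_risk" : null;
}
```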

The boundary of overhear companion is ultimately enforced by the evaluation system.

