YUSEF@MOSIAH.ORG

12th May 2026 at 9:15am


Realtime Models Solve Presence. Artifact Graphs Solve Memory.

A conversation is a route through state. It is not the state.

Realtime interaction models solve a real problem: presence.

A turn-based model does not experience the conversation the way humans do. It waits until the user finishes speaking. Then it replies. During its reply, its perception often freezes. The interaction becomes a sequence of sealed turns. But real conversation is not sealed turns. It is interruption, silence, overlap, gesture, hesitation, backchannels, timing, facial expression, and self-correction.

Realtime models are trying to restore that missing layer. They can listen while speaking. They can notice when the user is thinking rather than yielding. They can react to visual cues. They can handle overlapping speech. They can speak at the right moment instead of waiting for a clean turn boundary.

That is valuable.

But presence is not memory.

A model that handles interruption beautifully is not therefore the right place to store the state of a long-running project. A model that notices a user’s posture is not therefore the right substrate for research, citation, publication, coding, provenance, or intellectual property. A model that feels alive in conversation is not necessarily the system that should own the canonical record.

The canonical record should live outside the realtime interaction layer.

For Choir, the canonical record is the artifact graph: vtexts, citations, sources, claims, transcripts, voice clips, revisions, code patches, background agent runs, track records, and public discourse dependencies. That graph is memory. It is inspectable. It is revisable. It persists across sessions. It can be searched, cited, forked, audited, published, and rewarded.
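The shape of such a graph can be sketched minimally. This is an illustration, not Choir's actual schema; the node kinds, edge labels, and method names below are hypothetical, chosen only to show what "inspectable, revisable, persistent" means in code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Artifact:
    """One node in the graph: a vtext, source, claim, clip, patch, etc."""
    id: str
    kind: str                       # e.g. "vtext", "source", "code_patch"
    content: str
    revisions: List[str] = field(default_factory=list)  # prior versions, never discarded

class ArtifactGraph:
    """A hypothetical sketch of an inspectable, revisable, persistent store."""
    def __init__(self):
        self.nodes: Dict[str, Artifact] = {}
        # Typed edges: (src, label, dst), e.g. ("claim1", "cites", "source7")
        self.edges: List[Tuple[str, str, str]] = []

    def add(self, node: Artifact) -> None:
        self.nodes[node.id] = node

    def link(self, src: str, label: str, dst: str) -> None:
        self.edges.append((src, label, dst))

    def revise(self, node_id: str, new_content: str) -> None:
        node = self.nodes[node_id]
        node.revisions.append(node.content)  # keep full history for audit
        node.content = new_content

    def cited_by(self, node_id: str) -> List[str]:
        """Who cites this node? Supports search, audit, and rewards."""
        return [s for (s, label, d) in self.edges if d == node_id and label == "cites"]
```

Because every revision is retained and every edge is labeled, the graph can be searched, forked, and audited without any help from the conversation layer that produced it.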

A realtime model can sit at the edge of that graph. It can help the user enter the system. It can understand speech, timing, and interruption. It can handle the social rhythm of interaction. But the artifact graph should remain sovereign.

This distinction prevents a common architectural error. When a system feels conversationally fluid, builders are tempted to treat the conversation as the product. The transcript becomes the memory. The voice session becomes the world. The model becomes the subject. Everything else is tool use.

That is backwards.

A conversation is a route through state. It is not the state.

A meeting is not the project. A phone call is not the codebase. A lecture is not the curriculum. A podcast is not the source archive. A conversation can generate artifacts, clarify decisions, and reveal intent, but the durable object must live elsewhere.

Voice AI should follow the same rule.

The realtime layer is useful when the user says: wait, go deeper, skip this, what is the source, return to the main thread, pause while I think, save that as a vtext, tell the coding agent to test the verifier. Those utterances should become events. They should update the route through the artifact graph. They should not disappear into a transient voice session.
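The utterances-become-events idea can be made concrete with a small sketch. The intent strings and action names here are hypothetical stand-ins, not Choir's real vocabulary; the point is only that a spoken command becomes a durable record that moves a cursor through the graph rather than vanishing with the session.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Event:
    """A durable record of a spoken command; it outlives the voice session."""
    utterance: str
    action: str
    target: Optional[str] = None

# Hypothetical mapping from recognized utterances to graph actions.
INTENTS = {
    "save that as a vtext": "create_vtext",
    "what is the source": "show_provenance",
    "return to the main thread": "jump_main",
}

def to_event(utterance: str) -> Event:
    action = INTENTS.get(utterance.lower(), "annotate")
    return Event(utterance=utterance, action=action)

class Route:
    """The user's path through the artifact graph, updated by events."""
    def __init__(self):
        self.log: List[Event] = []      # append-only: nothing disappears
        self.position: str = "main"

    def apply(self, event: Event) -> None:
        self.log.append(event)
        if event.action == "jump_main":
            self.position = "main"
```

The voice model's job ends once the event is emitted; the event log and the route belong to the graph, not to the session.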

The deeper system needs background agents. A research agent may search and parse sources. A coding agent may run in a VM. A verifier agent may inspect traces. A radio-producer agent may sequence the next audio segment. A citation agent may retrieve prior relevant work. These agents require state, permissions, logs, rollback, provenance, and durable memory. None of that should be trapped inside a realtime voice model's context window.
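What "state, permissions, logs, rollback" means for an agent run can be sketched in a few lines. This is an illustrative shape under assumed names, not a real agent framework: each run carries an explicit capability set, an append-only provenance log, and checkpoints taken before any action so the run can be rolled back.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentRun:
    """Hypothetical durable record of one background agent run."""
    agent: str                      # e.g. "research", "coding", "verifier"
    permissions: frozenset          # capabilities granted for this run
    log: List[str] = field(default_factory=list)       # append-only provenance
    checkpoints: List[dict] = field(default_factory=list)

    def act(self, capability: str, note: str, state: dict) -> None:
        if capability not in self.permissions:
            raise PermissionError(f"{self.agent} lacks {capability!r}")
        self.checkpoints.append(dict(state))  # snapshot before acting, for rollback
        self.log.append(f"{self.agent}: {note}")

    def rollback(self) -> dict:
        """Restore the most recent pre-action snapshot."""
        return self.checkpoints.pop() if self.checkpoints else {}
```

None of these records fit in a voice model's context window, and none of them should: they are rows in the durable system, consulted and extended by whichever agent runs next.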

The distinction can be compressed:

Realtime models solve presence.

Multiagent systems solve search.

Artifact graphs solve memory.

A serious audio product needs all three, but they should not be collapsed into one thing.

The realtime model should handle the living edge of the interaction: speech, silence, timing, interruption, video, gesture, and attention. The background system should handle long-horizon cognition. The artifact graph should preserve what matters.

This makes the voice layer replaceable. Today it might be a speech-to-text plus text-to-speech pipeline. Tomorrow it might be a native full-duplex interaction model. Later it might be local, multimodal, always-on, and extremely good at timing. Fine. The front-end can improve.

But the memory should not move.

If the product is built correctly, a better interaction model can be swapped into the edge without changing the deeper architecture. The user still has vtexts. The platform still has citations. The radio still plays original human voices. The agents still work on artifacts. The public graph still tracks provenance and future relevance.
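The replaceability claim is, in code, just an interface boundary. A hypothetical sketch: the edge is anything that can hear and speak, and the deeper system talks only to that interface, never to a particular model.

```python
from typing import Protocol

class InteractionEdge(Protocol):
    """Any front-end that turns user input into text and renders replies."""
    def hear(self) -> str: ...
    def say(self, text: str) -> None: ...

class PipelineEdge:
    """Today's edge: a speech-to-text plus text-to-speech pipeline (stubbed)."""
    def __init__(self, transcript: str):
        self.transcript = transcript
        self.spoken: list[str] = []

    def hear(self) -> str:
        return self.transcript

    def say(self, text: str) -> None:
        self.spoken.append(text)

def serve(edge: InteractionEdge, memory: dict) -> None:
    # The edge is interchangeable; `memory` (standing in for the artifact
    # graph) stays put no matter which edge is plugged in.
    utterance = edge.hear()
    memory.setdefault("events", []).append(utterance)
    edge.say(f"noted: {utterance}")
```

Swapping in a full-duplex model tomorrow means writing a new class that satisfies `InteractionEdge`; `serve` and the memory behind it do not change.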

The danger of realtime AI is that it makes presence feel like intelligence.

The opportunity is to let presence serve intelligence.

Choir should not become a talking robot. It should become a living artifact system that can speak.