# The Ideal Data Engine

Canonical: https://mosiah.org/articles/the-ideal-data-engine/
Interactive: https://mosiah.org/#Articles%2Fthe-ideal-data-engine

//Related:// [[sources|Article Sources/the-ideal-data-engine]] · [[notes|Article Notes/the-ideal-data-engine]] · [[metadata|Article Metadata/the-ideal-data-engine]] · [[Published Pieces]]


//Carnot asked what a perfect heat engine would be. Choir asks the analogous question for data: what would a platform be if it converted human expression into public intelligence with the least waste and corruption?//

An engine converts one form of energy into another.

A steam engine converts heat into mechanical work. An internal-combustion engine converts chemical energy into motion. A turbine converts pressure and flow into rotation. An engine is not just a machine with parts. It is a conversion system: input, constraint, transformation, output, loss.

Carnot’s great move was to ask what an ideal heat engine would be. Not a better piston, not a cleverer boiler, not a more fashionable device, but the theoretical limit: if heat is going to be converted into work, what is the most efficient possible engine? What losses are necessary? What losses are accidental? What does the real machine reveal when compared against the ideal?

That is the sense in which I mean an ideal data engine.

Not a database. Not an analytics stack. Not a growth loop. An engine that converts human expression into public intelligence.

The question is: if people are going to produce thoughts, claims, sources, arguments, voice, taste, corrections, attention, and judgment, what kind of machine would convert that material into the most valuable possible public memory? What would reduce waste? What would preserve provenance? What would reward future relevance? What would turn expression into contribution rather than behavioral exhaust?

In 2015, this was the idea that seized me. It was not yet a product idea, a market wedge, or even a company. It was a technical and social thesis: the world needed an ideal data engine.

The Web 2.0 platforms had already built data engines. Facebook, Twitter, YouTube, Instagram, Reddit, TikTok, and the rest created user-generated-content flywheels: users produce content, other users react, the platform observes behavior, algorithms route attention, advertisers buy access, and the loop gets stronger. That was the data engine of the 2010s.

But it was a bad engine.

Before Cambridge Analytica made respectable liberals hate Facebook, the deeper problem was already visible. Algorithmic media skewed incentives. It crowded out the intentional. It trained users to become more reactive, performative, impulsive, trackable, and available to routing. The platforms learned from outrage, envy, exhibitionism, tribalism, sexual display, status anxiety, humiliation, grievance, dunking, gossip, parasociality, addictive novelty, and compulsive self-presentation.

The bargain was never merely: users post content, platforms sell ads.

The real bargain was: users train the machine by exposing themselves under degraded incentives. The machine becomes more powerful by learning from the patterns of that degradation.

That is a data engine. It is just a bad one.

A platform is not neutral toward the human being. It asks something of the user. It trains the user to become the kind of person whose behavior the platform can metabolize. Web 2.0 asked humans to become more clickable.

The obvious question was: what would the opposite look like?

What kind of platform would extract the best nature from users rather than the worst? What kind of social machine would make people more thoughtful, precise, generous, critical, historically aware, correctable, provenance-preserving, and capable of producing material that would still be useful later?

That was the original thesis behind Choir.

Not a chatbot. Not a social network. Not a note-taking app. Not a publishing platform in the ordinary sense.

An ideal data engine.

A system that creates incentives for people to express the better angels of their nature, then turns that expression into higher-quality public intelligence.

The product form took years to emerge because the technology did not yet exist. The web had distribution. Crypto introduced programmable ownership and protocol-native incentives. Machine learning was advancing. But the user-facing systems could not yet read, write, code, research, synthesize, cite, and maintain artifacts at the required level. The social theory was ahead of the machine.

Then language models arrived.

LLMs changed the practical meaning of the ideal data engine. A platform no longer had to depend only on humans manually organizing, tagging, summarizing, citing, and curating. Agents could retrieve prior work, extract claims, compare frames, surface contradictions, generate critiques, keep documents alive, convert voice into text, text into audio, conversation into artifacts, and artifacts into public memory.

But the mainstream product form was wrong.

The chatbot made intelligence private, transient, and conversational. It turned a civilization-scale corpus-reflector into a personal mirror. Useful, yes. Powerful, yes. But structurally limited. The chat log became the model’s worldline. The user’s prompt became the local frame. The output was mostly private, disposable, and difficult to cite.

A better data engine needs a better object.

That object is the living artifact.

In Choir, the primary object is not the chat thread. It is the vtext: a versioned, living document that can contain claims, citations, sources, revisions, objections, audio, app-like behavior, agent work, and provenance. A vtext is not just content. It is an intellectual object with memory.
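To make the shape of the object concrete, here is a minimal sketch of what a vtext could look like as a data structure. This is an illustration, not Choir's actual schema: the class and field names (`Vtext`, `Revision`, `sources`, `citations`) are hypothetical, chosen only to show that a vtext carries its own history and provenance rather than a single mutable body.

```python
from dataclasses import dataclass, field

@dataclass
class Revision:
    """One version of a vtext's body, with attribution and provenance."""
    author: str
    body: str
    sources: list[str]  # ids or URLs of the material this revision cites

@dataclass
class Vtext:
    """Hypothetical sketch of a versioned, living document.

    The full history is kept: revisions accumulate rather than overwrite,
    so objections, corrections, and sources stay attached to the object.
    """
    title: str
    revisions: list[Revision] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)  # other vtexts cited

    def revise(self, author: str, body: str, sources: tuple[str, ...] = ()):
        self.revisions.append(Revision(author, body, list(sources)))

    @property
    def current(self) -> str:
        """The latest body; the memory behind it remains queryable."""
        return self.revisions[-1].body if self.revisions else ""
```

The point of the sketch is the append-only history: a vtext is an intellectual object with memory because nothing is discarded when it is revised.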

From this came the first practical form: the automatic computer. The automatic computer is the private workspace where agents can read, write, research, code, revise, build, and maintain artifacts. It is not “AI controls your laptop.” The automatic computer is a persistent workspace rendered through a portable interface: durable artifacts, agent processes, microVMs, source graphs, appagents, memory, and verification loops behind the glass.

The second form is the automatic newspaper: the public projection of the same substrate. Users publish vtexts into a shared platform. Those vtexts become searchable, citeable, forkable, disputable, and rewardable. Agents retrieve them as prior art. Other users respond. The system tracks which artifacts matter later.

The third form is the automatic radio: the mass-consumption surface. People listen while walking, driving, cooking, cleaning, commuting, exercising, and learning through audio. Automatic radio turns the artifact graph into an interruptible audio stream. The user can interrupt at any time: go deeper, skip, clarify, source, disagree, return, save this as a vtext.

This solves the interface mismatch in current AI. Text output is often too long for how people read. Audio output is often too short for how people listen. Chat voice products optimize for low-latency banter, but people listen to podcasts for hours when the content is good. Audio wants to unfold. Text wants to compress.

Automatic radio also changes latency. In chat, deep inference feels like waiting. In radio, the system can keep speaking from already-computed material while background agents research, synthesize, code, or verify. Audio runway buys cognition time.
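The runway idea can be sketched in a few lines: play already-computed segments on the foreground thread while a slow background task produces the next material. Everything here is illustrative, assuming nothing about Choir's real pipeline; `radio_stream`, `precomputed`, and `research_task` are hypothetical names standing in for audio segments and a deep-inference call.

```python
import queue
import threading

def radio_stream(precomputed, research_task, out):
    """Keep speaking from ready material while a background agent works.

    precomputed:   list of segment placeholders already available to play.
    research_task: callable doing slow work (research, synthesis, verification).
    out:           list collecting what the listener hears, in order.
    """
    results = queue.Queue()
    worker = threading.Thread(target=lambda: results.put(research_task()))
    worker.start()                 # deep inference runs in the background
    for segment in precomputed:    # the runway: audio keeps flowing meanwhile
        out.append(segment)
    worker.join()                  # by the end of the runway, new material is ready
    out.append(results.get())
```

The listener never experiences the inference as waiting, because the wait is hidden behind material that was already worth hearing.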

The radio is not a talking robot, fake podcast, or voice clone. AI voice organizes. Human voice testifies. A recorded human voice is evidence. A cloned voice is costume.

This is why the name Choir becomes literal. A chorus is not one voice. It is many voices held in relation. Choir Radio can weave narration, source clips, user responses, public arguments, and prior human speech. As more people publish vtexts and spoken perspectives, the ratio of human to synthetic voice should rise. The system becomes more human, not less, because it has more real human consciousness to retrieve.

The economic layer is protocol-native IP. Traditional IP protects ownership by restricting copying. Choir IP is based on provenance, citation, reuse, and consensus about contribution. A user publishes a vtext. Agents cite it later because it is relevant prior work. Others extend it, refute it, use it in radio traversals, or depend on it in later artifacts. The protocol remembers. The author earns reputation and tokenized upside based on downstream relevance.

Humans publish. Agents cite. The protocol rewards.

That is the citation economy.

Citation here does not mean endorsement. It can mean extension, contradiction, refinement, evidence, context, or competing interpretation. Refutation can prove importance. A useful critic should be rewarded. Correction is not humiliation. It is the engine of cognitive compounding.
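One way to picture "rewarded based on downstream relevance" is as a traversal of the provenance graph: an artifact's relevance is the set of later artifacts that transitively cite it, with refutations counting the same as extensions. This is a toy sketch of that idea, not the protocol's actual reward rule; the graph encoding and the function name are assumptions.

```python
def downstream_relevance(citations: dict[str, list[str]], root: str) -> int:
    """Count artifacts that transitively cite `root`.

    citations maps each artifact id to the list of artifact ids it cites.
    Any citation counts: extension, contradiction, refinement, or evidence
    are all signals of relevance in the citation-economy framing.
    """
    reached: set[str] = set()
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for artifact, cited in citations.items():
            if node in cited and artifact not in reached:
                reached.add(artifact)
                frontier.append(artifact)
    return len(reached)
```

Under this rule a sharply refuted vtext can outscore an ignored one, which is the intended behavior: the useful critic and the corrected author both compound.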

This is why Choir is social-scientific deep tech. The technical substrate matters: agents, vtexts, automatic radio, microVMs, search, provenance, speech pipelines, appagents. But the deeper object is social: how do humans express themselves when the system rewards future relevance rather than immediate engagement? How do public ideas become assets? How do corrections compound? How do voices remain attached to their sources? How do we create a fairer contest of ideas than the status graph of existing media?

Web 2.0 trained users into engagement.

Choir should train users into contribution.

That is the ideal data engine.

A good data engine does not merely collect more data. It shapes the conditions under which better data is produced. It asks better things of people. It gives them better tools, ownership, provenance, and rewards for future relevance. It lets agents retrieve, cite, and transform human thought without severing it from its source.

The old platforms asked humans to become more clickable.

Choir asks humans to become more citeable.
