{
  "title": "Articles/the-ideal-data-engine",
  "caption": "The Ideal Data Engine",
  "slug": "the-ideal-data-engine",
  "tags": [
    "article",
    "choir",
    "hermes-published",
    "ideal-data-engine",
    "pack-20",
    "published"
  ],
  "canonical_url": "https://mosiah.org/articles/the-ideal-data-engine/",
  "interactive_url": "https://mosiah.org/#Articles%2Fthe-ideal-data-engine",
  "markdown_url": "https://mosiah.org/articles/the-ideal-data-engine.md",
  "json_url": "https://mosiah.org/json/the-ideal-data-engine.json",
  "fields": {
    "sort-date": "2026-05-12T16:50:00Z",
    "caption": "The Ideal Data Engine",
    "created": "20260512162231915",
    "modified": "20260512162231915",
    "tags": "article hermes-published published ideal-data-engine choir pack-20",
    "title": "Articles/the-ideal-data-engine",
    "type": "text/vnd.tiddlywiki"
  },
  "text": "//Related:// [[sources|Article Sources/the-ideal-data-engine]] · [[notes|Article Notes/the-ideal-data-engine]] · [[metadata|Article Metadata/the-ideal-data-engine]] · [[Published Pieces]]\n\n! The Ideal Data Engine\n\n//Carnot asked what a perfect heat engine would be. Choir asks the analogous question for data: what would a platform be if it converted human expression into public intelligence with the least waste and corruption?//\n\nAn engine converts one form of energy into another.\n\nA steam engine converts heat into mechanical work. An internal-combustion engine converts chemical energy into motion. A turbine converts pressure and flow into rotation. An engine is not just a machine with parts. It is a conversion system: input, constraint, transformation, output, loss.\n\nCarnot’s great move was to ask what an ideal heat engine would be. Not a better piston, not a cleverer boiler, not a more fashionable device, but the theoretical limit: if heat is going to be converted into work, what is the most efficient possible engine? What losses are necessary? What losses are accidental? What does the real machine reveal when compared against the ideal?\n\nThat is the sense in which I mean an ideal data engine.\n\nNot a database. Not an analytics stack. Not a growth loop. An engine that converts human expression into public intelligence.\n\nThe question is: if people are going to produce thoughts, claims, sources, arguments, voice, taste, corrections, attention, and judgment, what kind of machine would convert that material into the most valuable possible public memory? What would reduce waste? What would preserve provenance? What would reward future relevance? What would turn expression into contribution rather than behavioral exhaust?\n\nIn 2015, this was the idea that seized me. It was not yet a product idea, a market wedge, or even a company. 
It was a technical and social thesis: the world needed an ideal data engine.\n\nThe Web 2.0 platforms had already built data engines. Facebook, Twitter, YouTube, Instagram, Reddit, TikTok, and the rest created user-generated-content flywheels: users produce content, other users react, the platform observes behavior, algorithms route attention, advertisers buy access, and the loop gets stronger. That was the data engine of the 2010s.\n\nBut it was a bad engine.\n\nBefore Cambridge Analytica made respectable liberals hate Facebook, the deeper problem was already visible. Algorithmic media skewed incentives. It crowded out the intentional. It trained users to become more reactive, performative, impulsive, trackable, and available to routing. The platforms learned from outrage, envy, exhibitionism, tribalism, sexual display, status anxiety, humiliation, grievance, dunking, gossip, parasociality, addictive novelty, and compulsive self-presentation.\n\nThe bargain was never merely: users post content, platforms sell ads.\n\nThe real bargain was: users train the machine by exposing themselves under degraded incentives. The machine becomes more powerful by learning from the patterns of that degradation.\n\nThat is a data engine. It is just a bad one.\n\nA platform is not neutral toward the human being. It asks something of the user. It trains the user to become the kind of person whose behavior the platform can metabolize. Web 2.0 asked humans to become more clickable.\n\nThe obvious question was: what would the opposite look like?\n\nWhat kind of platform would extract the best nature from users rather than the worst? What kind of social machine would make people more thoughtful, precise, generous, critical, historically aware, correctable, provenance-preserving, and capable of producing material that would still be useful later?\n\nThat was the original thesis behind Choir.\n\nNot a chatbot. Not a social network. Not a note-taking app. 
Not a publishing platform in the ordinary sense.\n\nAn ideal data engine.\n\nA system that creates incentives for people to express the better angels of their nature, then turns that expression into higher-quality public intelligence.\n\nThe product form took years to emerge because the technology did not yet exist. The web had distribution. Crypto exposed programmable ownership and protocol-native incentives. Machine learning was advancing. But the user-facing systems could not yet read, write, code, research, synthesize, cite, and maintain artifacts at the required level. The social theory was ahead of the machine.\n\nThen language models arrived.\n\nLLMs changed the practical meaning of the ideal data engine. A platform no longer had to depend only on humans manually organizing, tagging, summarizing, citing, and curating. Agents could retrieve prior work, extract claims, compare frames, surface contradictions, generate critiques, keep documents alive, convert voice into text, text into audio, conversation into artifacts, and artifacts into public memory.\n\nBut the mainstream product form was wrong.\n\nThe chatbot made intelligence private, transient, and conversational. It turned a civilization-scale corpus-reflector into a personal mirror. Useful, yes. Powerful, yes. But structurally limited. The chat log became the model’s worldline. The user’s prompt became the local frame. The output was mostly private, disposable, and difficult to cite.\n\nA better data engine needs a better object.\n\nThat object is the living artifact.\n\nIn Choir, the primary object is not the chat thread. It is the vtext: a versioned, living document that can contain claims, citations, sources, revisions, objections, audio, app-like behavior, agent work, and provenance. A vtext is not just content. It is an intellectual object with memory.\n\nFrom this came the first practical form: the automatic computer. 
The automatic computer is the private workspace where agents can read, write, research, code, revise, build, and maintain artifacts. It is not “AI controls your laptop.” The automatic computer is a persistent workspace rendered through a portable interface: durable artifacts, agent processes, microVMs, source graphs, appagents, memory, and verification loops behind the glass.\n\nThe second form is the automatic newspaper: the public projection of the same substrate. Users publish vtexts into a shared platform. Those vtexts become searchable, citeable, forkable, disputable, and rewardable. Agents retrieve them as prior art. Other users respond. The system tracks which artifacts matter later.\n\nThe third form is the automatic radio: the mass-consumption surface. People listen, and learn through audio, while walking, driving, cooking, cleaning, commuting, and exercising. Automatic radio turns the artifact graph into an interruptible audio stream. The user can interrupt at any time: go deeper, skip, clarify, source, disagree, return, save this as a vtext.\n\nThis solves the interface mismatch in current AI. Text output is often too long for how people read. Audio output is often too short for how people listen. Chat voice products optimize for low-latency banter, but people listen to podcasts for hours when the content is good. Audio wants to unfold. Text wants to compress.\n\nAutomatic radio also changes latency. In chat, deep inference feels like waiting. In radio, the system can keep speaking from already-computed material while background agents research, synthesize, code, or verify. Audio runway buys cognition time.\n\nThe radio is not a talking robot, fake podcast, or voice clone. AI voice organizes. Human voice testifies. A recorded human voice is evidence. A cloned voice is costume.\n\nThis is why the name Choir becomes literal. A chorus is not one voice. It is many voices held in relation. 
Choir Radio can weave narration, source clips, user responses, public arguments, and prior human speech. As more people publish vtexts and spoken perspectives, the ratio of human to synthetic voice should rise. The system becomes more human, not less, because it has more real human consciousness to retrieve.\n\nThe economic layer is protocol-native IP. Traditional IP protects ownership by restricting copying. Choir IP is based on provenance, citation, reuse, and consensus about contribution. A user publishes a vtext. Agents cite it later because it is relevant prior work. Others extend it, refute it, use it in radio traversals, or depend on it in later artifacts. The protocol remembers. The author earns reputation and tokenized upside based on downstream relevance.\n\nHumans publish. Agents cite. The protocol rewards.\n\nThat is the citation economy.\n\nCitation here does not mean endorsement. It can mean extension, contradiction, refinement, evidence, context, or competing interpretation. Refutation can prove importance. A useful critic should be rewarded. Correction is not humiliation. It is the engine of cognitive compounding.\n\nThis is why Choir is social-scientific deep tech. The technical substrate matters: agents, vtexts, automatic radio, microVMs, search, provenance, speech pipelines, appagents. But the deeper object is social: how do humans express themselves when the system rewards future relevance rather than immediate engagement? How do public ideas become assets? How do corrections compound? How do voices remain attached to their sources? How do we create a fairer contest of ideas than the status graph of existing media?\n\nWeb 2.0 trained users into engagement.\n\nChoir should train users into contribution.\n\nThat is the ideal data engine.\n\nA good data engine does not merely collect more data. It shapes the conditions under which better data is produced. It asks better things of people. 
It gives them better tools, ownership, provenance, and rewards for future relevance. It lets agents retrieve, cite, and transform human thought without severing it from its source.\n\nThe old platforms asked humans to become more clickable.\n\nChoir asks humans to become more citeable.\n"
}