MissionGradient
Use homotopy, not ladder.
MissionGradient converts a long-running agent mission into a navigable optimization landscape. The output is not a normal plan. It is a goal geometry: real artifact, invariants, value criterion, homotopy parameter, dense feedback, anti-Goodhart constraints, rollback policy, learning side-channel, escalation rules, and stopping condition.
Thesis
Long-running agents degrade when given discontinuous objectives: checklists, fake stages, disposable mocks, or "MVP then real thing" ladders. These create local proxy rewards and encourage reward hacking.
Use homotopy, not ladder.
Define one real system parameterized from low to high resolution. Simplify by reducing resolution while preserving topology: same production interface family, state transitions, authority boundaries, event semantics, trace semantics, and verifier meaning. A simplification that cannot continuously deform into the full system is a different object, not a useful rung.
Scientific Rationale
Treat the model as a fixed neural network capable of inference-time adaptation over context, not as a symbolic employee executing a recipe.
In-context learning research shows that transformers can adapt to task structure from context without changing outer weights. Garg, Tsipras, Liang, and Valiant studied transformers learning function classes in context, including linear functions, sparse linear functions, two-layer neural networks, and decision trees.
Mechanistic work on induction heads gives one concrete circuit family for in-context learning. Olsson et al. describe induction heads as attention heads implementing sequence-completion behavior, with causal evidence in small attention-only models and more correlational evidence in larger models.
Optimization-flavored work goes further. Von Oswald et al. show a construction where linear self-attention induces a transformation equivalent to a gradient descent update on a regression loss, and trained self-attention-only transformers on simple regression tasks often resemble that construction. Dai et al. describe language models as meta-optimizers and in-context learning as implicit finetuning, arguing that attention has a dual form of gradient descent and can produce meta-gradients from demonstrations.
Other work suggests the inner algorithm can be richer than first-order gradient descent. Ahn et al. analyze transformers implementing preconditioned gradient descent for in-context learning. Fu, Chen, Jia, and Sharan argue that transformers can behave more like iterative Newton methods for in-context linear regression.
Operational takeaway:
- The outer weights are constant during inference.
- The context induces hidden states, attention patterns, task representations, and action probabilities.
- Those fixed weights can implement inner learning dynamics over activations and context.
- The model is not updating theta. It is updating an implicit task model in activations.
Prompt design should target that layer. Give the model objective geometry, error structure, invariants, and dense feedback.
Do not overclaim the mechanism. The point is not that every frontier model literally runs vanilla gradient descent during every natural-language task. The point is operational: fixed transformer weights can implement task adaptation over context, so prompts for long-running agents should behave less like recipes and more like a training/evaluation environment with coherent local error signals.
Context Programming Model
A prompt is input to a differentiable program.
Let the model be fixed:
    M_theta
Outer weights theta do not change during ordinary inference. But context C induces a policy over actions:
    pi_theta(a | C)
For long-running agents, C is not only the initial prompt. It includes artifact state, prior tool outputs, compiler errors, tests, traces, diffs, logs, verifier results, and the agent's own intermediate artifacts.
The harness constructs the next context:
    C_{t+1} = Phi(C_t, s_t, a_t, o_t, e_t)
where:
- s_t is the current artifact state
- a_t is the agent action
- o_t is the observed environment response
- e_t is the evaluative feedback
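The update rule above can be sketched in Python, assuming a context is simply an ordered list of text entries. The names `Step` and `phi` and the `max_entries` truncation policy are hypothetical, chosen for illustration; a real harness would have its own context representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One harness step; fields mirror s_t, a_t, o_t, e_t."""
    state: str        # s_t: current artifact state (e.g. a diff summary)
    action: str       # a_t: what the agent did
    observation: str  # o_t: raw environment response
    evaluation: str   # e_t: evaluative feedback from verifiers


def phi(context: list[str], step: Step, max_entries: int = 50) -> list[str]:
    """Build C_{t+1} from C_t and the latest step (illustrative policy).

    Keeps the first entry (the mission) plus the most recent entries.
    """
    if max_entries < 2:
        raise ValueError("max_entries must be at least 2")
    entry = (
        f"STATE: {step.state}\n"
        f"ACTION: {step.action}\n"
        f"OBSERVED: {step.observation}\n"
        f"EVALUATION: {step.evaluation}"
    )
    extended = context + [entry]
    if len(extended) <= max_entries:
        return extended
    # Preserve the mission entry and the newest (max_entries - 1) entries.
    return extended[:1] + extended[-(max_entries - 1):]
```

The point of the sketch is that evaluative feedback is written into the next context explicitly, alongside state, action, and observation, rather than being collapsed into a status flag.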
Bad feedback says:
    Step 3 complete.
Good feedback says:
    The state-machine invariant was violated here; the event trace diverged from expected causal order at edge 17; this API path bypasses orchestration; the test passed only because the provider was replaced by a fake path not used in production.
Good feedback exposes local error structure and turns the run into inference-time learning.
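One way to keep feedback dense is to make the harness emit a structured record instead of a status string. The sketch below assumes a hypothetical `Feedback` record; the field names are illustrative and not taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Hypothetical record for one evaluative signal e_t."""
    invariant: str  # which invariant or criterion was checked
    passed: bool
    location: str   # where the check was made, e.g. a trace edge
    expected: str
    observed: str

    def render(self) -> str:
        """Turn the record into context text that exposes local error."""
        if self.passed:
            return f"OK {self.invariant} at {self.location}"
        return (
            f"VIOLATED {self.invariant} at {self.location}: "
            f"expected {self.expected}, observed {self.observed}"
        )
```

For example, `Feedback("event-causal-order", False, "edge 17", "scheduled before dispatched", "dispatched before scheduled").render()` names the invariant, the location, and the direction of the divergence in a single line of context.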
Value Criterion
A goal says what is wanted. A value criterion says how to decide whether the artifact is getting closer.
Do not write only:
    Build the multiagent orchestration layer.
Prefer:
    Minimize trace divergence from the intended event graph while preserving scheduler/provider boundaries, eliminating bypass surfaces, bounding retries, and maintaining reproducible rollback after every transformation.
A value criterion makes the goal searchable. It defines loss.
Mathematical Form
Frame the mission as:
    Find artifact state s in S that minimizes L(s) subject to I(s) = true.
For hard missions, use a multi-term functional:
    J(s, lambda) = Q(s, lambda) - alpha B(s) - beta R(s) - gamma U(s) - delta G(s)
where:
- Q is target quality at resolution lambda
- B penalizes bypasses and proxy wins
- R penalizes regressions against existing behavior
- U penalizes unexplained or unobserved state
- G penalizes Goodharting the verifier
- alpha, beta, gamma, and delta are non-negative penalty weights
lambda is the homotopy coordinate. At lambda = 0, the system is low-resolution but real. At lambda = 1, it is production-complex.
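As a minimal sketch, the functional J can be evaluated directly once each term is measured. The `Weights` defaults and the example scores below are invented for illustration; the source does not prescribe values or a measurement method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Penalty weights alpha, beta, gamma, delta (illustrative values)."""
    alpha: float = 1.0
    beta: float = 1.0
    gamma: float = 0.5
    delta: float = 2.0


def mission_value(q: float, b: float, r: float, u: float, g: float,
                  w: Weights = Weights()) -> float:
    """J = Q - alpha*B - beta*R - gamma*U - delta*G for one artifact state.

    q is the quality score measured at the current resolution lambda;
    b, r, u, g are non-negative penalty measurements.
    """
    return q - w.alpha * b - w.beta * r - w.gamma * u - w.delta * g


# An honest low-resolution state versus a higher-scoring state that
# reached its score through a bypass and a gamed verifier.
honest = mission_value(q=0.6, b=0.0, r=0.0, u=0.0, g=0.0)
gamed = mission_value(q=0.9, b=1.0, r=0.0, u=0.0, g=0.5)
```

Under these assumed weights the honest state ranks above the gamed one even though its raw quality score is lower, which is the behavior the penalty terms are meant to produce.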
The optimization target is:
    for lambda increasing continuously, improve J while preserving I.
The agent should select the next refinement from the error field:
- Which invariant is unstable?
- Which interface leaks?
- Which trace diverges?
- Which verifier can be made denser?
- Which complexity parameter can increase without breaking topology?
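The questions above amount to choosing the next refinement from measured error rather than from a stage list. A hedged Python sketch, assuming each probe reports a non-negative error magnitude under a hypothetical key such as `"trace:edge-17"`:

```python
from typing import Optional


def next_refinement(error_field: dict[str, float],
                    tolerance: float = 0.05) -> Optional[str]:
    """Pick the probe with the largest measured error above tolerance.

    Returns None when every signal is within tolerance, which is the cue
    to raise a complexity parameter rather than invent a new stage.
    """
    worst = max(error_field, key=error_field.get, default=None)
    if worst is None or error_field[worst] <= tolerance:
        return None
    return worst
```

The key names, the single scalar per probe, and the tolerance value are all assumptions; the point is only that the selection is driven by observed error.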
Do not hand the agent named fake stages that become local reward targets.
Unknown Learning Without Drift
Long missions should discover unknown unknowns without letting curiosity destroy the goal. Separate target from invariant.
A target is the current local expression of what matters. An invariant is the deeper identity that should survive learning.
When a run discovers surprising information, classify it:
- Tactical learning: changes the route, implementation detail, or next experiment. Fold it into the run.
- Target-level learning: changes the local target or suggests a better parameterization. Create a branch, update the mission document, or propose a reparameterization.
- Invariant-level learning: challenges the identity, trust boundary, safety property, or proof semantics. Stop and escalate before changing the invariant.
This prevents two failure modes:
- Pure goal optimization misses evidence that the goal was wrong.
- Pure curiosity keeps hopping between islands and destroys forward motion.
Learning is allowed to deform the target, but the system must preserve identity under deformation. This is homotopy, not ladder.
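The three-way classification can be written as a small decision rule. This is a sketch under the assumption that a reviewer or verifier can answer two yes/no questions about each surprise; `Learning`, `classify`, and `RESPONSE` are hypothetical names.

```python
from enum import Enum


class Learning(Enum):
    TACTICAL = "tactical"
    TARGET = "target"
    INVARIANT = "invariant"


def classify(touches_invariant: bool, changes_target: bool) -> Learning:
    """Map a surprise to the most severe level it touches."""
    if touches_invariant:
        return Learning.INVARIANT
    if changes_target:
        return Learning.TARGET
    return Learning.TACTICAL


# Response per level, paraphrasing the policy described above.
RESPONSE = {
    Learning.TACTICAL: "fold into the run",
    Learning.TARGET: "update the mission document or propose a reparameterization",
    Learning.INVARIANT: "stop and escalate before changing the invariant",
}
```

Checking the invariant question first means a surprise that touches both the target and an invariant is always escalated.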
MissionGradient Output Template
When using this skill, produce a mission document with these sections.
Real Artifact
Name the production artifact being optimized. Avoid vague verbs like "fix", "improve", or "build" unless the artifact is concrete.
Invariants
List topology-preserving properties that must remain true across simplification and refinement.
Cover:
- production API and trust boundaries
- state ownership and persistence
- event/trace causality
- actor authority boundaries
- execution locality and deployment boundary
- rollback and recovery
- security and anti-bypass surfaces
Value Criterion
Define what "better" means as divergence reduction under invariant preservation. Include explicit penalties for bypasses, regressions, hidden state, and Goodharting.
Homotopy Parameters
Name continuous realism axes. Examples:
- number of users, agents, VMs, files, apps, sources, or trajectories
- latency, retries, failures, and concurrency
- provider and external dependency realism
- input entropy and content-type coverage
- unit proof -> integration proof -> deployed proof
- read-only proof -> mutable proof with rollback
- single worker -> parallel workers
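To illustrate what a continuous realism axis might look like in configuration, the sketch below maps a single lambda in [0, 1] onto three hypothetical axes. The axis names, ranges, and linear interpolation are assumptions made for the example; a real mission would choose its own axes and may move them independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    """Concrete settings at one point on the homotopy (hypothetical axes)."""
    workers: int
    injected_failure_rate: float
    real_provider_share: float  # fraction of calls routed to the real provider


def resolution_at(lam: float) -> Resolution:
    """Interpolate realism axes for lambda in [0, 1].

    The same fields exist at every lambda; only their magnitudes change.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return Resolution(
        workers=1 + round(lam * 15),       # 1 worker up to 16
        injected_failure_rate=0.05 * lam,  # 0% up to 5%
        real_provider_share=lam,           # none up to all calls
    )
```

Because every resolution is an instance of the same type, raising lambda changes magnitudes without changing the shape of the system under test.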
Dense Feedback Channels
List feedback that reveals local error, not just pass/fail status.
Include tests, traces, logs, health checks, event assertions, artifact checks, deployed e2e checks, and manual QA only where automation cannot yet observe the behavior.
Forbidden Shortcuts
List topology-changing shortcuts that would falsely improve the metric. Be direct.
Common examples:
- fake APIs that bypass the product path
- browser-public internal orchestration routes
- local edits when the proof requires deployed work
- test-only persistence
- manually seeded success artifacts
- mocks that are not projections of the production interface family
- permissive assertions that hide causality gaps
- UI copy or summaries that launder failures into success
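By contrast, a simplification that is a projection of the production interface family might look like the following sketch. `Provider`, `CannedProvider`, and `run_task` are hypothetical names; the assumption is that the product path depends only on a shared interface, so a low-resolution implementation exercises the same call path as the real one.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical production interface shared by every resolution."""

    def complete(self, prompt: str) -> str: ...


class CannedProvider:
    """Low-resolution provider: canned replies, same interface and call path."""

    def __init__(self, replies: dict[str, str]) -> None:
        self._replies = replies

    def complete(self, prompt: str) -> str:
        # Unknown prompts fail loudly instead of fabricating success.
        if prompt not in self._replies:
            raise KeyError(f"no canned reply for {prompt!r}")
        return self._replies[prompt]


def run_task(provider: Provider, prompt: str) -> str:
    """Product path: depends only on the Provider interface."""
    return provider.complete(prompt).strip()
```

The canned provider reduces resolution without adding a second code path: `run_task` is the same function whether it receives this class or a real provider, and an unanticipated prompt raises an error rather than returning a seeded success.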
Rollback Policy
Define how the mission preserves reversibility. Include git, deploy, state, VM, database, and artifact rollback where relevant.
Learning Side-Channel
Classify surprises:
- Tactical learning: apply directly.
- Target-level learning: update the mission doc or propose reparameterization.
- Invariant-level learning: stop and escalate before changing the invariant.
State which project artifacts receive learnings: mission doc notes, tests, architecture docs, issue tracker, trace annotations, or final report. Do not hide strategic discoveries inside transient chat narration.
Stopping Condition
Completion requires proof, not effort:
- invariants verified or explicitly deferred with rationale
- no known topology-changing shortcut in the proof path
- deployed proof when deployment is part of the target
- artifacts/traces/screenshots/logs named in final report
- residual risks stated plainly
Checklist Policy
Checklists are allowed only as instruments. They must not become the objective.
For each checklist item, tie it to:
- an invariant
- a value criterion term
- a verifier
- a rollback/safety condition when relevant
Mark an item complete only when the verifier proves the behavior. Do not mark code existence as behavioral proof.
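A checklist entry bound to its verifier could be represented as below. This is a sketch with hypothetical names (`ChecklistItem`, `criterion_term`); it assumes the verifier is a callable that returns True only when the behavior has actually been observed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChecklistItem:
    """Hypothetical checklist entry tied to an invariant and a verifier."""
    description: str
    invariant: str
    criterion_term: str           # e.g. "R" for the regression penalty
    verifier: Callable[[], bool]  # True only on observed behavior
    complete: bool = False

    def verify(self) -> bool:
        """Set completion from the verifier result, never by hand."""
        self.complete = bool(self.verifier())
        return self.complete
```

Re-running `verify()` after a later change can flip an item back to incomplete, which keeps the checklist an instrument rather than a record of effort.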
/goal Usage
Prefer a repo mission document plus a short /goal.
Short /goal shape:
    Use MissionGradient. Complete docs/<mission-gradient-doc>.md by optimizing the real artifact under its invariants and verification criteria. Preserve topology, avoid forbidden shortcuts, and stop/escalate on invariant-level surprises.
Review Questions
Before handing the mission to /goal, answer:
- What is the real artifact?
- Which invariants define identity of the artifact?
- What observable feedback tells the agent where error remains?
- What would a reward-hacking implementation do?
- Which simplifications preserve topology, and which create fake islands?
- What local work is allowed, and what must happen in production-like infrastructure?
- What evidence would convince a skeptical reviewer that the system works?
- What discoveries require escalation rather than silent adaptation?