90 changes: 90 additions & 0 deletions paper/01-introduction.md
@@ -0,0 +1,90 @@
[← Index](./README.md) · [Next: The Central Tradeoff →](./02-the-tradeoff.md)

---

# 1. Introduction: The Governance Problem

## 1.1 Generation is no longer the bottleneck

A capable language model, handed a well-scoped task and a working
repository, will produce a correct, tested change a large fraction of the
time. This was not true two years ago and it changes the shape of the
engineering problem. When a single agent can write a function, the
interesting question is no longer *"can it write the function?"* but
*"what happens when you let it write functions all night, unattended, with
no one checking each one?"*

The answer, observed repeatedly, is **drift**. Not catastrophic failure —
drift. The system keeps moving. PRs keep opening. Tests keep passing. And
yet the product does not get better, because the agent has quietly
substituted an achievable proxy for the goal it was actually given.

## 1.2 Three failure modes of the unsupervised agent

An autonomous coding system left without governance exhibits three
characteristic pathologies. None of them look like a crash; all of them
look like productivity.

**Proxy substitution ("specification gaming").** Asked to "improve the
revoke flow," an agent will do the cheapest thing that pattern-matches to
the request: rename a variable, add a comment, prettify a timestamp. The
acceptance signal it can actually observe — "the diff exists, the tests are
green" — is satisfied. The value it was meant to create is not. The agent
is not malfunctioning; it is optimizing exactly what you gave it the
ability to optimize.

**Value-blindness.** A generation engine has no internal notion of
*worth*. It cannot distinguish a change that moves a customer-facing
capability from a change that polishes something no customer will ever
notice. Both are "code that was written." Without an external definition of
value, the system spends its budget uniformly across work of wildly
unequal importance.

**Quality entropy.** Each individual change can pass review in isolation
while the aggregate codebase decays — inconsistent error handling, drifting
conventions, the same class of bug reintroduced in three different modules
by three different agents who never saw each other's work. Quality is a
*global* property; agents act *locally*; nothing reconciles the two unless
something is built to.

## 1.3 Why "a human reviews everything" is not the answer

The obvious mitigation — keep a human in the loop on every change —
defeats the purpose. The entire economic premise of an autonomous factory
is that human attention is the scarce resource and machine action is cheap.
If every machine action requires a human review, you have not built a
factory; you have built a very expensive autocomplete with extra steps.

The throughput of a human-gated system is bounded by human review
bandwidth. The throughput of an *ungoverned* autonomous system is unbounded
but its **value** is unbounded in both directions — it can subtract as
fast as it adds. Neither is acceptable. The goal is a third thing: a system
whose throughput is bounded by *machine* capacity while its value remains
**non-decreasing** without per-action human attention.

## 1.4 The thesis: governance must be machine-checkable

That third thing requires moving the human's judgment *out of the loop and
into the rules*. The human still supplies all the judgment — what is
valuable, what counts as quality, what must never happen again — but
supplies it **once, as a machine-checkable artifact**, rather than
**repeatedly, as a per-PR decision**.

This is the organizing principle of everything that follows:

> The central problem of an autonomous software factory is governance, not
> generation. Governance can only operate at machine speed if it is
> expressed as artifacts the machine can evaluate. Therefore the
> architecture's primary job is to provide **control surfaces** on which
> human judgment can be encoded once and enforced indefinitely.

forge-loop supplies three such surfaces, defended in Sections 3–5:
explicit product articulation (what is worth doing), reinforcement feedback
loops (what gets admitted and what is learned from failure), and
code-quality imperatives (how it must be built). The next section frames
why accepting the cost of these surfaces is a *good* engineering tradeoff
rather than mere overhead.

---

[← Index](./README.md) · [Next: The Central Tradeoff →](./02-the-tradeoff.md)
109 changes: 109 additions & 0 deletions paper/02-the-tradeoff.md
@@ -0,0 +1,109 @@
[← Introduction](./01-introduction.md) · [Index](./README.md) · [Next: Product Articulation →](./03-product-articulation-axes.md)

---

# 2. The Central Tradeoff

Every architecture is an answer to the question *"what cost are you willing
to pay, in exchange for what property?"* This section states forge-loop's
answer explicitly, because a tradeoff defended honestly is more convincing
than a benefit claimed without a price.

## 2.1 What you pay

The governance triad is not free. It imposes a real, unavoidable cost on
the operator, paid **upfront and continuously**:

- **You must articulate the product.** Writing `axes.yaml` and a product
vision forces you to state, in falsifiable terms, who you serve and what
counts as value. This is hard — harder than writing the code, for many
people — because it demands clarity that ad-hoc development lets you
avoid.
- **You must write the rules.** Every quality imperative in the manifesto
is a sentence someone had to think through and commit to. The critic can
only enforce what has been written down.
- **You must tend the feedback loop.** Each bug that ships is a debt: it
must be distilled into a rule, or the same class of failure recurs.

In short: the system shifts effort from *reviewing outputs* to
*specifying constraints*. You do less of the thing humans are slow at
(reading every diff) and more of the thing humans are uniquely good at
(deciding what matters).

## 2.2 What you buy

In exchange, you buy the single property an ungoverned autonomous system
cannot have:

> **Bounded, non-decreasing value over an unbounded number of unsupervised
> actions.**

Unpack that:

- **Unbounded actions.** The loop can run indefinitely, dispatching many
agents in parallel, without a human gating each one.
- **Non-decreasing value.** Because every admitted change must clear the
value axes and the quality gate, the system cannot ship work that those
artifacts classify as worthless or corrosive. The floor only moves up, to
the extent that the axes and rules are themselves sound (Section 3.5).
- **Bounded blast radius.** Because failures are converted into permanent
gates, the set of possible bad outcomes *shrinks monotonically over
time* rather than recurring.
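
The "failures become permanent gates" property can be sketched as an append-only rule set. This is an illustration under stated assumptions, not forge-loop's implementation: the `Rule` and `QualityGate` names are hypothetical, and matching a regular expression against diff text is only a stand-in for whatever check the real critic performs.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One distilled failure: a name plus a pattern that must not appear in a diff."""
    name: str
    forbidden_pattern: str  # hypothetical: a regex over added diff lines


@dataclass
class QualityGate:
    """Append-only rule set: rules are added after failures and never removed."""
    rules: list[Rule] = field(default_factory=list)

    def learn(self, rule: Rule) -> None:
        # The only mutation offered is growth, so the admissible set can only shrink.
        self.rules.append(rule)

    def violations(self, diff_text: str) -> list[str]:
        return [r.name for r in self.rules if re.search(r.forbidden_pattern, diff_text)]

    def admits(self, diff_text: str) -> bool:
        return not self.violations(diff_text)


gate = QualityGate()
bad_diff = "+    except Exception:\n+        pass\n"

# Before the failure is distilled into a rule, the gate admits the change.
admitted_before = gate.admits(bad_diff)

# After the bug ships, the operator encodes its class once.
gate.learn(Rule("no-swallowed-exceptions", r"except Exception:\s*\n\+?\s*pass"))
admitted_after = gate.admits(bad_diff)
```

Because `learn` is the only way to change the gate, every diff rejected today is also rejected tomorrow, which is the monotonic shrinkage the bullet above describes.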

## 2.3 Why this is a *good* tradeoff, not just *a* tradeoff

The trade is favorable because of an asymmetry in how the two costs scale.

**Specification cost is paid once and amortizes; review cost is paid per
action and does not.** A value axis you write today governs every ticket
the system ever generates against it. A quality rule you write after one
bug blocks that bug class in every future PR, across every agent, forever.
The marginal cost of governing the *N+1*-th action approaches zero as the
ruleset matures. By contrast, per-PR human review is a flat tax: the
ten-thousandth review costs as much as the first.

This is the same economic shape that makes *compilers* worth more than
*manual code inspection*, or *type systems* worth their annotation
overhead: you pay a fixed cost to encode a constraint, and the machine
enforces it an unbounded number of times at no incremental human cost. The
governance triad applies that pattern one level up — not to syntax or
types, but to **value and quality**.

```
   cost
 review │      ╱ per-action human review (linear, never amortizes)
(human) │     ╱
        │    ╱
        │   ╱
        │  ╱          ┌────────────────── governance (fixed + decaying margin)
        │ ╱       ┌───┘
        │╱   ┌────┘
        └────┴───────────────────────────────► number of autonomous actions
```

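The two regimes cross once cumulative review cost exceeds the fixed
specification cost plus the accumulated per-action margin; how early that
happens depends on how expensive the specification was to write. Past the
crossover, governance is strictly cheaper for the same safety, and unlike
review it does not bottleneck throughput on human availability.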
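
The asymmetry can be made concrete with a toy cost model. Everything here is an assumption chosen for illustration: the function names, the geometric decay of the per-action margin, and the numbers, which are not measurements of any real system.

```python
def review_cost(n_actions: int, cost_per_review: float) -> float:
    """Per-action human review: a flat tax, linear in the number of actions."""
    return cost_per_review * n_actions


def governance_cost(n_actions: int, fixed_cost: float,
                    first_margin: float, decay: float) -> float:
    """Fixed specification cost plus a per-action margin that decays geometrically.

    Assumes 0 <= decay < 1. The margin for action k is first_margin * decay**k.
    """
    # Closed form of the geometric sum over k = 0 .. n_actions - 1.
    marginal_total = first_margin * (1 - decay ** n_actions) / (1 - decay)
    return fixed_cost + marginal_total


def crossover(cost_per_review: float, fixed_cost: float,
              first_margin: float, decay: float, limit: int = 100_000):
    """Smallest action count at which governance is strictly cheaper, or None."""
    for n in range(1, limit + 1):
        gov = governance_cost(n, fixed_cost, first_margin, decay)
        if gov < review_cost(n, cost_per_review):
            return n
    return None


# Illustrative numbers only, in hours of human attention.
params = dict(cost_per_review=0.5, fixed_cost=40.0, first_margin=1.0, decay=0.99)
n_star = crossover(**params)

# A backlog of three changes never reaches the crossover.
n_star_low_volume = crossover(**params, limit=3)
```

In this model `governance_cost` is bounded above by `fixed_cost + first_margin / (1 - decay)` while `review_cost` grows without bound, so a crossover always exists for a long enough run. The `limit=3` call returns `None`, which is the low-volume case conceded in Section 2.4.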

## 2.4 When the tradeoff is *bad*

Intellectual honesty requires stating where this design loses. The
governance triad is a poor fit when:

- **The work is inherently subjective.** "Make it feel more premium" cannot
be reduced to falsifiable axes or rules. The system degrades to needing a
human at the wheel — which forge-loop's own documentation concedes.
- **The product is too young to articulate.** If you genuinely do not yet
know what you are building, forcing an `axes.yaml` produces fiction, and
the system will faithfully optimize the fiction.
- **Volume is low.** If you only need three changes, the fixed cost of
specification never amortizes. Just write them yourself.

The tradeoff is *good* precisely in the regime forge-loop targets: a
product with a knowable value model, a meaningful backlog, and an operator
willing to invest in specification once to harvest leverage many times. The
following three sections defend each leg of the triad in that context.

---

[← Introduction](./01-introduction.md) · [Index](./README.md) · [Next: Product Articulation →](./03-product-articulation-axes.md)
114 changes: 114 additions & 0 deletions paper/03-product-articulation-axes.md
@@ -0,0 +1,114 @@
[← The Tradeoff](./02-the-tradeoff.md) · [Index](./README.md) · [Next: Reinforcement Feedback Loops →](./04-reinforcement-feedback-loops.md)

---

# 3. Product Articulation & Value Axes

> *Control surface #1: making "is this worth doing?" a question the system
> can answer before it acts.*

## 3.1 The problem this surface solves

Recall the value-blindness pathology from Section 1: a generation engine
has no internal notion of worth. Everything pattern-matches to "code that
could be written." The only way to give the system a sense of value is to
**supply one externally, in a form it can evaluate against a candidate
ticket.**

Free-form prose ("we want to delight our users") is not such a form. It is
unfalsifiable; an agent can justify almost any change as "delighting
users." What is needed is a representation of value that is **structured
enough to filter against** while remaining **expressive enough to capture
what the product actually is.**

## 3.2 The mechanism: axes + vision

forge-loop splits product articulation into two artifacts under `.forge/`,
and the split is deliberate:

- **`product-vision.md`** — free-form prose. Who you serve, the wedge,
and — critically — *what is explicitly NOT valuable*. Prose is the right
medium here because vision is narrative; it carries the *why* and the
customer stories that a structured schema would flatten.

- **`axes.yaml`** — structured. The 4–6 *value axes* the system is allowed
to move. Each axis names a customer, defines what "valuable" concretely
means on that axis, enumerates `acceptable_work`, and — the load-bearing
field — enumerates `rejected_as_cosmetic`.

The shape of a single axis (from the project's own configuration):

```yaml
axes:
  - name: golden-path-e2e
    customer: "SRE running their first pipeline on day zero"
    valuable_means: "Playwright tests driving the real rig — golden path
      survives every release"
    acceptable_work:
      - "Customer-shaped pipeline fixtures (Node, Java, polyglot)"
      - "Adversarial paths: failed step, OOM step, secret-needing step"
    rejected_as_cosmetic:
      - "304 responses to polls customers don't notice"
      - "Pretty timestamps, sparklines, theme polish"
```
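
One way to treat this schema as a typed input is to mirror it in code and lint it before the loop runs. The sketch below is an assumption about how that could look, not forge-loop's own loader: the `Axis` dataclass and the `lint_axis` and `lint_axes` helpers are hypothetical names, and the axes are built by hand rather than parsed from `axes.yaml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """In-memory mirror of one entry under `axes:` (field names taken from the YAML)."""
    name: str
    customer: str
    valuable_means: str
    acceptable_work: tuple[str, ...]
    rejected_as_cosmetic: tuple[str, ...]


def lint_axis(axis: Axis) -> list[str]:
    """Return human-readable problems; an empty list means the axis is usable."""
    problems = []
    if not axis.customer.strip():
        problems.append("customer is empty")
    if not axis.valuable_means.strip():
        problems.append("valuable_means is empty")
    if not axis.acceptable_work:
        problems.append("acceptable_work is empty")
    if not axis.rejected_as_cosmetic:
        problems.append("rejected_as_cosmetic is empty: no negative space to filter against")
    return problems


def lint_axes(axes: list[Axis]) -> list[str]:
    """Lint every axis, then check the 4-6 axis count described in the text."""
    problems = [f"{a.name}: {p}" for a in axes for p in lint_axis(a)]
    if not 4 <= len(axes) <= 6:
        problems.append(f"expected 4-6 axes, found {len(axes)}")
    return problems


golden = Axis(
    name="golden-path-e2e",
    customer="SRE running their first pipeline on day zero",
    valuable_means="golden path survives every release",
    acceptable_work=("Customer-shaped pipeline fixtures (Node, Java, polyglot)",),
    rejected_as_cosmetic=("Pretty timestamps, sparklines, theme polish",),
)
wish_list = Axis("polish", "everyone", "delight users", ("anything",), ())
```

A lint like this can only catch structural gaps, such as an empty `rejected_as_cosmetic`. It cannot tell whether `valuable_means` is actually falsifiable; that judgment stays with the operator.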

## 3.3 Why this is the scientifically interesting part

Most autonomous-coding tools have **no representation of value at all**.
They execute whatever ticket you point them at. The axis schema is a claim
that *value should be a first-class, typed input to the system*, on equal
footing with the code itself.

Three properties make this a sound design rather than a gimmick:

**1. It makes value falsifiable.** `valuable_means` is written as something
that could, in principle, be checked: "the golden path survives every
release" is testable in a way "delight users" is not. A ticket can be held
up against the axis and *judged*, not vibed.

**2. It encodes the negative space.** `rejected_as_cosmetic` is the most
important field and the one almost everyone forgets. Defining what is *not*
valuable is how you defeat proxy substitution. An agent that wants to
prettify a timestamp is now contradicting an explicit, named constraint —
not merely failing to satisfy a vague aspiration. **A value model without a
negative space is just a wish list; the system games it. A value model
*with* a negative space is a filter.**

**3. It is generative, not merely evaluative.** Because value is
structured, the system can *propose* work that serves the axes (the
`brainstormer` generates axis-aligned epics and tickets), and it can *tag*
every shipped change with the axis it served (`axis:<name>` labels). Value
flows forward into what gets built, not just backward into what gets
filtered. This closes a loop that prose vision alone cannot: the
specification of value *drives the backlog* rather than passively grading
it.
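
The filter-then-tag behaviour described above can be sketched as a small triage function. The names `triage` and `keyword_match` are hypothetical, and the keyword matcher is a deliberately crude stand-in for whatever judgment the real system applies when comparing a ticket to an axis; only the field names and the `axis:<name>` label format come from the text.

```python
from typing import Callable

Matcher = Callable[[str, str], bool]


def keyword_match(ticket_text: str, description: str) -> bool:
    """Naive stand-in: true if any word of 5+ characters from the description appears in the ticket."""
    words = {w.strip('.,:;()"').lower() for w in description.split()}
    text = ticket_text.lower()
    return any(len(w) >= 5 and w in text for w in words)


def triage(ticket_text: str, axis: dict, matches: Matcher = keyword_match):
    """Return (admitted, label_or_reason) for one candidate ticket against one axis."""
    # The negative space is checked first: an explicit rejection always wins.
    for item in axis["rejected_as_cosmetic"]:
        if matches(ticket_text, item):
            return False, f"rejected_as_cosmetic: {item}"
    for item in axis["acceptable_work"]:
        if matches(ticket_text, item):
            return True, f"axis:{axis['name']}"
    # Work that matches nothing is refused rather than admitted by default.
    return False, "no acceptable_work item matched"


axis = {
    "name": "golden-path-e2e",
    "acceptable_work": [
        "Customer-shaped pipeline fixtures (Node, Java, polyglot)",
        "Adversarial paths: failed step, OOM step, secret-needing step",
    ],
    "rejected_as_cosmetic": [
        "304 responses to polls customers don't notice",
        "Pretty timestamps, sparklines, theme polish",
    ],
}

cosmetic = triage("Prettify timestamps on the run list", axis)
real_work = triage("Add a failed step fixture to the Java pipeline", axis)
unrelated = triage("Rename internal variable for clarity", axis)
```

Two design choices carry the argument: rejection is evaluated before acceptance, so a ticket cannot launder cosmetic work through a loosely worded `acceptable_work` entry, and the reason returned for a refusal names the exact constraint that was contradicted.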

## 3.4 The anti-cosmetic guardrail as a Goodhart defense

There is a well-known failure of optimization, usually cited as Goodhart's
law: when a measure becomes a target, it ceases to be a good measure. An
autonomous agent optimizing "ship PRs" will ship the easiest PRs, which are
exactly the cosmetic ones.
The `rejected_as_cosmetic` list is a direct structural defense: it removes
the easiest proxies from the set of admissible work, forcing the
optimizer's pressure back onto the axes that actually represent value.

This is why forge-loop's brainstormer carries an explicit *anti-cosmetic
guardrail*: the value model is not just consulted at generation time, it is
designed so that the cheapest-to-satisfy moves are precisely the ones it
forbids. The system is built to make gaming it harder than doing the real
work.
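
A toy model shows why removing the cheapest proxies redirects the optimizer. The `Candidate` type, the `effort` scores, and the backlog below are invented for illustration; the only idea taken from the text is that cosmetic work is both the easiest to ship and explicitly inadmissible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    effort: int      # hypothetical cost to the agent; lower is easier
    cosmetic: bool   # whether it falls under some axis's rejected_as_cosmetic


def cheapest(candidates, admit):
    """An optimizer pressured to 'ship PRs' picks the lowest-effort admissible item."""
    admissible = [c for c in candidates if admit(c)]
    return min(admissible, key=lambda c: c.effort) if admissible else None


backlog = [
    Candidate("Pretty timestamps", effort=1, cosmetic=True),
    Candidate("Theme polish", effort=2, cosmetic=True),
    Candidate("OOM-step adversarial fixture", effort=8, cosmetic=False),
    Candidate("Java pipeline fixture", effort=5, cosmetic=False),
]

# Without the guardrail, the easiest item wins, and it is cosmetic.
ungoverned = cheapest(backlog, admit=lambda c: True)

# With the guardrail, the same pressure lands on real work.
governed = cheapest(backlog, admit=lambda c: not c.cosmetic)
```

The optimizer's objective is unchanged between the two calls. Only the admissible set differs, which is the sense in which the guardrail is structural rather than a matter of asking the agent to behave.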

## 3.5 The cost, stated plainly

This surface is only as good as the axes the operator writes. A vague
`valuable_means`, an empty `rejected_as_cosmetic`, or axes that do not
actually capture the product's value model will all produce a system that
confidently optimizes the wrong thing. Garbage axes in, garbage backlog
out — and worse, *confidently and at scale*. The leverage of this surface
is real, but it is leverage on the operator's clarity, which means it
amplifies a poor value model as faithfully as a good one. This is the
upfront cost named in Section 2, located precisely.

---

[← The Tradeoff](./02-the-tradeoff.md) · [Index](./README.md) · [Next: Reinforcement Feedback Loops →](./04-reinforcement-feedback-loops.md)