walkingtodo.ai
Method

The gate, the ledger, the record.

Three mechanisms, one rule under all of them: the generator never grades its own homework. Validators are deterministic checks or a human judge — never the model that produced the work — and every decision lands in a signed, dated record you keep.

The gate

Route by reversibility, not importance.

[auto]

The mechanical goes to machines.

Reversible, machine-checkable work (structure validation, integrity diffs, formatting rules) is verified by deterministic checks on every run. The checks are cheap and fast, and errors are caught by code.

[judge]

The consequential goes to a human.

Anything costly, irreversible, or invisible to mechanical checks is gated by a named human judge who is, by design, never a rubber stamp. Every external action passes the gate; the human is the outbound channel.
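A minimal sketch of the routing rule, assuming each piece of work can be described by three yes/no properties. The `WorkItem` type, its field names, and the `route` function are hypothetical names chosen for illustration, not part of any published implementation. Note that importance is deliberately not an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    # All fields are illustrative assumptions, not a documented schema.
    name: str
    reversible: bool          # can the change be undone cheaply?
    machine_checkable: bool   # does a deterministic check exist for it?
    external: bool = False    # does it leave the system (send, publish, pay)?


def route(item: WorkItem) -> str:
    """Return "auto" or "judge" for a work item."""
    # Every external action passes the human gate, whatever else is true.
    if item.external:
        return "judge"
    # Only work that is both reversible and machine-checkable goes to machines.
    if item.reversible and item.machine_checkable:
        return "auto"
    # Costly, irreversible, or invisible to mechanical checks: a human decides.
    return "judge"


if __name__ == "__main__":
    for item in (
        WorkItem("reformat headings", reversible=True, machine_checkable=True),
        WorkItem("send client email", reversible=False, machine_checkable=True, external=True),
        WorkItem("tone review", reversible=True, machine_checkable=False),
    ):
        print(f"{item.name}: [{route(item)}]")
```

The default is the human gate: an item only reaches [auto] by positively qualifying for it.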

The ledger

Every correction becomes a rule.

The Standards Ledger is a version-controlled record of judgment: every human correction becomes a one-line rule, tagged [auto] if a machine can verify it or [judge] if a human must. Rules, not fine-tuning: they are legible, auditable, and portable across whatever model is best next quarter, and the ledger lives in your repo. A realistic excerpt is on the Proof page; the argument for why this is the durable asset is the thesis.
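As a sketch of what "one-line rule, tagged" could look like in a repo, the parser below assumes a plain-text format of `<id> [auto|judge] <rule text>` per line. That line format, the rule IDs, and the two example rules are all hypothetical; the actual ledger excerpt is the one on the Proof page.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    rule_id: str
    tag: str   # "auto" or "judge"
    text: str


# Assumed line format: "<id> [auto|judge] <rule text>"
LINE = re.compile(r"^(?P<id>\S+)\s+\[(?P<tag>auto|judge)\]\s+(?P<text>.+)$")


def parse_ledger(source: str) -> list[Rule]:
    """Parse ledger text into rules; blank lines and '#' comments are skipped."""
    rules = []
    for n, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            # An untagged rule is rejected rather than silently defaulted.
            raise ValueError(f"line {n}: expected '<id> [auto|judge] <rule>'")
        rules.append(Rule(m["id"], m["tag"], m["text"]))
    return rules


# Hypothetical entries, for illustration only.
LEDGER = """\
# standards ledger
R-001 [auto] Dates in deliverables use ISO 8601 (YYYY-MM-DD).
R-002 [judge] Claims about a competitor need a named source.
"""

if __name__ == "__main__":
    for rule in parse_ledger(LEDGER):
        print(rule.tag, rule.rule_id, rule.text)
```

Because the ledger is plain text, each correction shows up as an ordinary diff in version control, which is what makes the standard datable.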

The record

Chain of custody for AI output.

The deliverable is a signed record, not a live agent: what was checked, against which standard version, by whom, on what date — the document that answers “prove what your standard was when you made this call.” Chain of custody is the precise term, and it is the house vocabulary on purpose: evidence, judgment, records. See the artifact itself.
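A minimal sketch of such a record, assuming a JSON payload sealed with an HMAC-SHA256 tag from Python's standard library. The field names, the example values, and the choice of HMAC are assumptions made for illustration; the source does not specify a signing scheme, and a shared-key HMAC is a stand-in for whatever signature mechanism is actually used.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with a 'signature' field added."""
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": tag}


def verify_record(signed: dict, key: bytes) -> bool:
    """Check that the record body still matches its signature."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))


if __name__ == "__main__":
    # Hypothetical record: what was checked, against which standard
    # version, by whom, on what date.
    record = {
        "checked": ["structure validation", "integrity diff"],
        "standard_version": "ledger-v12",
        "judge": "J. Doe",
        "date": "2025-01-15",
        "decision": "approved",
    }
    signed = sign_record(record, key=b"demo-key-not-for-production")
    print(verify_record(signed, key=b"demo-key-not-for-production"))
```

Changing any field after signing, including the standard version or the date, makes verification fail, which is the property the chain-of-custody framing depends on.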

The honesty section

A thinking tool, not a safety case.

Two claims this method deliberately does not make. First: this is not certainty. The framework is a thinking tool, not a safety case. We engineer for failure decorrelation — layers that fail differently — not for perfect independence, which cannot be bought or asserted. The engineering argument is its own field note.

Second: a human in the loop is not automatically an improvement. The meta-analytic evidence (Vaccaro, Almaatouq & Malone 2024, 100+ experiments) finds that the average human-AI combination underperforms the better of the two alone. We cite that study against ourselves because it defines the design problem: naive oversight fails, so the gates here are engineered, with independent context, deterministic checks where rules exist, bounded judge workload, and a record that makes rubber-stamping visible. For what it’s worth, the firms reporting value tend to run this way: AI high performers report defined validation processes at 65% versus 23% of everyone else (McKinsey, Nov 2025, N=1,993, vendor-sponsored).

Provenance

Borrowed from disciplines that learned it the hard way.

The two-person rule

Nuclear surety's fifty-year-old answer to single-point judgment failure: no individual may both initiate and validate a critical action. Cited as engineering precedent for separating generator from validator.

The after-action review

Blameless, structured, recorded learning from every miss — the precedent for the rejection flywheel that feeds the ledger, alongside Dekker's Just Culture work on why blame kills reporting.

Chain of custody

The judicial standard for evidence that must survive scrutiny: documented possession, dated transfers, signatures. Adopted here as the native vocabulary for the signed record.

Defense in depth, decorrelated

Reliability engineering's oldest result: redundancy without diversity fails under common cause. The validation stack is built so the layers fail differently (the Swiss-cheese argument, in full).
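The arithmetic behind that claim can be sketched with a deliberately simple model: two validation layers, each missing an error with the same probability p, with correlation rho between their misses. For two such events the joint miss probability is p^2 + rho * p * (1 - p). The two-layer setup, the equal miss rates, and the example numbers are assumptions for illustration, not measurements of any real stack.

```python
def joint_miss(p: float, rho: float) -> float:
    """Probability that both layers miss the same error.

    p   : miss probability of each layer (assumed equal for both)
    rho : correlation between the two layers' misses, from 0 to 1
    """
    if not 0.0 <= p <= 1.0 or not 0.0 <= rho <= 1.0:
        raise ValueError("p and rho must each be between 0 and 1")
    # Cov(A, B) = rho * p * (1 - p), and P(A and B) = p*p + Cov(A, B).
    return p * p + rho * p * (1.0 - p)


if __name__ == "__main__":
    p = 0.10  # illustrative miss rate per layer
    for rho in (0.0, 0.5, 1.0):
        print(f"rho={rho:.1f}  joint miss={joint_miss(p, rho):.3f}")
```

With a 10% miss rate per layer, fully decorrelated layers miss together about 1% of the time, half-correlated layers about 5.5%, and perfectly correlated layers 10%, which is no better than a single layer. Lowering rho, not adding identical layers, is where the gain comes from.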

See it applied to one workflow.

The diagnostic runs this method against a single workflow with a clean metric — and can honestly end in “don’t proceed.”