Quick tour
The whole methodology in one read
Fifteen minutes. Enough to decide whether to keep reading. The problem, the idea, the documents, the cycle, and a worked example you can poke at.
The problem RCF is for
AI coding agents are extraordinary at the first eighty percent of a build. Plausible structure, fluent code, sensible patterns, faster than any team you’ve ever worked on. Then they stop being useful at the part where it matters. The part where a real product needs a real shape and real behaviour, and where you’d like some evidence that the shape and the behaviour are what was actually requested.
The first symptom is familiar. The demo runs. The screens look right. The thing compiles, deploys, and clicks through. Underneath, the agent has made a hundred decisions you never made. Edge cases. Scaling assumptions. Security posture. Schema design. Each one quietly wrong in a way that surfaces six weeks later as a bug, an outage, or a stakeholder’s question.
The deeper problem is older. Most teams have always shipped on the strength of the developer’s memory of an unwritten conversation. Requirements get written late, acceptance criteria get written later (if at all), tests get written by whoever has the time, and the trail from a business decision to a deployed line of code is a story people tell each other, not a thing the system records. That was bearable when humans wrote the code. It isn’t bearable when an agent does.
The core idea
Anchor the work in requirements. Make them load-bearing. Break each one into user stories, each story into acceptance criteria, each criterion into a test, and let code be the cheap thing that arrives last to make the tests pass. The shape of the project is then a chain, and every node on the chain points up to the one that justifies it and down to the one that proves it.
That’s it. Everything else is operational detail.
What this buys you is small but rare in practice. When somebody asks “why does this function exist?”, there’s an answer. When somebody asks “have we shipped requirement REQ-014?”, there’s an answer. When the AI agent builds something subtly wrong, the wrongness shows up against a test that came from a criterion that came from a story that came from a requirement an actual human signed off. That’s the AI trust gap, and it’s why RCF doesn’t ask you to trust the agent. The chain is what gets trusted.
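One way to picture the chain is as records with stable IDs, each naming the node that justifies it. The sketch below is illustrative only: the dictionary layout and the `why` helper are assumptions rather than anything RCF prescribes, and the suite and case IDs (`TS-042-01`, `TC-042-01-a`) are hypothetical. REQ-014, US-042 and AC-042-01 are borrowed from the worked example later on this page.

```python
from typing import Dict, List, Optional

# Each node records the ID of the node that justifies it (None at the top).
# The layout and the TS-/TC- IDs are hypothetical, for illustration only.
CHAIN: Dict[str, Optional[str]] = {
    "REQ-014": None,
    "US-042": "REQ-014",
    "AC-042-01": "US-042",
    "TS-042-01": "AC-042-01",
    "TC-042-01-a": "TS-042-01",
}


def why(node_id: str, chain: Dict[str, Optional[str]] = CHAIN) -> List[str]:
    """Walk upward from a node to the requirement that justifies it."""
    path = [node_id]
    while chain[path[-1]] is not None:
        path.append(chain[path[-1]])
    return path


print(" -> ".join(why("TC-042-01-a")))
# TC-042-01-a -> TS-042-01 -> AC-042-01 -> US-042 -> REQ-014
```

Answering “why does this exist?” becomes a lookup rather than a conversation, which is the property the paragraph above describes.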
The document chain
Nine layers. The names matter less than what each one does.
A PRD, the product requirements document, sits at the top. It says what the product is, who it’s for, what it does, and what it deliberately doesn’t do. Out-of-scope is as load-bearing as in-scope. The PRD owns the requirements themselves, short crisp statements of what the product must do, with priority and category.
Each requirement breaks down into user stories, the “as a user I want X so that Y” shape. Each story carries one or more acceptance criteria, the Given/When/Then test sentences that say what “done” looks like. ACs are the central primitive. Everything downstream maps to them, one to one.
Alongside that, the technical architecture document (TAD) describes the systems, components, data stores and integrations the product needs. It is one logical artefact, but under the hood a TAD usually bundles whatever it needs to get the architecture across: written sections, diagrams, wireframes, schema definitions, architecture decision records. The boundary is logical, not file-based. It exists so the next layer down has somewhere to point when it talks about real code.
The build sequence is the plan for getting from nothing to shipped. It’s a directed acyclic graph of functional build specifications (FBSs), one per shippable slice, with dependencies declared up front. Each FBS carries four things: the scope (which stories, which ACs); the context (which PRD sections, which TAD components, which modules and schemas the worker will need); the business patterns, standards, style guidelines, and instructions that govern how the slice is built; and the testable outcomes that say when the slice is done.
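Because the build sequence is a directed acyclic graph, a valid build order falls out of a topological sort. A minimal sketch, assuming dependencies are held as a plain mapping; the three IDs are the ones from the worked example later on this page, and `build_order` is a hypothetical helper name.

```python
import graphlib

# Map each FBS to the set of FBSs it depends on.
# Only these three IDs come from the worked example; a real sequence has more.
DEPENDS_ON = {
    "FBS-009": set(),
    "FBS-027": {"FBS-009"},
    "FBS-031": {"FBS-027"},
}


def build_order(depends_on):
    """Return one valid build order.

    Raises graphlib.CycleError if the declared dependencies contain a cycle,
    which would mean the sequence is not actually a DAG.
    """
    return list(graphlib.TopologicalSorter(depends_on).static_order())


print(build_order(DEPENDS_ON))  # ['FBS-009', 'FBS-027', 'FBS-031']
```

Declaring dependencies up front is what makes the cycle check possible before any slice is started.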
At the bottom of the chain, each acceptance criterion gets a test suite, and each suite is made up of test cases. The one-to-one rule lives here. One AC, one test suite, no exceptions. If you can’t write a suite for an AC, the AC isn’t a real AC.
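The one-to-one rule is simple enough to check mechanically. The sketch below assumes each suite records the single AC it belongs to; the `TS-*` suite IDs and the function name are hypothetical.

```python
from collections import Counter


def one_to_one_violations(ac_ids, suite_to_ac):
    """Return every AC that does not have exactly one suite, with its suite count."""
    counts = Counter(suite_to_ac.values())
    return {ac: counts[ac] for ac in ac_ids if counts[ac] != 1}


acs = ["AC-042-01", "AC-042-02", "AC-042-03"]
suites = {
    "TS-A": "AC-042-01",
    "TS-B": "AC-042-01",  # a second suite for the same AC breaks the rule
    "TS-C": "AC-042-02",
}
print(one_to_one_violations(acs, suites))
# {'AC-042-01': 2, 'AC-042-03': 0}
```

An AC with a count of zero is either not yet defined as tests, or not a real AC in the sense described above.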
Reading top to bottom: PRD → Requirements → User Stories → Acceptance Criteria → Technical Architecture → Build Sequence → Functional Build Specification → Test Suite → Test Case. Read in detail in The document chain.
The build cycle
Each functional build specification runs through five stages, in order. Each stage commits.
Define. Write the tests for every acceptance criterion in scope, from the criterion text, before any production code exists. The tests fail. They should. Commit.
Build. Write the code that makes the tests pass. No more, no less. Build context comes from the FBS’s declared sources, not from whatever the worker fancies grepping for. Commit.
Review. Step away from the code. Read the diff. Check the tests actually exercise the criteria they claim to. Check no scope crept in. Check the thing matches the technical architecture. Commit any review-driven changes.
Test. Run the suites. All of them, not just the new ones. The green bar is the whole project, not the slice. If a test elsewhere goes red, something in the slice broke something it shouldn’t have, and the slice isn’t done. Commit fixes.
Finalise. CI green. Coverage report attached. FBS status flipped to complete. The slice is shippable. Commit.
Five commits, each of which a senior engineer could review on its own, each of which moves the project forward by a known amount. The cycle is repetitive on purpose. Repeated discipline is how you survive shipping with AI agents in the loop.
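The “in order, each stage commits” rule can be modelled as a small state machine. This is a sketch under stated assumptions, not RCF tooling: the `BuildCycle` class and its fields are hypothetical, and it simplifies each stage to exactly one commit.

```python
STAGES = ["Define", "Build", "Review", "Test", "Finalise"]


class BuildCycle:
    """Tracks one FBS through the five stages, one commit per stage."""

    def __init__(self, fbs_id):
        self.fbs_id = fbs_id
        self.commits = []  # (stage, message) pairs, in the order they landed

    @property
    def next_stage(self):
        """The stage that must commit next, or None once the cycle is finished."""
        done = len(self.commits)
        return STAGES[done] if done < len(STAGES) else None

    def commit(self, stage, message):
        """Record a commit, refusing any stage that arrives out of order."""
        if stage != self.next_stage:
            raise ValueError(
                f"{self.fbs_id}: expected {self.next_stage}, got {stage}"
            )
        self.commits.append((stage, message))

    @property
    def complete(self):
        return self.next_stage is None
```

For example, `BuildCycle("FBS-027").commit("Build", "upload handler")` raises `ValueError`, because Define has not committed yet.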
A worked example
A SaaS app needs a user-uploaded profile photo. Take it through the chain.
In the PRD, the requirement reads: “REQ-014: Users can upload a profile photo to personalise their account.” It is categorised as functional, prioritised as must-have, and sits in the identity domain.
One user story for the happy path: US-042: As a signed-in user, I want to upload a profile photo, so that my account shows my face.
Three acceptance criteria on the story:
- AC-042-01. Given a signed-in user, when they upload a JPG or PNG under 5 MB, then the photo is stored and shown on their profile within two seconds.
- AC-042-02. Given a signed-in user, when they upload a file over 5 MB or in an unsupported format, then the upload is rejected with a clear reason and no partial state is persisted.
- AC-042-03. Given a signed-in user with an existing photo, when they upload a new one, then the old photo is replaced and removed from storage within a minute.
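Because every criterion follows the same Given/When/Then sentence shape, it can be split into its three parts mechanically. The parser below is a rough sketch that assumes the exact “Given …, when …, then ….” phrasing used in the three ACs above; the `Criterion` type and `parse_criterion` function are hypothetical names.

```python
import re
from typing import NamedTuple


class Criterion(NamedTuple):
    given: str
    when: str
    then: str


# Assumes the single-sentence "Given X, when Y, then Z." phrasing shown above.
_GWT = re.compile(
    r"^Given (?P<given>.+?), when (?P<when>.+?), then (?P<then>.+?)\.?$",
    re.IGNORECASE,
)


def parse_criterion(text):
    """Split an acceptance criterion into its Given / When / Then parts."""
    match = _GWT.match(text.strip())
    if match is None:
        raise ValueError("not in Given/When/Then form: " + repr(text))
    return Criterion(**match.groupdict())


ac = parse_criterion(
    "Given a signed-in user with an existing photo, when they upload a new "
    "one, then the old photo is replaced and removed from storage within a minute."
)
print(ac.when)  # they upload a new one
```

A sentence that fails to parse is a useful early signal that the criterion is not yet testable as written.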
The TAD already covers the identity service and the object store, so the architecture work here is a one-line update to the identity service’s upload-handling component.
The build sequence places a functional build specification, FBS-027: Profile photo upload, after FBS-009 (the identity service stand-up) and before FBS-031 (the public profile page). Its storyScope is { US-042: [AC-042-01, AC-042-02, AC-042-03] }. Its testableOutcomes are the three ACs, restated.
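As a rough picture of what such a record might look like, the sketch below writes FBS-027 as a plain dictionary. Only `storyScope` and `testableOutcomes` are field names taken from the text; `id`, `title` and `dependsOn` are assumptions, and the outcomes are reduced to AC IDs where a real FBS would restate each criterion.

```python
FBS_027 = {
    "id": "FBS-027",                    # hypothetical field name
    "title": "Profile photo upload",    # hypothetical field name
    "dependsOn": ["FBS-009"],           # hypothetical field name
    "storyScope": {"US-042": ["AC-042-01", "AC-042-02", "AC-042-03"]},
    # Reduced to AC IDs for brevity; the restated outcome text is omitted.
    "testableOutcomes": ["AC-042-01", "AC-042-02", "AC-042-03"],
}


def uncovered_acs(fbs):
    """Return the ACs in storyScope that no testable outcome refers to."""
    in_scope = [ac for acs in fbs["storyScope"].values() for ac in acs]
    return [ac for ac in in_scope if ac not in fbs["testableOutcomes"]]


print(uncovered_acs(FBS_027))  # []
```

An empty result means every AC in scope has an outcome that says when it is done.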
In Define, the worker writes three test suites (one per AC), each with the cases needed to drive the criterion. Happy path, every named edge case, the failure modes. The suites compile. They fail.
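To make “the suites compile, they fail” concrete, here is a sketch of part of the suite for AC-042-02 as it might look at the end of Define. Everything named here is hypothetical: `validate_upload` stands in for production code that does not exist yet, and the tiny `run_suite` runner is only there to keep the example self-contained. A real project would use its own test framework, and the “no partial state is persisted” half of the criterion would need further cases.

```python
def validate_upload(filename, size_bytes):
    """Stand-in for production code, which does not exist yet during Define."""
    raise NotImplementedError("written in Build, not Define")


def case_rejects_file_over_5_mb():
    accepted, reason = validate_upload("me.png", 6 * 1024 * 1024)
    assert not accepted and reason  # rejected, with a non-empty reason


def case_rejects_unsupported_format():
    accepted, reason = validate_upload("me.gif", 1024)
    assert not accepted and reason


SUITE_AC_042_02 = [case_rejects_file_over_5_mb, case_rejects_unsupported_format]


def run_suite(cases):
    """Run each case and report 'pass' or 'fail' by case name."""
    results = {}
    for case in cases:
        try:
            case()
            results[case.__name__] = "pass"
        except Exception:
            results[case.__name__] = "fail"
    return results


print(run_suite(SUITE_AC_042_02))
# {'case_rejects_file_over_5_mb': 'fail', 'case_rejects_unsupported_format': 'fail'}
```

Both cases fail at this point, which is the expected state when Define commits.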
In Build, the worker writes the upload handler, the validation, the storage write, the old-photo cleanup. The tests go green one by one.
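The validation step for AC-042-01 and AC-042-02 might look something like the sketch below. It is not the example’s actual implementation: the function name and return shape are hypothetical, “5 MB” is assumed to mean 5 MiB, and the format check looks only at the file extension where a real handler would likely inspect the file contents.

```python
import os

MAX_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" is read as 5 MiB
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def validate_upload(filename, size_bytes):
    """Return (accepted, reason); reason is empty when the upload is accepted."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, "unsupported format: " + (ext or "no extension")
    # The ACs cover "under 5 MB" and "over 5 MB". Exactly 5 MB is not
    # specified, so rejecting it here is an assumption of this sketch.
    if size_bytes >= MAX_BYTES:
        return False, "file must be under 5 MB"
    return True, ""


print(validate_upload("me.png", 1024))  # (True, '')
print(validate_upload("me.gif", 1024))  # (False, 'unsupported format: .gif')
```

Writing the boundary check forces a decision the criteria leave open, which is the kind of gap that is better raised against the AC than settled silently in code.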
In Review, you find that AC-042-03 is passing because the worker added a 60-second sleep to the test, not because the cleanup is fast. The worker fixes the implementation. The test passes honestly.
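One common way to test a time bound without sleeping is to inject a clock the test controls. The sketch below shows that technique on a toy in-memory store; `FakeClock`, `PhotoStore` and the deferred `sweep` design are all hypothetical and are not claimed to be how the example’s cleanup works.

```python
class FakeClock:
    """A clock the test controls, so no real time passes."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds

    def __call__(self):
        return self.now


class PhotoStore:
    """Toy in-memory store; replaced photos are deleted by a later sweep."""

    def __init__(self, clock):
        self._clock = clock
        self.current = {}        # user_id -> key of the live photo
        self.objects = set()     # keys physically present in storage
        self.pending = []        # (key, replaced_at) awaiting deletion
        self.cleanup_delay = {}  # key -> seconds from replacement to removal

    def upload(self, user_id, key):
        old = self.current.get(user_id)
        self.objects.add(key)
        self.current[user_id] = key
        if old is not None:
            self.pending.append((old, self._clock()))

    def sweep(self):
        """Delete every replaced photo and record how long each one waited."""
        for key, replaced_at in self.pending:
            self.objects.discard(key)
            self.cleanup_delay[key] = self._clock() - replaced_at
        self.pending.clear()


def check_ac_042_03():
    clock = FakeClock()
    store = PhotoStore(clock)
    store.upload("u1", "photo-a")
    store.upload("u1", "photo-b")
    clock.advance(30)  # simulated seconds; the check itself returns instantly
    store.sweep()
    assert store.current["u1"] == "photo-b"
    assert "photo-a" not in store.objects
    assert store.cleanup_delay["photo-a"] <= 60


check_ac_042_03()
```

With the clock injected, the “within a minute” bound is asserted against the implementation’s own timing instead of being absorbed by a wait in the test.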
In Test, all suites on the project run. Everything green. In Finalise, CI is green and FBS-027 closes.
Six months later, a security review asks: where does the old photo go when it’s replaced? The answer is in AC-042-03, the test that proves it, and the code that implements it. The chain holds.
When the spec changes
Product owners don’t know what they want until they see it. Some weeks later, the PO realises the upload needs to handle iPhone HEIC files, which the original AC-042-01 didn’t mention. The AC is wrong. So the AC gets edited, or a new AC-042-04 is added. Either way, the spec moves.
Traceability surfaces the gap immediately: the test suite under the changed AC no longer asserts what the spec demands, and the code under the test suite isn’t implementing what the new test would assert. The gap is visible because the chain runs on stable IDs and the IDs make the misalignment a query, not a guess.
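What “a query, not a guess” could look like depends on how the chain is stored. As one possible sketch, assume each AC carries a revision counter and each suite records the AC revision it was written against. Both of those, along with the `TS-*` suite IDs and function names, are assumptions made for illustration.

```python
# Current revision of each AC. AC-042-01 was edited to mention HEIC (rev 2).
AC_REVISION = {"AC-042-01": 2, "AC-042-02": 1, "AC-042-03": 1}

# Which AC, at which revision, each suite was written against.
SUITE_WRITTEN_AGAINST = {
    "TS-042-01": ("AC-042-01", 1),
    "TS-042-02": ("AC-042-02", 1),
    "TS-042-03": ("AC-042-03", 1),
}


def stale_suites(ac_revision, suites):
    """Suites whose AC has moved on since the suite was written."""
    return sorted(
        suite for suite, (ac, rev) in suites.items()
        if ac_revision.get(ac) != rev
    )


def acs_without_suite(ac_revision, suites):
    """ACs that no suite points at, such as a newly added AC-042-04."""
    covered = {ac for ac, _ in suites.values()}
    return sorted(ac for ac in ac_revision if ac not in covered)


print(stale_suites(AC_REVISION, SUITE_WRITTEN_AGAINST))       # ['TS-042-01']
print(acs_without_suite(AC_REVISION, SUITE_WRITTEN_AGAINST))  # []
```

The two functions cover the two ways the spec can move: an edited AC shows up as a stale suite, and an added AC shows up as an AC with no suite.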
The gap becomes the next functional build specification. FBS-061: Photo upload HEIC support, scoped to the new or amended AC, declares its build context and runs through the same five-stage cycle as the original. The methodology doesn’t have a separate workflow for change. The unit of work is the FBS regardless of whether it’s a new feature, an edge case, a bug, or a whole new module. The size of the diff varies. The shape of the work stays the same. That property is mostly what the methodology is for. The living spec page goes deeper.
What this doesn’t yet cover
RCF, as it stands today, runs from intent down to a test-passing slice. The downstream parts of shipping software, the bits between “tests are green” and “the customer is using it,” are not in the methodology yet. PR review discipline. CI/CD gates. Deploy practice. Post-deploy observability against the same acceptance criteria the slice was built against. These are real, and half the value of traceability evaporates if the last mile of getting code in front of users isn’t inside the chain too.
The plan is to bring them in. They’re the natural next layer below the build cycle, sharing the same vocabulary (FBSs, ACs, IDs) and the same discipline (each stage commits, each commit reviewable on its own terms). The site will be updated when that work lands.
Where to go from here
That’s the whole shape. The rest of this section deepens the ideas one at a time.
For the why: Requirements as the source of truth. For the mechanism: Traceability, forward and backward. For the discipline that makes it stick: The build cycle. For iteration as first-class work: The living spec. For the strategic argument about why this matters now in a way it didn’t five years ago: The eighty-twenty flip.