Traceability, forward and backward, for AI-generated code


Pick any line of code in any production codebase you’ve worked on. Ask: why does this line exist? On most codebases, the honest answer is some mix of “a developer thought it should,” “there was a meeting last quarter,” and “we’d have to ask Steve, but Steve left eight months ago.”

Traceability is the discipline of replacing that answer with a real one. A line of code traces to a test. The test traces to an acceptance criterion. The criterion traces to a user story. The story traces to a requirement. The requirement traces to a business decision. Each hop is a single lookup, so answering “why does this line exist” becomes a short walk up the chain, not a treasure hunt.

What is requirements traceability for AI-generated code?

Requirements traceability for AI-generated code is the ability to walk from any line of code back to the business decision that put it there, and forward from any decision to every test that proves it ships. The mechanics are identical to traceability in any other context. The reason it matters more now is that AI agents will happily generate code with no anchor to intent. Traceability is what supplies the anchor.

How do you keep AI-generated code from drifting?

You keep AI-generated code from drifting by giving the agent a contract it has to satisfy and a chain that records the satisfaction. Acceptance criteria written ahead of code. Test suites that map one-to-one to those criteria. Build stages that commit in order. Traceability is the spine that holds the whole thing together. Without it, the agent has nothing to drift away from and nothing that catches the drift when it happens.

Two directions, both load-bearing

Traceability runs in both directions, and each direction answers a different kind of question.

Forward. Start at the top of the chain and walk down. Given requirement REQ-019, what user stories does it have? Given those stories, what acceptance criteria? Given those criteria, what test suites? Given those suites, are they green? Forward tracing answers questions like “is this requirement actually shipped,” “which tests do I need to run if this requirement changes,” and “what part of the product would I lose if I cut this requirement.”

Backward. Start at a leaf and walk up. Given this failing test, what acceptance criterion does it cover? Given that criterion, what story does it belong to? Given the story, what requirement? Given the requirement, why does it exist in the PRD? Backward tracing answers questions like “why does this code exist,” “which customer commitment does this test protect,” and “what business decision was being made when this constraint went in.”

Forward tracing is what product owners and stakeholders ask. Backward tracing is what engineers and reviewers ask. The chain serves both. The cost of serving both, once the chain is in place, is roughly zero.

How the chain physically supports this

Every artefact in the chain carries an opaque string ID. PRD-001. REQ-019. US-088. AC-088-02. TAD-001. FBS-051. TS-088-02. The IDs are stable: never renumbered, never reused, never reissued. Each artefact carries pointers to its neighbours: stories point at their parent requirement, ACs point at their parent story, suites point at their AC.

That’s it. The whole traceability story is “stable IDs and pointer fields.” The complexity people imagine in traceability systems doesn’t live in the data model. It lives in two operational disciplines that the data model depends on: never reuse an ID, and never let a pointer go stale.
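As a minimal sketch of “stable IDs and pointer fields,” the Python below models each artefact as a record with an opaque ID and one pointer to its neighbour up the chain. The `Artefact` class, the `kind` labels, the `CHAIN` dictionary, and the pointer from REQ-019 to PRD-001 are assumptions made for illustration, not a schema the chain prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artefact:
    id: str                        # opaque, stable string; never parsed for meaning
    kind: str                      # hypothetical label: "prd", "requirement", "story", "ac", "suite"
    title: str
    parent: Optional[str] = None   # ID of the neighbour one hop up the chain


# Hypothetical records, keyed by ID so every hop is one dictionary lookup.
CHAIN = {
    a.id: a
    for a in [
        Artefact("PRD-001", "prd", "Product requirements document"),
        Artefact("REQ-019", "requirement", "Users can export their data", parent="PRD-001"),
        Artefact("US-088", "story", "Download my data", parent="REQ-019"),
        Artefact("AC-088-02", "ac", "Size cap", parent="US-088"),
        Artefact("TS-088-02", "suite", "Size cap suite", parent="AC-088-02"),
    ]
}


def parent_of(artefact_id: str) -> Optional[Artefact]:
    """Follow one pointer up the chain; returns None at the top."""
    parent_id = CHAIN[artefact_id].parent
    return CHAIN[parent_id] if parent_id is not None else None


print(parent_of("TS-088-02").id)  # AC-088-02
```

The IDs are treated as opaque here: the parent comes from the pointer field, never from slicing the ID string, so the scheme would keep working if the ID format changed.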

Test suites tie back to ACs because the suite’s name or its metadata references the AC ID. Code ties back to test suites because the suite tests it. Code can also point at an AC directly with a code annotation, which is cheap and makes search-based traceability work without running the test runner. None of this needs a database. It needs the IDs to be stable strings that grep can find.
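The search-based lookup can be sketched in a few lines. This example assumes a hypothetical annotation convention, a plain comment that mentions the AC ID, and a regular expression shaped like the AC IDs used in this article; `find_ac_refs` and `export_user_data` are illustrative names, not part of any real tool.

```python
import re

# Assumed ID shape, taken from examples such as "AC-088-03".
AC_ID = re.compile(r"\bAC-\d{3}-\d{2}\b")


def find_ac_refs(source: str) -> list[tuple[int, str]]:
    """Return (line_number, ac_id) for every AC ID mentioned in the text."""
    refs = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in AC_ID.finditer(line):
            refs.append((line_no, match.group()))
    return refs


# Hypothetical annotated source file, inlined so the sketch is self-contained.
sample = '''\
def export_user_data(user, fmt):
    # AC-088-03: exports must be CSV or JSON
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported export format: {fmt}")
'''

print(find_ac_refs(sample))  # [(2, 'AC-088-03')]
```

In practice the same search is a one-line grep over the repository; the point of the sketch is only that the annotation is findable without a database or a test run.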

What forward tracing actually looks like

Imagine a product owner asks: show me that user data export is implemented. With the chain in place, this is a three-step query.

Step one. Find the requirement. The PRD’s requirements list shows REQ-019, “Users can export their data,” functional, must-have, domain: privacy.

Step two. Pull every story for REQ-019. There’s one: US-088, “As a user I want to download my data so that I can take it elsewhere.”

Step three. Pull every AC on US-088, and find their test suites. AC-088-01 (happy path), AC-088-02 (size cap), AC-088-03 (format). Three suites, all green on the last CI run.

The product owner gets a one-page answer with a clear chain, current as of the last build. Compare that to the alternative, where the product owner gets a meeting, five engineers in a room reverse-engineering the export feature, and a slide deck written overnight.
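The three steps can be sketched as a small query over pointer tables. Everything named here is hypothetical: the table names, the suite ID TS-088-01, and the `LAST_CI` status map stand in for wherever a real project keeps its artefacts and build results.

```python
# Hypothetical pointer tables: child ID -> parent ID.
STORY_TO_REQ = {"US-088": "REQ-019"}
AC_TO_STORY = {"AC-088-01": "US-088", "AC-088-02": "US-088", "AC-088-03": "US-088"}
SUITE_TO_AC = {"TS-088-01": "AC-088-01", "TS-088-02": "AC-088-02", "TS-088-03": "AC-088-03"}

# Hypothetical result of the last CI run, keyed by suite ID.
LAST_CI = {"TS-088-01": "green", "TS-088-02": "green", "TS-088-03": "green"}


def children(pointers: dict[str, str], parent_id: str) -> list[str]:
    """Every artefact whose pointer field names parent_id."""
    return sorted(child for child, parent in pointers.items() if parent == parent_id)


def forward_trace(req_id: str) -> list[tuple[str, str, str, str]]:
    """Walk requirement -> stories -> ACs -> suites, attaching CI status."""
    rows = []
    for story in children(STORY_TO_REQ, req_id):
        for ac in children(AC_TO_STORY, story):
            for suite in children(SUITE_TO_AC, ac):
                rows.append((story, ac, suite, LAST_CI.get(suite, "unknown")))
    return rows


for row in forward_trace("REQ-019"):
    print(" -> ".join(row))
```

One caveat on this sketch: an AC with no suite simply produces no row, so “all rows green” is not by itself proof of coverage. Catching the missing suite is a separate check.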

What backward tracing actually looks like

A test suite goes red after a refactor. The test is TS-088-03, and the engineer who broke it didn’t write it and doesn’t remember what it’s for. With the chain in place, the question “why does this test exist” is a four-hop query.

TS-088-03 implements AC-088-03. AC-088-03 says exports must be in CSV or JSON, never proprietary formats. AC-088-03 belongs to US-088. US-088 is part of REQ-019. REQ-019 is in the PRD as a privacy must-have.

The engineer now knows three things they didn’t know two minutes ago. The test exists to protect a customer-facing privacy commitment. The refactor’s side effect breaks that commitment. The fix isn’t to delete the test. The fix is to restore the format guarantee in the refactored code. That decision takes seconds because the chain made the stakes explicit.
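The four-hop walk is short enough to sketch directly. The `PARENT` map below is a hypothetical stand-in for the pointer fields on each artefact, and it assumes REQ-019 lives in a PRD with the ID PRD-001.

```python
# Hypothetical pointer fields: artefact ID -> ID of its parent.
PARENT = {
    "TS-088-03": "AC-088-03",
    "AC-088-03": "US-088",
    "US-088": "REQ-019",
    "REQ-019": "PRD-001",
}


def backward_trace(artefact_id: str) -> list[str]:
    """Follow parent pointers from a leaf until there is no parent left."""
    path = [artefact_id]
    seen = {artefact_id}
    while path[-1] in PARENT:
        parent = PARENT[path[-1]]
        if parent in seen:
            # A loop means the pointer data is corrupt; stop rather than spin.
            raise ValueError(f"pointer cycle detected at {parent}")
        path.append(parent)
        seen.add(parent)
    return path


print(" -> ".join(backward_trace("TS-088-03")))
# TS-088-03 -> AC-088-03 -> US-088 -> REQ-019 -> PRD-001
```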

What changes when a requirement changes

Requirements change. They have to. The interesting question is what changing one does to everything else.

Without traceability, the answer is “nobody knows, but everyone has opinions.” The team huddles, somebody guesses which stories are affected, somebody else points at a probably-related test file, and the change ships with a vague footnote about “possible regressions in adjacent features.” Those footnotes are where production incidents come from.

With traceability, the answer is mechanical. Pull every story for the changed requirement. Pull every AC on those stories. Pull every test suite for those ACs. That’s the impact surface. Some suites will need new cases. Some will need to be deleted. Some will need their underlying AC restated. The change becomes a manageable list rather than a vibe.
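As a rough sketch, the impact surface is every artefact reachable downward from the changed one. The code below inverts a single hypothetical parent-pointer map and walks it breadth-first; REQ-020 and its descendants are invented here only to show that unrelated work stays out of the list.

```python
from collections import defaultdict, deque

# Hypothetical pointer fields: artefact ID -> ID of its parent.
PARENT = {
    "US-088": "REQ-019",
    "AC-088-01": "US-088",
    "AC-088-02": "US-088",
    "AC-088-03": "US-088",
    "TS-088-01": "AC-088-01",
    "TS-088-02": "AC-088-02",
    "TS-088-03": "AC-088-03",
    "US-090": "REQ-020",      # unrelated requirement, should not appear
    "AC-090-01": "US-090",
    "TS-090-01": "AC-090-01",
}


def impact_surface(changed_id: str) -> list[str]:
    """Every story, AC, and suite downstream of the changed artefact."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)

    surface, seen, queue = [], {changed_id}, deque([changed_id])
    while queue:
        current = queue.popleft()
        for child in sorted(children.get(current, [])):
            if child not in seen:  # guard against corrupt, cyclic pointers
                seen.add(child)
                surface.append(child)
                queue.append(child)
    return surface


print(impact_surface("REQ-019"))
```

The same function works when the changed artefact is a story or an AC, since the walk only cares about what sits below the starting point.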

What changes when an AC changes

ACs change too, often more than requirements do. The mechanics are the same. An AC has exactly one test suite. Change the AC, the test suite needs updating, and the change is contained.

The discipline matters here. When an AC changes, the suite should be updated first, then the implementation. Update the suite to reflect the new criterion (which probably makes some cases fail). Make the implementation pass against the new cases. Commit both. The order is what keeps the chain honest. Update the implementation first and you risk shipping code that satisfies the old AC while the new AC is still aspirational.

The limits of traceability

Traceability proves a chain from a business decision to a passing test. It doesn’t prove the test is the right test. If the AC is weak, the suite will be weak, and the chain will hum quietly while the product behaves badly in production. The chain is necessary, but it isn’t enough on its own: it only works if what flows through it is good.

Traceability also doesn’t cover side effects. A line of code that satisfies its AC may also touch something it shouldn’t. The build cycle catches some of that in the Test stage, when the whole suite runs and unrelated tests catch the bleed. Some of it gets caught in code review. Some of it gets caught in production. The chain narrows the search; it doesn’t eliminate the work.

And traceability is only as good as the discipline that maintains it. An ID that gets renumbered breaks every pointer that referenced it. A story without a requirement is an orphan. An AC without a suite is decorative. RCF projects keep a small set of validators that run on commit and flag this kind of breakage early. The validators aren’t glamorous. They’re what makes traceability survive a year of edits.
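To make the kinds of breakage concrete, here is a minimal sketch of what such a check might look like. It is not RCF’s actual validator: the record shape (`id`, `kind`, `parent`) and the `validate` function are assumptions for illustration, and the sample data is invented.

```python
def validate(artefacts: list[dict]) -> list[str]:
    """Return a human-readable problem list; an empty list means the chain is intact."""
    problems = []
    ids = [a["id"] for a in artefacts]
    known = set(ids)

    # An ID that appears twice has been reused or reissued.
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"{dup}: ID used more than once")

    for a in artefacts:
        parent = a.get("parent")
        if a["kind"] == "story" and parent is None:
            problems.append(f"{a['id']}: story has no requirement (orphan)")
        elif parent is not None and parent not in known:
            problems.append(f"{a['id']}: stale pointer to unknown ID {parent}")

    # An AC that no suite points at is decorative.
    covered = {a.get("parent") for a in artefacts if a["kind"] == "suite"}
    for a in artefacts:
        if a["kind"] == "ac" and a["id"] not in covered:
            problems.append(f"{a['id']}: AC has no test suite")
    return problems


# Hypothetical artefact records with three deliberate defects.
artefacts = [
    {"id": "REQ-019", "kind": "requirement", "parent": None},
    {"id": "US-088", "kind": "story", "parent": "REQ-019"},
    {"id": "US-089", "kind": "story", "parent": None},             # orphan story
    {"id": "AC-088-01", "kind": "ac", "parent": "US-088"},
    {"id": "AC-088-02", "kind": "ac", "parent": "US-088"},         # no suite
    {"id": "TS-088-01", "kind": "suite", "parent": "AC-088-01"},
    {"id": "TS-088-09", "kind": "suite", "parent": "AC-088-09"},   # stale pointer
]

for problem in validate(artefacts):
    print(problem)
```

Wired into a pre-commit hook or a CI step, a non-empty problem list would fail the commit, which is one plausible way to flag breakage early.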

What traceability cashes out as, day to day

Six months into a project, when nobody can remember why anything was decided, the chain is what stops the team rebuilding things they already built. A year in, when the product owner shows up asking for evidence, the chain is what saves the all-hands meeting. Two years in, when somebody asks whether a deprecation is safe, the chain is what tells you which downstream tests would break, and which customer commitments those tests defend.

Without it, all of those questions get answered by guesswork dressed up as seniority. With it, they get answered by query.