Concept
What is AI drift?
AI drift is the team-level discipline decay that sets in when AI-generated code outpaces the engineering practice around it. The codebase grows faster than the understanding of it, and the gap quietly compounds.
AI drift, defined
AI drift is what happens when a team takes the speed AI gives it and skips the engineering that used to come bundled with the typing. Code lands faster than anyone can reason about it. Schema decisions, edge-case handling, security posture, contract boundaries: all of these still get made, but the humans on the team stop making them. The agent makes a plausible choice in passing, the code compiles, the tests it wrote for itself go green, and the team ships. Six weeks later, nobody can answer simple questions about why the system behaves the way it does, because nobody decided. The agent did.
This is distinct from model drift, which is the MLOps term for a trained model’s performance degrading over time as production data diverges from training data. Model drift is a statistical phenomenon inside one running system. AI drift is a sociotechnical phenomenon across a team and a codebase. The two terms sound alike and refer to entirely different problems. The MLOps version has owners, monitoring tools, and a decade of literature. The team-discipline version, which is the one breaking software shops in 2026, has almost none of that.
Why it happens
AI drift is what the eighty-twenty flip looks like when the bill arrives. AI made the cheap parts of building software cheaper still, and left the hard parts exactly as hard as they were. Teams that didn’t notice the flip kept pouring their effort into the typing, because the typing was what had felt like work for their entire careers. The agent now does the typing. The team is still organising around it. The typing sped up, and the discipline that used to live alongside it got squeezed out of the schedule: the requirement that got argued through, the acceptance criterion (AC) that got written before the code, the review that caught the bad assumption. Why slow down for ceremony when the code is already green?
The answer is that the ceremony was never ceremony. It was the part of the work that decided what got built and whether it was right. With the typing collapsed, that part is now the entire job. Teams that skip it ship twice the code with half the understanding, and the gap compounds with every feature added on top.
What it looks like in a codebase
Drift shows up in patterns. None of them is new. The novelty is the speed.
Schema drift under autocomplete. The agent adds a new column, a new field, a new payload shape, because something downstream needed it. The change wasn’t designed. It was inferred from surrounding code and patched in. Three weeks later, four other services have started reading the new field with three slightly different interpretations of what it means. No PR review caught it because each change was small and locally plausible. The data model now contradicts itself in production, and reconstructing the original intent is archaeology.
Silently invented edge cases. The agent encountered an ambiguous input and made a choice. Empty string treated as null. Negative numbers clamped to zero. Unicode normalised one way for storage, another way for display. None of these are wrong in isolation. None of them was decided by a person. Six months later, when a customer reports a bug, the team discovers that their product has been silently making policy decisions for half a year, and nobody remembers what those policies are.
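To make one of these concrete, the sketch below shows two Unicode normalisation choices, each locally plausible, that disagree about the same input. The function names `for_storage` and `for_display` are hypothetical, and so is the pairing of NFKC with storage and NFC with display. It is an illustration of the pattern, not a claim about any particular codebase.

```python
import unicodedata


def for_storage(text: str) -> str:
    # Hypothetical agent choice 1: compatibility normalisation (NFKC),
    # which folds the "fi" ligature (U+FB01) into the two letters "f" and "i".
    return unicodedata.normalize("NFKC", text)


def for_display(text: str) -> str:
    # Hypothetical agent choice 2: canonical normalisation (NFC),
    # which leaves the ligature as a single code point.
    return unicodedata.normalize("NFC", text)


raw = "\ufb01le"  # the word "file" typed with the U+FB01 ligature

stored = for_storage(raw)
shown = for_display(raw)

print(stored == shown)          # False: the two paths disagree
print(len(stored), len(shown))  # 4 3
```

Either behaviour may be the right one. The drift is that neither was chosen by a person, written down, or pinned by a test the team owns.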
Tests that prove nothing. The agent wrote tests alongside the code. The tests are green. The tests assert what the code does, which is not the same thing as asserting what the code should do. An AI-written test against AI-written code is the agent marking its own homework. The CI suite hums; the product behaves badly. This is the gap that acceptance criteria as the contract was always meant to close, and AI-assisted teams without that discipline keep rediscovering it the hard way.
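A minimal sketch of the difference, under stated assumptions: `parse_quantity` stands in for an agent-written helper, and "AC-12" is a made-up acceptance criterion saying negative quantities must be rejected. The checks return booleans rather than raising, so both outcomes can be seen side by side.

```python
def parse_quantity(raw: str) -> int:
    """Hypothetical agent-written helper: silently clamps negatives to zero."""
    value = int(raw)
    return max(value, 0)


def mirror_test() -> bool:
    """Asserts what the code does. It was derived by reading the implementation."""
    return parse_quantity("-3") == 0


def ac_test() -> bool:
    """Asserts what the code should do, per the hypothetical AC-12:
    'a negative quantity is rejected with ValueError'."""
    try:
        parse_quantity("-3")
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print("mirror test passes:", mirror_test())  # True
    print("AC test passes:", ac_test())          # False
```

The mirror test is green and says nothing about whether clamping was ever wanted. The AC-backed test fails, which is the useful signal: the code and the decision disagree.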
Requirements that were never written down. A product owner described a feature in a Slack thread. The agent built it. The feature exists in the codebase, in the test suite, and in the heads of two engineers. It does not exist in any document that survives them. When the team changes, the feature’s reason for existing goes with the people who remember the Slack thread. Six months from now, somebody will argue the feature should be removed because it doesn’t look important, and nobody will be able to prove otherwise.
How to prevent AI drift in software projects
You prevent AI drift by keeping the discipline that the typing used to carry. The agent writes the code. The team writes the requirements, the acceptance criteria, the contracts, and the chain that ties them together. The cycle is mechanical, not heroic.
Three pieces do most of the work. The first is traceability: every line of code traces to a test, every test to an acceptance criterion, every criterion to a story and a requirement. When the chain is in place, drift becomes visible. A piece of code with no AC behind it is a flag. An AC with no test is a flag. A test with no AC is decorative. The discipline catches drift early, when it’s still a small correction.
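The three flags described above can be sketched as a small checker. This is an illustration under assumptions, not part of any published tooling: the `TraceLinks` structure, the `find_flags` function, and all the IDs are hypothetical, and the sketch covers only the code-to-test-to-AC links, leaving out the story and requirement levels of the chain.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class TraceLinks:
    acs: Set[str]                       # known acceptance-criterion IDs
    test_to_ac: Dict[str, Optional[str]]  # test name -> AC it enforces, if any
    code_to_tests: Dict[str, List[str]]   # code unit -> tests that cover it


def find_flags(links: TraceLinks) -> List[str]:
    flags: List[str] = []

    # Code with no AC behind it: none of its tests resolves to a known AC.
    for unit, tests in sorted(links.code_to_tests.items()):
        if not any(links.test_to_ac.get(t) in links.acs for t in tests):
            flags.append(f"code without AC: {unit}")

    # A test with no AC is decorative.
    for test, ac in sorted(links.test_to_ac.items()):
        if ac not in links.acs:
            flags.append(f"decorative test (no AC): {test}")

    # An AC with no test enforcing it.
    tested = {ac for ac in links.test_to_ac.values() if ac is not None}
    for ac in sorted(links.acs - tested):
        flags.append(f"AC without test: {ac}")

    return flags


example = TraceLinks(
    acs={"AC-1", "AC-2"},
    test_to_ac={"test_login_ok": "AC-1", "test_misc": None},
    code_to_tests={
        "auth.login": ["test_login_ok"],
        "auth.reset": ["test_misc"],
        "billing.refund": [],
    },
)

for flag in find_flags(example):
    print(flag)
```

In this example `auth.login` is the only unit with an unbroken chain. How the links get recorded in a real project (annotations, naming conventions, a manifest file) is a design choice the sketch leaves open.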
The second is the build cycle: five stages per feature, each one committing, none of them skippable. The cycle is what stops the agent shipping work that was never specified and never reviewed honestly. It also gives the team a regular cadence for noticing when the agent has wandered, because the Review stage is a separate stage, not a thing that happens in the same breath as the Build.
The third is acceptance criteria as the contract: the AC, written before the code, becomes the contract the test enforces and the agent works against. The agent can’t mark its own homework if the homework was set by the team and the marking is done against a test the team owns. This is the load-bearing primitive. Without it, the other two pieces have nothing to hold on to.
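One way to picture the contract, as a sketch only: the team owns a table of criteria, written before any implementation exists, and every candidate implementation is marked against that table. The names `Criterion`, `TEAM_CRITERIA`, `check`, `agent_impl`, and `revised_impl`, along with the two criteria themselves, are all invented for illustration.

```python
from typing import Callable, List, NamedTuple, Optional


class Criterion(NamedTuple):
    ac_id: str
    raw: str
    expected: Optional[str]  # None means "must be rejected with ValueError"


# Written by the team before the code exists (hypothetical criteria).
TEAM_CRITERIA = [
    Criterion("AC-1", "  Alice ", "Alice"),  # surrounding whitespace is trimmed
    Criterion("AC-2", "", None),             # an empty display name is rejected
]


def check(impl: Callable[[str], str]) -> List[str]:
    """Return the IDs of the criteria that the implementation fails."""
    failures = []
    for criterion in TEAM_CRITERIA:
        try:
            actual: Optional[str] = impl(criterion.raw)
        except ValueError:
            actual = None
        if actual != criterion.expected:
            failures.append(criterion.ac_id)
    return failures


def agent_impl(raw: str) -> str:
    # First attempt: trims whitespace, lets the empty string through.
    return raw.strip()


def revised_impl(raw: str) -> str:
    # Second attempt: also enforces the empty-name rule from AC-2.
    name = raw.strip()
    if not name:
        raise ValueError("display name must not be empty")
    return name


print(check(agent_impl))    # ['AC-2']
print(check(revised_impl))  # []
```

The implementation can change as often as the agent likes. The criteria change only when the team decides they should.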
None of this is new advice. It’s what good engineering teams have always done, when they were allowed to. The new thing is that the activity around it has collapsed, and the discipline is now the whole job. Drift is the price of taking the speed and skipping the engineering. Methodology is what keeps the speed without paying it.
AI drift versus model drift, prompt drift, and LLM drift
“Drift” is a loaded word in 2026. Most of the search traffic around it lands on a different problem. Worth being explicit about which is which.
Model drift is the trained-model performance problem. A classifier trained on 2024 data starts misclassifying 2026 inputs because the world moved. The fix is retraining, monitoring, and the MLOps toolchain. Owners: ML engineers, data scientists.
Prompt drift and LLM drift are the agentic-system variants: the same prompt or the same model behaving differently across runs or across model versions, with downstream effects on agent reliability. The fix is evals, observability, and version-pinning. Owners: ML platform teams, agentic-systems engineers.
AI drift, in the sense this page is using, is the team-and-codebase problem. Discipline decay in AI-augmented software engineering. The fix is methodology: the chain, the cycle, the contract. Owners: the engineering organisation, the tech leads, the heads of engineering. The tools are documents and reviews, not monitoring dashboards.
All three are real. They share a word because the underlying intuition (something that worked is no longer working, and the deviation accumulates quietly) is the same. They share almost nothing else. If you arrived looking for the MLOps version, the canonical references live with the major MLOps vendors and Anthropic’s agent-engineering write-ups. If you arrived looking for the team-discipline version, the rest of the RCF methodology is what this page leads to.