The dangerous version of AI-assisted development is not the one that fails loudly. It is the one that gets to green by quietly lowering the meaning of green.
Test until green
The system keeps changing files until the suite stops complaining.
- tests are weakened casually
- design intent drifts
- the diff gets harder to review each round
Disciplined repair loop
The team diagnoses the mismatch before editing more files.
- code, tests, and docs are reconciled deliberately
- changes stay reversible
- green still means something
1. Green state is a trust condition
A repository is green when:
- the code builds
- the tests are honest
- the contracts still mean what the team thinks they mean
- the diff is understandable enough to own later
That is why "all checks pass" is necessary and still not sufficient.
An AI tool can get a suite to pass in many ways. Some of them are useful. Some of them are a form of polite vandalism.
2. The first diagnosis matters more than the fourth patch
When a change breaks tests, there are usually three possibilities:
- the production code is wrong
- the tests are wrong
- both drifted away from the intended behavior
This sounds banal. It is also where most AI-assisted repair loops go wrong.
A model is excellent at proposing edits. It is less reliable at deciding which layer deserves to move unless the design intent is visible.
That is why the repair loop should begin with explicit diagnosis:
- what behavior was intended?
- which file expresses that intention most credibly?
- is the failure about logic, contract, environment, or stale test assumptions?
Without that step, the agent starts bargaining with the suite.
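The diagnosis questions above can be made mechanical before any edit happens. A minimal sketch, assuming a made-up `Diagnosis` record (all names and categories here are illustrative, not from any tool):

```python
from dataclasses import dataclass

# Illustrative diagnosis record: each field is one of the questions above,
# and editing is blocked until every question has a real answer.
FAILURE_KINDS = {"logic", "contract", "environment", "stale-test"}

@dataclass
class Diagnosis:
    intended_behavior: str   # what behavior was intended?
    credible_source: str     # which file expresses that intent most credibly?
    failure_kind: str        # one of FAILURE_KINDS

    def ready_to_edit(self) -> bool:
        return (
            bool(self.intended_behavior.strip())
            and bool(self.credible_source.strip())
            and self.failure_kind in FAILURE_KINDS
        )
```

A blank or half-filled diagnosis blocks the edit; the point is that the bargaining never starts because the questions come first.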
3. Small diffs are not aesthetic. They are a safety mechanism.
If a fix touches too many files at once, review quality drops and diagnosis gets worse.
In AI-assisted development, small diffs matter even more because the model can produce plausible bulk edits faster than a human can audit them.
I prefer a sequence like this:
- reproduce failure
- isolate cause
- patch one layer
- rerun checks immediately
- continue only if the failure class is actually resolved
That sounds slower. It is often faster because you do not spend the afternoon untangling a 14-file patch that solved two symptoms and introduced five others.
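The sequence above can be sketched as a loop. `reproduce`, `patch_one_layer`, and `run_checks` are hypothetical hooks standing in for your own tooling; the round limit is the "stop and split" rule made explicit:

```python
# Sketch of the repair sequence: patch one layer, rerun checks immediately,
# and stop (rather than broaden the patch) when the rounds run out.
def repair_loop(reproduce, patch_one_layer, run_checks, max_rounds=5):
    failure = reproduce()            # reproduce the failure first
    rounds = 0
    while failure is not None:
        if rounds == max_rounds:
            return "stop-and-split"  # split the work; no more broad patches
        patch_one_layer(failure)     # smallest plausible layer only
        failure = run_checks()       # rerun checks immediately, not at the end
        rounds += 1
    return "green"
```

The design choice worth noticing is that `run_checks` sits inside the loop body, not after it: every patch pays for its verification before the next one is allowed.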
4. Tests are not sacred, but they are not disposable
There is nothing noble about preserving a broken test forever.
There is also nothing disciplined about deleting or weakening a test because it is blocking momentum.
A healthier standard is:
- change the test if the test encodes behavior the system should no longer have
- change the code if the test correctly describes intended behavior
- change both only when the design evolved and neither file fully reflects it anymore
The point is not to defend tests emotionally. The point is to keep the suite as a credible contract.
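That standard reduces to a small decision table. A sketch, where the two booleans are outputs of the diagnosis step in section 2 (the function name and labels are mine, for illustration):

```python
# Which layer moves, given whether each layer still matches the intended
# behavior. "investigate" covers the case where both layers match intent
# and something else (environment, contract) is actually failing.
def layer_to_change(test_matches_intent: bool, code_matches_intent: bool) -> str:
    if test_matches_intent and not code_matches_intent:
        return "change the code"   # the test correctly describes intent
    if code_matches_intent and not test_matches_intent:
        return "change the test"   # the test encodes behavior the system dropped
    if not test_matches_intent and not code_matches_intent:
        return "change both"       # the design evolved past both files
    return "investigate"
```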
5. AI helps most where structure exists
AI tools are strongest when the work has enough local structure to constrain the move:
- boilerplate
- repetitive refactors
- test scaffolding
- migration mechanics
- obvious consistency fixes
They are weaker when the task is mostly judgment:
- deciding the contract
- choosing the trade-off
- defining the rollback plan
- deciding what counts as "done"
That does not mean the tool is useless there. It means the engineer remains responsible for the frame.
6. Fast feedback beats one heroic repair session
The external research on software delivery (the DORA reports, for example) has been saying roughly the same thing for years: smaller changes and faster feedback loops are healthier.
AI does not repeal this. It intensifies it.
When generation is cheap, the temptation is to defer judgment. The better move is the opposite:
- run checks sooner
- fail sooner
- narrow sooner
- revert sooner when necessary
The agent can help move faster. It should not convince you that verification has become optional.
7. Reversibility is part of the design
A good AI-assisted workflow makes rollback easy.
That means:
- additive changes before invasive ones
- clear file ownership
- obvious commit boundaries
- avoiding mixed-purpose diffs
- preserving the ability to back out one move without losing the whole session
The codebase should not need a séance to understand what happened.
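One of these rules, avoiding mixed-purpose diffs, is easy to flag mechanically. An illustrative sketch with a made-up path-to-purpose mapping; it flags rather than bans, since a commit that legitimately touches two layers should simply say why:

```python
# Flag commits that mix purposes. The prefix map is an example layout,
# not a convention from any particular tool.
PURPOSES = {"src/": "code", "tests/": "tests", "docs/": "docs"}

def purposes(paths):
    """Set of purposes a commit touches, based on path prefixes."""
    return {
        purpose
        for path in paths
        for prefix, purpose in PURPOSES.items()
        if path.startswith(prefix)
    }

def single_purpose(paths):
    return len(purposes(paths)) <= 1
```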
8. AI-generated code still inherits operational responsibility
The system does not care whether a regression came from a human, a code model, or an enthusiastic afternoon.
What matters later is:
- who owns the failure mode
- whether the logging was good enough
- whether the contract remained legible
- whether the rollback path exists
That is why I still like a human-in-the-loop framing for engineering work. Not because the model is weak. Because responsibility still needs a home.
9. A disciplined green path
If I wanted a reliable default for AI-assisted repair work, I would keep it close to this:
- reproduce the failure
- diagnose code vs test vs design drift
- patch the smallest plausible layer
- rerun type-check and tests immediately
- stop when the diff is no longer easy to audit
- split the remaining work instead of gambling on one more broad patch
That is not the most cinematic workflow. It is the one I trust.
What I would forbid
A few anti-patterns deserve a plain ban:
- deleting tests without explicit rationale
- merging broad AI-generated refactors that no one can explain
- changing code and tests together without stating which layer was wrong
- using green CI as proof that the design is now healthier
A passing suite is good news. It is not a philosophy.