
Getting AI-Assisted Development to Green Without Breaking the Code

Repair loops, small diffs, test trust, and how to get CI back to green without trashing the codebase.

6 min read · By Alex Chernysh

AI Engineering · Software Delivery · Testing · Workflow

The dangerous version of AI-assisted development is not the one that fails loudly. It is the one that gets to green by quietly lowering the meaning of green.

Test until green

The system keeps changing files until the suite stops complaining.

  • tests are weakened casually
  • design intent drifts
  • the diff gets harder to review each round

Disciplined repair loop

The team diagnoses the mismatch before editing more files.

  • code, tests, and docs are reconciled deliberately
  • changes stay reversible
  • green still means something

1. Green state is a trust condition

A repository is green when:

  • the code builds
  • the tests are honest
  • the contracts still mean what the team thinks they mean
  • the diff is understandable enough to own later

That is why "all checks pass" is necessary and still not sufficient.

An AI tool can get a suite to pass in many ways. Some of them are useful. Some of them are a form of polite vandalism.

2. The first diagnosis matters more than the fourth patch

When a change breaks tests, there are usually three possibilities:

  1. the production code is wrong
  2. the tests are wrong
  3. both drifted away from the intended behavior

This sounds banal. It is also where most AI-assisted repair loops go wrong.

A model is excellent at proposing edits. It is less reliable at deciding which layer deserves to move unless the design intent is visible.

That is why the repair loop should begin with explicit diagnosis:

  • what behavior was intended?
  • which file expresses that intention most credibly?
  • is the failure about logic, contract, environment, or stale test assumptions?

Without that step, the agent starts bargaining with the suite.
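The diagnosis questions above can be made explicit in code. This is a sketch, not a real tool: the failure categories and the `layer_to_edit` mapping are illustrative assumptions about how a team might label failures before letting anything edit files.

```python
from enum import Enum, auto

class Failure(Enum):
    """Illustrative failure categories from the diagnosis questions above."""
    LOGIC = auto()        # production code computes the wrong thing
    CONTRACT = auto()     # an interface changed without agreement
    ENVIRONMENT = auto()  # CI image, fixtures, or config drifted
    STALE_TEST = auto()   # the test encodes behavior the team abandoned

def layer_to_edit(failure: Failure) -> str:
    """Map a diagnosed failure class to the layer that should move first."""
    if failure is Failure.STALE_TEST:
        return "tests"
    if failure is Failure.ENVIRONMENT:
        return "ci-config"
    # LOGIC and CONTRACT failures start in the production code.
    return "production-code"
```

The value is not the function itself; it is that the agent receives a decision, not permission to negotiate with the suite.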

3. Small diffs are not aesthetic. They are a safety mechanism.

If a fix touches too many files at once, review quality drops and diagnosis gets worse.

In AI-assisted development, small diffs matter even more because the model can produce plausible bulk edits faster than a human can audit them.

I prefer a sequence like this:

  1. reproduce failure
  2. isolate cause
  3. patch one layer
  4. rerun checks immediately
  5. continue only if the failure class is actually resolved

That sounds slower. It is often faster because you do not spend the afternoon untangling a 14-file patch that solved two symptoms and introduced five others.
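The sequence above can be sketched as a driver loop. `run_checks`, `diagnose`, and `apply_patch` are hypothetical stand-ins for whatever tooling the team actually runs; the round limit is an assumption, not a recommendation.

```python
def repair_loop(run_checks, apply_patch, diagnose, max_rounds: int = 3) -> bool:
    """Sketch of the reproduce -> isolate -> patch -> rerun sequence.

    run_checks() returns the failing check names (empty when green);
    diagnose(failures) isolates one failure class; apply_patch(target)
    edits exactly one layer. All three are stand-ins for real tooling.
    """
    for _ in range(max_rounds):
        failures = run_checks()      # steps 1 and 4: reproduce, rerun immediately
        if not failures:
            return True              # green, and we know which patch did it
        target = diagnose(failures)  # step 2: isolate one cause
        apply_patch(target)          # step 3: patch one layer only
    return False                     # step 6: stop and split the remaining work
```

The point of the bounded loop is step 5: if the same failure class survives a few narrow patches, the diagnosis was wrong, and more patching will not fix that.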

4. Tests are not sacred, but they are not disposable

There is nothing noble about preserving a broken test forever.

There is also nothing disciplined about deleting or weakening a test because it is blocking momentum.

A healthier standard is:

  • change the test if the test encodes behavior the system should no longer have
  • change the code if the test correctly describes intended behavior
  • change both only when the design evolved and neither file fully reflects it anymore

The point is not to defend tests emotionally. The point is to keep the suite as a credible contract.
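The standard above fits in a few lines. The inputs are human judgments made during diagnosis, not something a tool can infer; the function only makes the decision rule explicit.

```python
def what_moves(test_matches_intent: bool, code_matches_intent: bool) -> str:
    """Apply the standard above: the layer that no longer matches
    the intended behavior is the one that changes."""
    if test_matches_intent and not code_matches_intent:
        return "change the code"
    if code_matches_intent and not test_matches_intent:
        return "change the test"
    if not test_matches_intent and not code_matches_intent:
        return "change both, and record the new intent"
    # Both match intent, yet something failed: look elsewhere
    # (environment drift, flaky fixture, broken tooling).
    return "neither: the failure is elsewhere"
```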

5. AI helps most where structure exists

AI tools are strongest when the work has enough local structure to constrain the move:

  • boilerplate
  • repetitive refactors
  • test scaffolding
  • migration mechanics
  • obvious consistency fixes

They are weaker when the task is mostly judgment:

  • deciding the contract
  • choosing the trade-off
  • defining the rollback plan
  • deciding what counts as “done”

That does not mean the tool is useless there. It means the engineer remains responsible for the frame.

6. Fast feedback beats one heroic repair session

Research on software delivery (the DORA reports, most visibly) has been saying roughly the same thing for years: smaller changes and faster feedback loops are healthier.

AI does not repeal this. It intensifies it.

When generation is cheap, the temptation is to defer judgment. The better move is the opposite:

  • run checks sooner
  • fail sooner
  • narrow sooner
  • revert sooner when necessary

The agent can help move faster. It should not convince you that verification has become optional.

7. Reversibility is part of the design

A good AI-assisted workflow makes rollback easy.

That means:

  • additive changes before invasive ones
  • clear file ownership
  • obvious commit boundaries
  • avoiding mixed-purpose diffs
  • preserving the ability to back out one move without losing the whole session

The codebase should not need a séance to understand what happened.
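Mixed-purpose diffs are the easiest of these to check mechanically. A minimal sketch: the path conventions (`src/`, `tests/`, `migrations/`) are assumptions about repository layout, not a universal rule.

```python
def commit_purposes(paths: list[str]) -> set[str]:
    """Classify changed paths into coarse purposes by path convention."""
    def purpose(path: str) -> str:
        if path.startswith("tests/"):
            return "tests"
        if path.startswith("migrations/"):
            return "migration"
        return "code"
    return {purpose(p) for p in paths}

def is_single_purpose(paths: list[str]) -> bool:
    """A commit that mixes purposes is harder to revert in one move."""
    return len(commit_purposes(paths)) <= 1
```

Run against the file list of each proposed commit, this turns "obvious commit boundaries" from a preference into a gate.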

8. AI-generated code still inherits operational responsibility

The system does not care whether a regression came from a human, a code model, or an enthusiastic afternoon.

What matters later is:

  • who owns the failure mode
  • whether the logging was good enough
  • whether the contract remained legible
  • whether the rollback path exists

That is why I still like a human-in-the-loop framing for engineering work. Not because the model is weak. Because responsibility still needs a home.

9. A disciplined green path

If I wanted a reliable default for AI-assisted repair work, I would keep it close to this:

  1. reproduce the failure
  2. diagnose code vs test vs design drift
  3. patch the smallest plausible layer
  4. rerun type-check and tests immediately
  5. stop when the diff is no longer easy to audit
  6. split the remaining work instead of gambling on one more broad patch

That is not the most cinematic workflow. It is the one I trust.

What I would forbid

A few anti-patterns deserve a plain ban:

  • deleting tests without explicit rationale
  • merging broad AI-generated refactors that no one can explain
  • changing code and tests together without stating which layer was wrong
  • using green CI as proof that the design is now healthier
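The first ban is also the one a CI step can enforce. A sketch, assuming `git diff --name-status` output and a `tests/` plus `test_*.py` naming convention; adjust both to the repository at hand.

```python
def deleted_tests(name_status: str) -> list[str]:
    """Flag deleted test files in `git diff --name-status` output."""
    flagged = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, _, path = line.partition("\t")
        filename = path.rsplit("/", 1)[-1]
        looks_like_test = path.startswith("tests/") or filename.startswith("test_")
        if status.strip() == "D" and looks_like_test:
            flagged.append(path)
    return flagged
```

A non-empty result should not fail the build automatically; it should demand the explicit rationale the ban asks for.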

A passing suite is good news. It is not a philosophy.
