Thesis 25 June 2026 8 min read

AI lowered the cost of code.
It raised the price of bugs.

For twenty years the expensive part of shipping software was writing it. That era is ending. The code is getting cheap. What's getting expensive is knowing it works — and that bill is coming due on the wire, not in the pull request.

Something quietly inverted over the last two years. A backend engineer used to spend most of a feature's budget typing: wiring the client, handling the auth dance, mapping the response, getting the retry logic right. The integration was the work. Now a model writes a plausible version of all of it before your coffee is cool. The client, the auth, the mapping, the retries — generated, formatted, and passing the tests it also wrote.

This is genuinely good. It is also the moment the bottleneck moved, and most teams haven't noticed where it went.

The cost curve flipped

When writing code was slow, it acted as a natural throttle on how much untested behavior entered the system per week. You couldn't ship what you couldn't type, and typing forced you to think about each line at least once. Slow was a feature nobody asked for and everybody benefited from.

Remove the throttle and the volume of behavior entering production goes up sharply — while the number of humans who have actually read any given line goes down. The integration that used to take three days and get three careful reviews now takes an afternoon and gets a skim. The code is not worse, on average. There is just a great deal more of it, understood by fewer people, arriving faster.

Code review shows intent.
The wire shows behavior.

A review answers "does this look like it does the right thing?" It is a reading of intent. That was often good enough when a human wrote the code, because the writing and the intent were the same act. When a model writes it, intent and behavior come apart. The code looks like it handles the timeout. Whether it actually does — whether the socket closes cleanly, whether the retry re-sends the idempotency key, whether the malformed response crashes the parser or is swallowed silently — is a question you can only answer by watching bytes move.

Models are trained on the happy path

Here is the structural problem, and it isn't going away with the next model.

Language models learn from code that exists. Code that exists is, overwhelmingly, code that works — it compiled, it shipped, it got committed. So models are exquisitely good at generating the request the server expects, the fields the SDK exposes, the sequence the protocol documents. They generate clients that respect the spec, because respecting the spec is what almost all of their training data does.

What they don't do, unprompted, is send a CRLF in a header value to see what your reverse proxy does with it. They don't send an SMTP DATA command before MAIL FROM to check whether your server enforces state. They don't truncate the LDAP message mid-field, or set a Content-Length that lies, or reorder the TLS handshake. That behavior is barely in the training set, because barely anyone commits it: it lives in engagement notes, in fuzzing corpora, in the heads of people who break things for a living.

Which means the adversarial input, the undocumented protocol corner, the deliberately broken envelope (the exact inputs that find the bug that takes production down at 3am) remain a human's job. Not because humans are smarter than the model. Because the interesting inputs are the ones the model was rarely, if ever, shown.
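Two of the inputs named above are easy to write down as raw bytes. The sketch below is illustrative only: `build_raw_request` is a hypothetical helper, not part of any library, and the host and header names are placeholders. It assembles an HTTP/1.1 request by hand, so nothing strips the CRLF inside a header value or corrects a Content-Length that overstates the body.

```python
def build_raw_request(method, path, headers, body=b"", declared_length=None):
    """Assemble an HTTP/1.1 request byte-for-byte, with no validation.

    Hypothetical helper for illustration. `declared_length` lets the
    Content-Length header disagree with the real body size.
    """
    lines = [f"{method} {path} HTTP/1.1".encode("latin-1")]
    for name, value in headers:
        # Values are copied verbatim, so an embedded CR LF stays in place.
        lines.append(f"{name}: {value}".encode("latin-1"))
    length = len(body) if declared_length is None else declared_length
    lines.append(f"Content-Length: {length}".encode("ascii"))
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


# A header value that carries a second header line inside it.
crlf_probe = build_raw_request(
    "GET", "/", [("Host", "example.test"), ("X-Probe", "a\r\nX-Injected: 1")]
)

# A Content-Length that claims 50 bytes while only 5 are sent.
lying_probe = build_raw_request(
    "POST", "/upload", [("Host", "example.test")], body=b"hello", declared_length=50
)
```

Sending either probe takes nothing more than a plain socket, for example `socket.create_connection((host, port)).sendall(crlf_probe)`, pointed at a system you are authorised to test. What comes back is the observation that matters.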

An AI happily writes the happy path.
The unhappy path is the human's job again.

The only artefact that proves what shipped

If more behavior is entering production, understood by fewer people, then the tests that exercise the wire — not the function signature — become the most valuable thing you own. Not because testing is virtuous. Because a wire-level test is the only artefact that survives the question "yes, but does it actually do that?"

A unit test written by the same model that wrote the code inherits the model's blind spots. It asserts the happy path against the happy path. It's a mirror, not a check. A test that opens a real socket, sends real bytes — including bytes the spec forbids — and asserts on what comes back is a different kind of object. It doesn't care what the code intended. It reports what the system did.
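A minimal sketch of such a test, under stated assumptions: `ToySMTPHandler` is a stand-in for the system under test and is deliberately simplified (it only tracks MAIL FROM, whereas a real SMTP server also expects RCPT TO before DATA). `probe_data_before_mail` is a hypothetical name. The point is the shape of the check: open a socket, send a command out of order, and assert on the reply code that actually came back.

```python
import socket
import socketserver
import threading


class ToySMTPHandler(socketserver.StreamRequestHandler):
    """Stand-in server for the demo; it enforces MAIL FROM before DATA."""

    def handle(self):
        self.wfile.write(b"220 toy.test ready\r\n")
        have_sender = False
        for raw in self.rfile:
            verb = raw.strip().upper()
            if verb.startswith(b"MAIL FROM:"):
                have_sender = True
                self.wfile.write(b"250 OK\r\n")
            elif verb == b"DATA":
                # Out-of-order DATA gets a 503 instead of being accepted.
                reply = b"354 go ahead\r\n" if have_sender else b"503 bad sequence\r\n"
                self.wfile.write(reply)
            elif verb == b"QUIT":
                self.wfile.write(b"221 bye\r\n")
                return
            else:
                self.wfile.write(b"500 unrecognised\r\n")


def probe_data_before_mail(host, port):
    """Send DATA straight after the greeting and return (greeting, reply)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("rb") as reader:
            greeting = reader.readline()
            sock.sendall(b"DATA\r\n")  # deliberately skips MAIL FROM
            reply = reader.readline()
            sock.sendall(b"QUIT\r\n")
            reader.readline()
            return greeting, reply


# Port 0 asks the OS for any free port on the loopback interface.
server = socketserver.TCPServer(("127.0.0.1", 0), ToySMTPHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

greeting, reply = probe_data_before_mail(host, port)
server.shutdown()
server.server_close()
```

Against the toy server the reply starts with `503`. Pointed at a real server, the same probe reports whatever that server does, whether or not it matches what the code was meant to do.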

That's the shift in one line: generate the code, but verify the behavior — and verify it at the boundary, where behavior is the only thing that exists.

What we built for this

VirtuProbe is a request workbench for exactly this job. Every protocol it speaks — HTTP, SMTP, IMAP, LDAP, DNS, SMB, Kerberos, SpamAssassin — is a hand-rolled stack written against the RFC, not a wrapper around a library. That matters for one specific reason: the libraries "help." They normalise your headers, reject your invalid method name, quietly fix the malformed packet before it reaches the wire. That help is precisely what you don't want when the malformed packet is the test.
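Python's standard `http.client` module gives one concrete case of that help. The sketch below is an illustration in Python rather than anything taken from VirtuProbe, and `stdlib_accepts_header` is a hypothetical name. It shows the library refusing a header value with an embedded CRLF before a single byte is sent, which is the right default for application code and the wrong one when the malformed header is the test.

```python
import http.client


def stdlib_accepts_header(value):
    """Return True if http.client agrees to queue this header value."""
    # No connection is opened here: putrequest and putheader only buffer
    # bytes locally, so the host name is a placeholder.
    conn = http.client.HTTPConnection("example.test", 80)
    conn.putrequest("GET", "/")
    try:
        conn.putheader("X-Probe", value)
    except ValueError:
        # The library rejects the value instead of writing it to the wire.
        return False
    return True


plain_ok = stdlib_accepts_header("hello")
crlf_ok = stdlib_accepts_header("a\r\nX-Injected: 1")
```

Here `plain_ok` is `True` and `crlf_ok` is `False`: the forbidden value never leaves the process, so a test written on top of this client cannot observe how the server would have handled it.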

So you can send what the spec forbids. You can mark any field with §payload§ and fuzz it. You can chain an HTTP call to an IMAP fetch to an LDAP bind in a single runnable file and assert across all three. And yes — the built-in AI assistant will happily draft that chain for you. It suggests; the real protocol stack decides what's true. It builds the test; you run it on the wire, and the wire tells you whether the thing you shipped actually holds.
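The idea of marker-based fuzzing can be sketched in a few lines. Only the `§payload§` marker text comes from the description above. The `expand` function, the template, and the payload list are hypothetical, and none of this is VirtuProbe's file format or implementation. Each payload is substituted at the marker to produce one concrete request, with no escaping applied.

```python
MARKER = "§payload§"


def expand(template, payloads):
    """Yield one concrete request per payload, substituted at the marker."""
    if MARKER not in template:
        raise ValueError("template has no marker to fuzz")
    for payload in payloads:
        # No escaping or validation: the payload goes in exactly as written.
        yield template.replace(MARKER, payload).encode("latin-1")


template = (
    "GET /user HTTP/1.1\r\n"
    "Host: example.test\r\n"
    "X-User: §payload§\r\n"
    "\r\n"
)

payloads = [
    "alice",               # baseline, well-formed value
    "a\r\nX-Injected: 1",  # CRLF inside a header value
    "A" * 1024,            # oversized value
    "\x00",                # NUL byte
]

requests = list(expand(template, payloads))
```

Each entry in `requests` is a byte string ready to be written to a socket. The baseline payload gives a reference response to compare the other three against.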

The model made writing the integration cheap. We make proving it correct possible. Those are two different jobs, and the second one just became the one that matters.

Verify at the boundary. VirtuProbe Studio is free to download — no account, no cloud, no telemetry. The free workbench includes the AI assistant (bring your own key), OAuth2, GraphQL, and chaining across HTTP, DNS and SMTP.


Jakub Stonavský is the founder and maintainer of VirtuProbe Studio. He has spent 20+ years building systems, from core-banking platforms to high-scale ad-tech, and coding around the gaps in every testing tool along the way.
