The $150 Bug: When AI Agents Save Your Work (and Almost Destroy It)
How a coding agent caught a catastrophic data loss bug in a major library release
Software development is entering a strange era of delegated labour. Simon Willison recently sat down to finish a major release for sqlite-utils, a tool used by thousands. Instead of writing the final commits himself, he hired Claude Fable—an AI agent—for the equivalent of about $150. The goal was a final review to catch breaking changes before the 4.0 stable release. What followed was not just a simple proofread, but a deep, structural interrogation of the code that a human eye, perhaps fatigued by the repetitive nature of regression testing, might have missed.
The Poisoned Connection
The most alarming discovery was a bug in the `delete_where()` method. On the surface, it looked like a standard deletion command. In reality, it was a silent killer. Because the method lacked an atomic wrapper, it failed to commit the transaction, leaving the database connection in a state of permanent, uncommitted flux. If a developer had shipped this, every subsequent write operation would have been lost when the connection closed. It wasn't just a minor error; it was a design flaw that would have caused massive, silent data loss for anyone relying on the library's new transaction model.
The connection is left in_transaction=True, so every subsequent atomic() call takes the savepoint branch and never commits either. That’s a really bad bug!
This incident highlights the specific utility of AI in the current development cycle. The agent didn't just suggest stylistic changes; it performed end-to-end reproduction of failures. It identified that the library's new promise—that every write operation is automatically committed—was being broken by a single, poorly implemented function. For a developer, this changes the nature of the job. You are no longer just a writer of logic; you are a high-level auditor of machine-generated intent.
The Multi-Model Audit
Willison also touched on a practice that sounds superstitious but is increasingly practical: cross-model verification. He used Anthropic's best model to review the work, and then brought in OpenAI's GPT-5.5 to perform a final sanity check. This creates a system of checks and balances where the biases or hallucinations of one model are caught by the different training data and logic of another. It is a way of triangulating truth in a landscape of probabilistic outputs.
- Agents are best used for high-volume, high-precision regression testing.
- The human role is shifting from 'author' to 'editor-in-chief'.
- Cross-model verification reduces the risk of single-model hallucinations.
- Harder tasks actually provide more 'downtime' for the human to think deeply.
As we move toward more autonomous coding, the cost of entry for a single developer drops, but the cost of a mistake rises. When an agent can write 1,300 lines of code in a few minutes, the human must be capable of reading and understanding those lines just as quickly. The speed of production is no longer the bottleneck; the speed of comprehension is.
The future of engineering is not about writing code, but about the rigorous, multi-model auditing of code written by others.