Jonathan Itakpe

Cheap Code, Costly Reviews: The Hidden Bottleneck of AI

October 25, 2025
7 min read
Tags: ai in software development, code reviews, github copilot, llm coding, ai productivity, technical debt, engineering bottlenecks, software engineering best practices

If you're a tech lead or senior developer, you've probably noticed something strange lately: your pull request queue is overflowing. It's not your imagination. This isn't because the team suddenly got larger; it's because every developer is now equipped with an AI-powered force multiplier. We've entered an era where thousands of lines of code can be generated in minutes.

However, this surge in output has created a critical new bottleneck that is silently taxing our most valuable resources. While the cost of writing code has plummeted, the cost of verifying it has exploded. This new, AI-generated code is voluminous and often looks correct, turning what was a routine review into a high-stakes audit.

Every line of "easy" AI-generated code must be scrutinized for subtle flaws, hidden risks, and a complete lack of business context, placing an unsustainable burden on the review process and the senior engineers who are our last line of defense.

This article is not a protest against AI. Quite the opposite. The productivity gains from LLMs are undeniable and transformative, and they represent a permanent evolution in software development. Our goal isn't to stop using AI; it's to shift our processes to match this new velocity. We have successfully automated the writing of the first draft, and now we must invest just as heavily in automating and improving verification. The bottleneck has moved, and we need to move with it.

Writing Code is Cheap

"Over 25% of all new code at Google is now generated by AI." - Sundar Pichai, CEO of Google, 2024

From the Stack Overflow 2025 Developer Survey: "51% of all professional developers report using AI tools daily in their workflow." This is not an occasional tool anymore; that level of integration suggests AI-assisted code generation is now the default velocity for the current generation of software development.

While exact percentages are debatable, multiple reports suggest Copilot contributes roughly 40-60% of code across various languages.

The Review Bottleneck

The problem isn't just that developers are writing more code; the core issue is that the code they are writing is fundamentally harder to review. While the speed of AI generation has effectively pushed the cost of writing code down to near-zero, it has silently transferred all of that cost—and more—to the verification stage.

This shift has turned the pull request queue into the primary performance bottleneck for the entire engineering organization, creating what I call the 'loop of doom' - a self-reinforcing cycle that compounds over time:


Loop of Doom: More AI code → Backlog → Fatigue → Debt → Slower teams


Lack of Context

LLMs don't understand the project's specific business goals, long-term architectural vision, or the nuances of a legacy system. They generate a functionally correct solution in a vacuum. The code reviewer must now spend extra time ensuring this new, "clean" code doesn't break a complex, unwritten business rule or clash with the existing architecture.

On codebases with millions of lines of code accumulated over years, an AI agent making a fairly innocuous change with little context can produce tunnel-vision edits with massive side effects. An AI-first approach to generating changes also robs new team members of the process of learning how the existing code works and why certain decisions were made. Clicking through multiple files to make a small change offers valuable learning experiences that AI-first approaches bypass entirely.

Skipping this learning means the reviewers and senior engineers on the team are the only ones left who understand the business context. The fewer people who truly understand the system, the slower and riskier every review becomes.

Fatigue and "LGTM-Syndrome"

Reviewing human code is often about mentorship and intent ("I see what you were trying to do..."). Reviewing AI code is a different, more draining cognitive task. It's not mentorship; it's pure, cold auditing. You have to switch from a mindset of "Is this logical?", "Why was this decision made?" to "Is this correct in every conceivable way?", "This looks perfect, but is it doing what is intended?". This is far more taxing.

A new, critical problem is that the author of the code (the developer who prompted the AI) may not fully understand the code they are submitting. In a traditional review, you can ask the author, "Why did you choose this approach?" They can explain their logic. With AI-generated code, the answer is often, "I'm not sure, the AI produced it, but it passed the tests." This makes the reviewer the sole person responsible for understanding the code, dramatically increasing their burden.

The more reviews stack up, the harder it is to review each one properly. You can't dedicate enough time to read through the code and ask the right questions, and it becomes easy to fall into the trap of rubber-stamping merge requests with a simple "Looks Good To Me" without verifying the implications of the changes or making sure the author understands the potential impact of the code being submitted.

Technical Debt

Because the AI lacks the full architectural context, it might generate a solution that works now but is a long-term maintenance nightmare. It might use an overly complex design pattern or an obscure method when a simpler one already exists, implement unnecessary abstraction, or add new dependencies without weighing the trade-offs.

The code is written in seconds but adds hours to future feature development. If we reach a point where the sheer volume makes it impossible to properly vet the code going into production systems, we risk building overly complicated, hard-to-manage software.

This isn't about achieving perfection or zero technical debt - some debt is inevitable. But it should result from conscious decisions, not review fatigue.

Sidebar: Pragmatic Playbook

I have been thinking about how to alleviate some of these problems, and most of the fixes condense down to a stronger, stricter engineering culture - the same things you should be doing anyway, even without AI-generated code. There is a lot of talk about AI-assisted reviewers; while I think they will help catch smaller bugs and security flaws, a reviewer bot is still a bot and will run into many of the issues discussed above.

  • Reduce Scope

    • Hard limit of ~400 LOC changed per PR (see the CI sketch after this list).
    • Over the cap → split the PR or attach a one-page design note.
  • AI provenance in PRs

    • PR template must state: model/version used, prompt/chat summary.
    • Author explains what changed, why decisions were made, and the trade-offs considered.
  • Two-tier reviews

    • Triage (junior/peer): style, tests, obvious smells.
    • Audit (senior/owner): architecture, invariants, cross-service impact.
    • Set a max PRs/day per senior to avoid rubber-stamping.
  • Blockers in CI

    • Static analysis + SAST as blocking checks.
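
To make the scope cap and provenance checks concrete, here is a minimal sketch of a blocking CI check in Python. It is illustrative only: the PR_BODY and BASE_REF environment variables, the provenance field names, and the exact 400-line cap are assumptions for this sketch rather than an established convention, so adapt them to your own CI system and PR template.

```python
#!/usr/bin/env python3
"""Illustrative CI gate: block oversized PRs and missing AI provenance.

A minimal sketch under stated assumptions, not a drop-in tool. It assumes
the CI job exposes two hypothetical environment variables: PR_BODY (the
pull request description) and BASE_REF (the branch to diff against).
"""
import os
import subprocess
import sys

MAX_CHANGED_LINES = 400  # the ~400 LOC cap from the playbook above

# Hypothetical provenance fields the PR template is assumed to require.
REQUIRED_FIELDS = ["Model/version:", "Prompt summary:", "Trade-offs:"]


def changed_lines(base_ref: str) -> int:
    """Sum added + deleted lines reported by `git diff --numstat`."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base_ref}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        # numstat prints "-" instead of counts for binary files; skip those.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def main() -> int:
    base_ref = os.environ.get("BASE_REF", "origin/main")
    pr_body = os.environ.get("PR_BODY", "")
    failures = []

    loc = changed_lines(base_ref)
    if loc > MAX_CHANGED_LINES:
        failures.append(
            f"{loc} changed lines exceeds the {MAX_CHANGED_LINES}-line cap; "
            "split the PR or attach a one-page design note."
        )

    missing = [f for f in REQUIRED_FIELDS if f not in pr_body]
    if missing:
        failures.append(f"PR description is missing provenance fields: {missing}")

    for failure in failures:
        print(f"BLOCKED: {failure}")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```

Run as a required status check, this fails the build whenever it prints a BLOCKED line, so an oversized or undocumented PR bounces back to the author before a human reviewer spends any time on it.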

Conclusion

AI didn't remove engineering judgment; it made it the bottleneck. The industry is still figuring this out, and it will take a few cycles before we land on the best middle ground. A great starting point is really just a strong engineering culture - aggressive testing, smaller PRs, and encouraging "AI-assisted" development over "AI-generated" development.

These observations come from talking with multiple teams navigating this transition. If you're experiencing similar challenges or have found effective solutions, I'd love to hear from you.
