Are Code Reviews Still for Humans?
We spent decades optimizing code for human readers. What happens when the primary reader becomes an AI?
Jason Valle
ChatML Team
Last night over beers, I found myself arguing with a coworker about whether agent-generated code still needs a human reviewer.
My position felt obvious: of course it does.
But halfway through the argument, I realized something uncomfortable — many of the things I've cared about for years in code reviews might not matter anymore.
My argument was simple: yes, agentic engineering has revolutionized the way we develop software and can be leveraged in ways most would have considered impossible a mere six months ago, but we aren't 100% there yet. At the very least we should require a human code reviewer if the engineer isn't going to review the code themselves before opening a pull request. While this might still be obvious, it might not be for the reasons I was so sure of at the start of our conversation.
If you've read any of the other recent ChatML blog posts, it should be clear that my coworker is about as vibe-first, cowboy-coding as you can get. Give your agent a task (or many!), accept all edits this session, merge unseen, and iterate until you're acquired. I'll be the first to admit that we have had success with this workflow, and once you get the ball rolling it leads to some truly eye-popping productivity statistics. But deep down I still haven't been able to stop wondering: IS THE CODE GONNA BE OKAY??
As the discussion went on, drawing on my past five or so months of reviewing agent-generated code, I reflected on my claim that I was "always able to find at least a few" points of feedback on un-reviewed pull requests. And I started to realize that many of the things I have cared about for so many years reviewing code might no longer apply.
The code review habits that might not matter anymore
So much of the code review process emphasizes maintainability of code for humans. My go-to bag of tricks to show that I actually read all of the code in the diff suddenly felt empty and pedantic:
- "This file has grown to be quite large, maybe we can break it up into smaller components"
- "It isn't super easy to tell what's going on with these nested ternaries"
- "Prefer verbose, descriptive variable names"
Historically valid feedback, all of which boils down to making the codebase easier to understand for the humans who come after you.
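To make the second bullet concrete, here's a hypothetical example (the function and its pricing tiers are invented for illustration) of the kind of nested ternary a reviewer would traditionally flag, next to the "human-readable" rewrite that review feedback would ask for:

```typescript
// Hypothetical example: the terse version a reviewer might flag in a diff.
function shippingCostTerse(w: number): number {
  return w > 20 ? 25 : w > 10 ? 15 : w > 0 ? 5 : 0;
}

// The rewrite traditional review feedback asks for: descriptive names,
// one condition per line, a comment per branch.
function shippingCostReadable(weightKg: number): number {
  if (weightKg > 20) return 25; // heavy parcel
  if (weightKg > 10) return 15; // medium parcel
  if (weightKg > 0) return 5;   // light parcel
  return 0;                     // nothing to ship
}
```

Both versions behave identically; the question this post is circling is whether the second one still earns its extra lines when the next reader is an agent rather than a human.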
We spent decades optimizing code for humans. What happens when the primary reader becomes an AI?
It's one of many interesting questions that have come out of this agentic workflow revolution, but it's the first one that has stopped me in my tracks in the middle of an argument I was sure I was right about. The agents that come across this codebase next are going to be able to figure out what a file does before you can even dismiss the "Update VSCode" prompt. Any optimizations for human readability are quickly being rendered negligible in this new era of software development.
I'm not ready to fully concede my position just yet. I've definitely caught functional bugs and issues in code review, and on a number of occasions have found it helpful to go so far as to pull down the branch and test the changes because something just felt fishy.
Maybe the question isn't whether code needs human reviewers anymore. Maybe the real question is: what does "good code" even mean when the primary reader isn't human?