When LLMs suck on your codebase, is it a valuable signal?
I have to admit that I am not the most advanced user of LLMs. I see them as tools, I resist the hype, I see their dangers and the horrific things done with them, and I do understand the benefits and advances they bring in some specific applications. I am not interested in publishing yet another opinion piece, but I did want to write something about code quality, code complexity, and their relationship with LLMs that I haven't seen written elsewhere.
LLMs on complex code bases
I often have to contribute to code bases that are, quite frankly, a mess. We have all had to deal with one of them at some point in our careers, and it's never a fun experience. Over time, we come to know the confusing parts, and we develop an irreplaceable skill for dealing with the ugly details and staying somewhat productive no matter the intricacies. This is the day-to-day reality for a lot of software engineers in the real world.
Obviously the tech industry comes along and proposes LLMs as the solution to all problems, but when they are tried on those messy code bases, the results are mixed at best. The LLMs hallucinate, the prompts required to make them work decently become absurdly complex and long (to the point that it takes less time to write the code by hand), the context window is never enough, and we see agents get stuck in loops. What a mess! But is it really the LLM's problem?
I think most people who have tried LLMs on different code bases have found that they work fine on some and poorly on others. The more structured, the simpler, the smaller the code base, the better they work. It's obviously a context problem, but not only that. These things aren't intelligent beings; they are the best pattern-matching machines we have ever built. And we do the same: when working on a new code base, we look for patterns, we try to mimic them, we make changes that fit the current state of the program. But when things are a mess, that's hard.
The hidden signal in bad LLM experiences
My experience is that when LLMs struggle across the board, it's a sign that a code base is too complex and ripe for a human-driven refactor. If there is more productivity to be gained from these tools (and in some cases there is), the tools need to be put in a situation where they can operate. We do this not because we give them a personality or because they are intelligent; we do it because they are tools, and all tools should be used only in the context in which they work.
The signal is clear to me, albeit not scientifically measurable: if you can't get anything done at all with an LLM on a code base today, that code base is too complex, and the very same code base will be a productivity killer for your engineers. Grab a few humans, give them time, and invest in the quality and maintainability of your software. Code is here to stay, and you will unlock better productivity for your team, as well as better automation via LLMs if you choose to use them.