# The Unimportance of Tests 2 - 2025 Update


Way back in 2015 I wrote this post, and it remains my most talked-about piece of content. It has literally come up in every interview I have done since then, and most of the time it is misunderstood. So, let’s talk about it.

## What Most People Got Wrong

I don’t hate tests. In fact, I don’t even particularly dislike writing tests. Are they my favorite part of writing software? No, but I’d say they rank somewhere below debugging and above fiddling with regex.

Again, tests prove to people that your code does what it says it does. Including yourself 10 minutes later. Tests prove the intent of the software.

## But You Said “Tests Don’t Belong Everywhere”

Right, as in “Stop trying to hit 100% coverage”. Really, once you hit a solid 75-80% coverage, you should be pretty confident in your code already. Past that point you are really just investing a lot of effort in showing off.
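If you’d rather hold that line than chase 100%, most coverage tools will happily enforce a threshold for you. A minimal sketch, assuming coverage.py / pytest-cov (`fail_under` is their option; the 80 is the point):

```ini
# .coveragerc - fail the build below a sane threshold instead of chasing 100%
[report]
fail_under = 80
```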

## You’ve Just Never Worked Anywhere That Did Coverage Right

Yes. I have never worked anywhere that did coverage right. In 25 years, over many companies, at many different scales, in many industries, in various roles as an IC or in leadership, in different tech stacks, I have never worked anywhere that has “done coverage right”.

Quick question - I also have large feet, and I notice those giant red shoes you’re wearing look fresh as hell. Where’d you score those?

## What Most People Got Right

A lot of people thought that article was an indictment of test culture, or TDD itself, and in a lot of respects, they’re right. There’s probably some value for people who use tests as a way to describe the software they’re building, and who can effectively communicate that software as a series of assertions.

I have never found that to be an effective way to describe software.

## Aren’t the Intent and the Software the Same Thing?

Well, yes and no. In theory, software is pure theory-crafting, so the software and its intent should be one and the same. In reality, all software also contains practical tricks to account for the messy nature of the real world.

No one sets out one morning to create a debounce, or a short-circuit retry mechanism, as the very first part of their system. In fact, those parts are typically afterthoughts, bolted on later at best, completely missed most of the time. But it’s little details like that which are actually pretty critical to making most of the world function, whether you know it or not.
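To make that concrete, here’s a minimal sketch of one of those bolted-on tricks - a retry decorator in Python that short-circuits once its attempt budget is spent. All names here are hypothetical, and a real one would also want jitter, timeouts, and a check that the error is even worth retrying:

```python
import time
from functools import wraps

def retry(max_attempts: int = 3, base_delay: float = 0.1):
    """Retry a flaky call with exponential backoff, short-circuiting
    (re-raising immediately) once the attempt budget is spent."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # budget spent - surface the error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

@retry(max_attempts=3)
def fetch_remote_config():
    ...  # hypothetical flaky network call, for illustration only
```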

These little practical tricks exist purely to help us deal with the edge cases, which brings me to the next thing that most people got right -

## Coverage Isn’t the Only Metric That Matters

Most orgs, in fact almost all orgs, track changes to test coverage in a codebase over time as a measure of code quality. As long as coverage remains high, code quality is high, because all the code must be well tested, right?

I refer you once again to our dear friend Charles Goodhart. Coverage doesn’t mean what we think it does, or rather, it doesn’t correlate directly to quality. Coverage literally measures what percentage of the codebase is executed when the tests run.

If you’d like to achieve perfect coverage, write a single test that calls everything in your codebase, and asserts 1 == 1. Wrap the whole thing with whatever method of exception squelching you’d like. Instant 100% coverage, 0% value.
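In case you’d like the anti-pattern spelled out, here’s a rough sketch (assuming pytest and a hypothetical `myapp` package). It executes as much code as it can reach and proves absolutely nothing:

```python
import contextlib
import inspect

import myapp  # hypothetical package you want "covered"

def test_everything():
    # Attempt to call every public function with no arguments,
    # squelching every exception so the test can never fail.
    for _, fn in inspect.getmembers(myapp, inspect.isfunction):
        with contextlib.suppress(Exception):
            fn()
    # The only real assertion: a tautology.
    assert 1 == 1
```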

## Tests Cannot Make Up for a Lack of Communication

In some respects, tests can be considered a form of living documentation, possibly the most up to date documentation anyone has. It’s forced to change with the codebase as it evolves, and it has to remain correct for the tests to pass. Sure, the test names and descriptions can be wrong, but the assertions have to be right, otherwise the whole thing stops. So, from that perspective, maybe it makes sense to try and use tests and 100% coverage to document everything engineering does, right?
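Here’s a hypothetical example of what that looks like in practice - the assertion stays honest because CI forces it to, while everything human-readable around it quietly rots:

```python
def apply_discount(price: float) -> float:
    return price * 0.85  # the discount was changed to 15% last quarter

def test_discount_is_ten_percent():  # stale name from the 10% days
    """All customers get a 10% discount."""  # stale description
    assert apply_discount(100.0) == 85.0  # correct, or the build breaks
```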

You know what isn’t documented in tests?

Why any of what’s happening there matters.

Why do these objects matter, what do they mean to the domain, to the system, to the org, to the business? Ideally that information is written down somewhere, but I can almost guarantee it’s not written down in the tests. Outside of an extreme example where the organization is just one person, or made entirely of engineers writing testing software for other software engineers to test their software with, all organizations fall under the gaze of Melvin Conway, and it shows a lot of the time.

## So, Why the 2025 Update?

Glad you asked, let’s break down what’s happening today -

  1. Vibe Coding. We are employing extremely good communicators as rudimentary programmers in our codebases now.
  2. This is simultaneously increasing documentation, making code worse, removing testability, and driving up confidence in that very same code (most of the time).
  3. We are somehow fine with this the majority of the time.

### Vibe Coding

Now, politically or ideologically, how you think or feel about AI is irrelevant - vibe coding is happening. People are talking to their IDE, and getting it to write code for them. That’s a reality. Accept that reality.

### Good Communicators

An LLM is not reasoning, it’s not problem solving, it’s just recognizing a pattern and finishing a sentence, much in the same way that auto-complete does. Trust me, I’ve been at this a while. It is, for all intents and purposes, just pretty good at hearing English and responding in some mix of Python, JavaScript, HTML and CSS.

### Increased Documentation

Because the entirety of the input is happening in natural language, every step of the input process is documented. In a lot of ways, it’s actually the perfect documentation system - a lossless, living document of every decision made, every change requested along the way, and the thought process behind everything. At least, from one side.

### Worse Code

Because the side that used to do a lot of thinking, diagramming, and RFC writing is now skipping all of that and going straight to regurgitating getting-started guides and Stack Overflow posts. None of the reasoning is well explained, there’s no theory-crafting, mainly just “this is how setting up a webserver in NodeJS IS”.

### Reduced Testability

Meanwhile, LLMs cannot really reason about code they have just written, so they are unlikely to be able to write meaningful tests for it after the fact. Some very expensive, very specialized models can get it done, which is why many people still defer to human testers. Most vibe coders are not testers, nor do they have specialized testing LLMs, so they use whatever tests their LLM of choice gives them, and they move on.

### Increased Confidence

And all the while, people are trusting LLMs more and more to take on more complex coding tasks. Sometimes with really bad results. The general consensus is that “machines are better at coding than humans” and on average, that isn’t wrong. I say on average because, over the entire population of humans ever to exist, yes, LLMs are currently better at code than the average human.

That’s an incredibly low bar to hit. Even being better than the average of all currently alive humans is too low a bar. Try this instead - can an LLM be a better coder than the average human coder? Interestingly, the answer is no. An LLM has been trained on data from the internet, which contains a mix of current, outdated, right, wrong, and incompatible answers. It would have to reconcile all of that, but it doesn’t actually know how to. Your average coder does. The only place an LLM has the advantage is that it most likely knows more programming languages, but that is almost never an advantage.

## It Was Never About Testing

This trend of moving away from well-tested code speaks to something I was pointing out - it was never about tests. It was never about coverage. It has only ever been about communication and confidence. If you can effectively communicate the intent of your software, and you’re confident that your software is executing your intent, you are unlikely to question that.

You probably should, but you’re unlikely to.

Tests are a confidence tool. They give us confidence that our work is up to our standards, or at least a standard. But as we move away from writing and testing our own code, and instead use tools to have code written for us, the confidence moves into the tools.

And it was always about confidence.

## Confidence Tooling

If we look back at the last two decades of software engineering, it becomes pretty clear that it has always been about having the confidence to move forward.

- Virtualization, to ensure we could replicate environments with confidence.
- Build tooling, to ensure we were confident about what we were actually shipping.
- Performance / load testing, to make us confident services don’t simply fall over when exposed to the internet.
- Containerization, because we weren’t confident we could run it anywhere but our machine, so we just shipped our whole damn machine.
- Industry behemoths like Grafana, Splunk, Datadog, PagerDuty, and New Relic, which exist purely because you need to be confident that when things go sideways, you’re ready to deal with it.

Tests have been and continue to be a step in that confidence tooling chain. Specifically, they are still the step that gives us confidence in our software’s intent. Well-written tests also make sure that intent matches the function.
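For illustration, here’s a hypothetical example (invented domain rule, invented names) of a test where the name, the description, and the assertion all state the same intent:

```python
from dataclasses import dataclass

@dataclass
class Order:
    subtotal: float

    def shipping_cost(self) -> float:
        # Business rule: orders over $100 ship free; otherwise a flat $5.99.
        return 0.0 if self.subtotal > 100.0 else 5.99

def test_orders_over_100_dollars_ship_free():
    """Name, docstring, and assertion all say the same thing."""
    assert Order(subtotal=120.0).shipping_cost() == 0.0
```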

Extremely well-written tests are living documents that line up with your extremely well-written documentation, which lines up with your living documentation of requirements and business decisions. A perfect, cohesive amalgamation of knowledge, confidence, intent, decisions, and a clear path to results; a resource you could point anyone in your organization at, at any time, to answer any question.

Now if only there existed a tool you could feed a bunch of knowledge in various formats, that could later synthesize said knowledge into answers to various questions using pattern matching engines and chaining systems. Wouldn’t that be something …
