On a recent episode of Shop Talk, we discussed the concept of technical debt. In this post, I want to dig into the topic a bit more and give it the thought it deserves.
What Is Technical Debt?
As developers, we throw around the term “technical debt” a lot, but I think it makes sense to describe the concept and ensure that we’re on the same page. The brief version of technical debt is that we prioritize expedience over quality. That is, we choose to get something done quickly rather than getting it done correctly. But even in that description, I think there’s a little bit of question-begging involved that I’ll get to later on. For now, let’s take the definition at its word: we, as developers, take the expedient route rather than the best route. This leaves us with troublesome code. For us to consider a decision troublesome and the result as technical debt, it needs to meet several criteria:
- It is definitely worse than the alternative. After all, if I choose the expedient option and it turns out to be the best available option, then I can’t use that as an example of technical debt. This may seem obvious, but I think it strikes developers (especially fairly new developers) pretty often: “I don’t understand why this is complex, and therefore it must be wrong” is the default mindset for a lot of developers—including me! Complexity itself is not at issue; it’s complexity in the face of a simpler, better solution.
- It is less than total in scope. “We made a poor decision by using Java and need to re-write 15 million lines of code into Kotlin” is not technical debt; it’s a flight of fancy. The one exception I can think of is a language so utterly unsuited to the task that there are impending catastrophic consequences for the company. An example of this might be something like writing a real-time autopiloting system for an airplane in Ruby or an operating system in SQL. And honestly, if you’re that far in the hole, it’s not technical debt; it’s mind-bending failure.
- There is a path to get to the alternative solution. I might even go a little further and throw in the weasel word “reasonable” before “path,” just to make it clear, but in the extreme, there might be a significant undertaking to eliminate swaths of technical debt, depending on the size of the system and how much debt you’ve accumulated.
Why Do We Have Technical Debt?
This question seems to be the next relevant one, and there’s an obvious answer in the definition itself: expediency. But let’s be clear that this isn’t the only reason why we have technical debt. Here are several reasons why we might see technical debt in a system.
- Expediency in decision-making. Early on, instead of doing “the right thing,” we do the easy thing. Instead of being diligent and doing things the way we should, we get lazy and perform the minimal viable change.
- Time, either real or perceived. We may feel like we don’t have time to do things right and need to do things fast. Sometimes this will be accurate—if production is broken and we need to get something out there now, we may not do it the absolute best way, but instead try to get a quick fix in. Ideally, we go back and do things the right way, but until then, that’s technical debt. Sometimes, we may feel like we’re under a lot of pressure, but it’s entirely self-driven. This is especially true for medium- to high-performers in an organization, who want to get things done and feel like anything which slows them down is detrimental to the organization. Which leads nicely into the next element.
- Process. Processes can create technical debt as malignant side effects. For example, suppose we need to go through a formal change request process to add a new table to a database, but not to add a column to an existing table. What ends up happening is that people shoehorn in new columns even in places where we really should have a separate table. That way, they don’t have to fill out as much paperwork and get more stuff done. This is also the reason why people misuse existing columns like Address Line 3 so much: it’s already there, so we don’t need to go through all of the paperwork of trying to get a new column added.
- Changes in technology. We may pick something which worked perfectly well ten years ago, but there are better ways to do it today. In this case, something that previously was not technical debt can become technical debt. As an example of this, suppose we wanted to calculate a running total in T-SQL. Prior to SQL Server 2012, a cursor was the best solution to the problem (outside of the quirky update, which always depended on undefined SQL Server behavior). As of SQL Server 2012, a window function is the best answer to the problem. This means that the prior answer is no longer the best, and we have a case of technical debt we could clean up.
- Changes in philosophy or state of the art. Software development evolves over time, and as we come up with patterns and practices, we can apply them to cases where they make sense. This may include turning a half-baked pattern into a proper implementation. The person who developed the code may not have had sufficient knowledge of the pattern, which gives us a bit more technical debt.
There are plenty of other reasons as well, but we can see that it doesn’t just come down to “lazy developers shirking their duties.” But in fairness, sometimes it is…
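To make the running total example from the list above a bit more concrete, here is a minimal sketch of the two styles side by side. I am assuming Python's built-in sqlite3 module as a stand-in for SQL Server (it needs SQLite 3.25 or later for window functions), and the Sales table, its columns, and the sample data are all hypothetical. The row-by-row loop plays the role of the cursor; the OVER clause is the window function approach.

```python
import sqlite3

# Hypothetical sample data: (SaleDay, Amount).
rows = [(1, 10.0), (2, 5.0), (3, 7.5), (4, 2.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SaleDay INTEGER PRIMARY KEY, Amount REAL NOT NULL)")
conn.executemany("INSERT INTO Sales (SaleDay, Amount) VALUES (?, ?)", rows)


def running_total_row_by_row(conn):
    """Cursor-style approach: walk the rows in order and accumulate by hand."""
    total = 0.0
    result = []
    query = "SELECT SaleDay, Amount FROM Sales ORDER BY SaleDay"
    for sale_day, amount in conn.execute(query):
        total += amount
        result.append((sale_day, total))
    return result


def running_total_window(conn):
    """Set-based approach: let a window function compute the running sum."""
    query = """
        SELECT SaleDay,
               SUM(Amount) OVER (
                   ORDER BY SaleDay
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS RunningTotal
        FROM Sales
        ORDER BY SaleDay
    """
    return conn.execute(query).fetchall()


cursor_style = running_total_row_by_row(conn)
window_style = running_total_window(conn)
print(cursor_style == window_style)  # True
```

Both versions return the same answer. The window function version simply says what we want and leaves the how to the engine, which is the sort of shift that can turn yesterday's best practice into today's technical debt.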
Does Technical Debt Matter?
From the standpoint of a developer, the answer is that yes, technical debt matters a lot. Let’s make a list of reasons why developers think technical debt matters. Not every piece of technical debt checks all of these boxes, but these are the most typical pain points.
- It annoys us. You’re working through the code base and see that thing. We all have a “that thing” in our code base. It’s the dumb idea that somebody (and “somebody” may be prior versions of us) did and it just keeps rearing its ugly head. If the problem were in some little-used application from ten years ago that nobody bothers with and there’s a single happy customer still using it, we wouldn’t mind. But no, it’s never that easy; it’s always in a place where we constantly need to work around its tendrils.
- It makes development harder and slower. Ten-minute changes turn into two-hour changes. One-day tweaks turn into sprint-long slogs.
- It adds a mental burden. One of my favorite pieces of technical debt was the JDate (and I don’t mean the dating service!). This was a technical decision which never made sense but did make development a lot more difficult. I’ve written separately about my experience with this mess of an idea, but when every date is replaced with an arbitrary value, troubleshooting issues becomes that much more difficult.
From the standpoint of non-developers, much of this doesn’t matter. The middle point, that development can slow down, is the one thing which affects the outside world. Otherwise, those are personal problems. This makes it a little more difficult to answer the next question.
How Do We Eliminate Technical Debt?
We’re now at a point where we have some notion of what technical debt is, why we might have it, and the consequences of having it in our systems. Now how do we get rid of it?
Well, that answer is hard. Part of the reason why it’s hard is that there often needs to be business value in eliminating technical debt, and developers are rather bad at explaining business value. Product managers hear all kinds of feature requests, bug reports, and miscellaneous requirements from end users, regulatory bodies, and a host of other people and organizations. They have a big list of things to do and typically don’t want to spend time on items which provide no business value. Given what we just described above, this makes product managers less likely to prioritize cleaning up technical debt. Even if they’re sympathetic to the developers, product managers still need accomplishments and their incentives are aligned with satisfying customers, not developers.
The Best Approach
The best way to get technical debt cleaned up will involve an appeal to the product managers’ interests. Typically, this means explaining how much extra time it takes to solve specific tasks due to technical debt. For particularly big and obvious examples, it might add extra days to the amount of time it takes to accomplish a task, and that makes fighting to eliminate the debt a lot easier, as you can amortize the cost over a series of lower-cost feature deliveries.
But here’s the rub: there is a trust component to doing this. If the PM does, in fact, give you that time, you implement the changes, and your velocity does not increase as a result, they’re going to be much less likely to give the green light for future changes. In other words, if you’re going to make this argument, be sure you can back it up.
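If you want to put a number on the amortization argument, a back-of-the-envelope break-even calculation is one way to do it. This is only a sketch: the function name and the figures are hypothetical, and it assumes every affected story saves the same amount of time, which real life rarely grants us.

```python
import math


def break_even_stories(cleanup_days: float, days_saved_per_story: float) -> int:
    """Number of future stories needed before the cleanup pays for itself."""
    if cleanup_days < 0:
        raise ValueError("cleanup_days must be non-negative")
    if days_saved_per_story <= 0:
        raise ValueError("days_saved_per_story must be positive")
    return math.ceil(cleanup_days / days_saved_per_story)


# Hypothetical numbers: a 5-day cleanup that saves half a day per affected story.
stories_needed = break_even_stories(cleanup_days=5, days_saved_per_story=0.5)
print(stories_needed)  # 10
```

If you expect to deliver those ten stories within the next quarter, that is a pitch a PM can evaluate. It is also a number you can be held to afterward, which is exactly the trust component.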
The Wildcat Approach
There is another method for cleaning up tech debt, and it is not something I recommend lightly. I call it the wildcat approach, named after wildcat banking. The idea is that you add the cost of technical debt cleanup to the cost of a story. For example, if a story might take one day to accomplish, you might budget two days and spend a day cleaning up some tech debt along the way. If the people around you are sympathetic to the idea, then you can get quite a bit of cleanup work done on the sly.
Importantly, this differs from the first approach in that the first approach is an official recognition of technical debt cleanup. Under the first approach, your story would still take two days to complete, but everybody, including the product manager, is aware that you are performing additional, relevant cleanup work as part of the task. With the wildcat approach, nobody else knows.
This approach comes with a lot of risk, especially if the changes you make are not popular. If you try to make a big change which breaks other systems, or if the cost spirals out beyond your expectations, you might lose control of the story and could end up in a really bad situation. In some organizations, that might be enough to get a developer fired. This is the type of thing you do in small increments when you really know what you’re doing, not to re-design an entire application. I’ll stop here before I
Technical Debt Is Debt, and Debt Can Be Fine
I think the term “technical debt” is a really good one, so much so that I want to focus on the “debt” part a little bit. Debt can be a terrible situation, where you owe so much money that you struggle to keep your head above water.
This aside, debt can be an important tool. For example, most people don’t have enough money in their savings accounts to buy a house outright, and so they use debt to finance the purchase of the house over a series of 7-30 years. In this sense, think of debt as taking on the value of an asset now, while paying for it (with some time preference premium) over a stretch of the future.
You can certainly end up in a disastrous situation as a result of taking on a mortgage you can’t afford (or a car loan, or student loans, or whatever), so it’s important to note that what you took on the debt for is independent of your ability to service that debt. In the case of technical debt, if we knowingly take on some debt to beat a competitor to market, that can be a great reason. In other words, the existence of technical debt is not itself a marker of failure; instead, understand why you have the debt.
Continuing the metaphor, we can think of the interest rate we have to pay as the continued cost of upkeep of this technical debt. If I have a 30-year home mortgage at 2.5%, the associated interest rate is low enough that I can easily manage having six digits’ worth of debt. But for a credit card with a 20% APR, I’m going to be awash in debt payments and have great difficulty getting out of it. Similarly, if your technical debt adds a minor amount of annoyance or a few minutes’ worth of development time each month, that’s easily manageable. If it’s costing developers days per sprint, you’re losing a lot of productivity as a result. These are the tech debt items which are most important to hit, and these are the ones for which you should find it easiest to gain PM support.
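As a rough illustration of the interest rate idea, here is one way to rank debt items by how expensive they are to carry. Everything in it is an assumption for the sake of the sketch: the DebtItem class, the idea of dividing days lost per sprint by cleanup days to get a "rate," and the made-up backlog entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    name: str
    cleanup_days: float          # the "principal": one-time cost to pay it off
    days_lost_per_sprint: float  # the "interest": recurring cost of living with it

    def __post_init__(self):
        if self.cleanup_days <= 0:
            raise ValueError("cleanup_days must be positive")

    @property
    def interest_rate(self) -> float:
        """Recurring cost per sprint as a fraction of the payoff cost."""
        return self.days_lost_per_sprint / self.cleanup_days


def rank_by_interest(items):
    """Highest-interest debt first."""
    return sorted(items, key=lambda item: item.interest_rate, reverse=True)


# Hypothetical backlog entries.
backlog = [
    DebtItem("odd naming in a rarely-touched report", cleanup_days=2, days_lost_per_sprint=0.05),
    DebtItem("date-encoding workaround in core tables", cleanup_days=10, days_lost_per_sprint=3),
    DebtItem("copy-pasted validation logic", cleanup_days=4, days_lost_per_sprint=0.5),
]

ranked = rank_by_interest(backlog)
for item in ranked:
    print(f"{item.name}: {item.interest_rate:.1%} per sprint")
```

The low-rate items are the mortgage: annoying, but you can carry them for years. The item at the top of the list is the credit card.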
Not Everything is Technical Debt
It’s easy to call something technical debt, but over the course of this post, I’ve tried to make clear what technical debt is. Here, I’m going to give a couple examples of things which are not, in themselves, technical debt.
Necessary Complexity Is Not Technical Debt
As developers, we want things to be easy. We want solutions which “just work,” and with a few lines of code, we get exactly what we want with no negative side effects. In the real world, that’s not always going to be the case. There can be difficult requirements and complex scenarios which necessitate complex code. That’s because the alternative is even more painful. As an example of this, let me refer back to a post I wrote a few years ago on the leaky bucket algorithm. This is fairly complex code, with the main procedure totaling approximately 150 lines of code and including a cursor. That said, I do not consider it technical debt. Why not?
- It solves an important problem. I want to have an alerting system with a natural reset capability, meaning that I don’t get blasted with thousands of alerts when a thing goes down. Simply alerting when we hit a certain number of errors over a time frame may overwhelm a pager system.
- It handles important nuance. I also do not want to receive alerts for things which we’ve fixed or which were transitory. The ability for this process to “drain” messages over time means that if an error is transitory, I probably only get alerted once and that’s it. By contrast, if I do receive more than one alert, it’s a good sign that I have a deeper problem.
- As far as I am aware, there are no practical solutions which are better. It’d probably be simpler to write this in .NET, and so maybe I could do it as a CLR method, but whether that’s practical is a separate question. I could write this in application code somewhere, but then I have to add a new application to handle this work.
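For readers who have not run into the leaky bucket idea before, here is a minimal sketch of the general technique in Python. To be clear, this is not the T-SQL procedure from my earlier post. The class name, the parameters, and the choice to empty the bucket after an alert fires are all assumptions I am making for illustration.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucketAlerter:
    capacity: float          # level at which an alert fires
    drain_per_second: float  # how quickly old errors "leak" away
    level: float = 0.0
    last_seen: float = 0.0

    def record_error(self, timestamp: float, weight: float = 1.0) -> bool:
        """Add one error at `timestamp` (in seconds); return True if an alert should fire."""
        elapsed = max(0.0, timestamp - self.last_seen)
        # Drain first, so transitory errors fade away on their own.
        self.level = max(0.0, self.level - elapsed * self.drain_per_second)
        self.last_seen = timestamp
        self.level += weight
        if self.level >= self.capacity:
            # Assumption: empty the bucket after alerting, so a sustained outage
            # produces periodic alerts rather than one alert per error.
            self.level = 0.0
            return True
        return False


# A burst of 12 errors, one per second, fills the bucket twice.
burst = LeakyBucketAlerter(capacity=5, drain_per_second=0.1)
burst_alerts = [burst.record_error(t) for t in range(12)]
print(burst_alerts.count(True))  # 2

# Three errors spread 100 seconds apart drain away without ever alerting.
sparse = LeakyBucketAlerter(capacity=5, drain_per_second=0.1)
sparse_alerts = [sparse.record_error(t) for t in (0, 100, 200)]
print(any(sparse_alerts))  # False
```

Even this toy version has state, time arithmetic, and a threshold to tune. That complexity comes from the problem itself, not from carelessness, which is the distinction I am drawing here.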
Technical Debt versus “Code I Don’t Like”
This is one of the real sticking points for developers. I see something, think to myself “That’s not the way I would do it,” and immediately think it’s therefore the wrong way. If I describe a reaction that way, it’s obvious that it’s a personal opinion and not really technical debt. The problem that we find, however, is that our sense for whether something is actually a problem leads to the same conclusions: “code smells” are difficult to distinguish from “code opinions” in our minds. Unless you logically analyze the code—and have the experience and skill necessary to diagnose issues in this code—it can be really difficult to tell the two apart.
I don’t have a wonderful answer here, but spitballing a few ideas for how we separate the two, I have:
- Can you elucidate the actual harm caused by this code? In other words, set aside distaste and focus on the actual pain. If that method is 500 lines long, what are the actual consequences?
- Is there a reason for the code to look the way that it does? Maybe you don’t like the formatting, but that code fits the development standards for the company. In that case, it’s more of a personal problem. But if the formatting can easily lead to a misunderstanding, we have something different. For example, putting one column per row in a SELECT statement reduces the risk of somebody missing a comma: instead of getting FirstName, LastName, ..., we get FirstName LastName, ..., which means we get the FirstName column, alias it as LastName, and don’t actually retrieve the last name.
- Is there an appreciable amount of harm? The first question asked for you to explain why something feels off. Now, we want to see what the actual downside is.
- How much experience do you have working with the code in question? If the answer is “not much,” then give the benefit of the doubt.
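The missing comma problem from the list above is easy to reproduce. Here is a small sketch, assuming Python's built-in sqlite3 module as a stand-in for whichever database you use, with a hypothetical People table. Like T-SQL, SQLite treats a bare name after a column as an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (FirstName TEXT, LastName TEXT)")
conn.execute("INSERT INTO People (FirstName, LastName) VALUES ('Ada', 'Lovelace')")


def run(query):
    """Return the result set's column names along with its rows."""
    cursor = conn.execute(query)
    column_names = [col[0] for col in cursor.description]
    return column_names, cursor.fetchall()


with_comma = run("SELECT FirstName, LastName FROM People")
without_comma = run("SELECT FirstName LastName FROM People")

print(with_comma)     # (['FirstName', 'LastName'], [('Ada', 'Lovelace')])
print(without_comma)  # (['LastName'], [('Ada',)])
```

No error, no warning: just a column called LastName full of first names. That is actual, describable harm, which is what separates a formatting standard from a formatting preference.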
In Closing, Chesterton’s Fence
Surprisingly, I’ve never actually talked about Chesterton’s Fence on the blog, though I did cover it in a Shop Talk episode. The really brief version of the idea is that two men are traveling along a country road and encounter a fence blocking the road. Surprisingly, the fence is barely wider than the road itself, meaning that a person could easily walk around the fence and continue on. The first man says to the other, “I don’t understand why this fence is here and it seems wholly unnecessary. I wish to tear it down.” The second man cautions the first, stating that the reason he absolutely should not tear it down is precisely because he does not understand its purpose.
Tying this to code, we really should understand why code is there before getting rid of it. The reason is that this bit of inscrutable code might actually be preventing a bigger problem from occurring. For example, on one consulting engagement, I noticed a bunch of SQL queries with MAXDOP(1). My figuring was, these queries definitely are faster when they’re run in parallel, so changing this to MAXDOP(2) or MAXDOP(4) would be a good idea, at least until we could re-write the queries to be more efficient. That way, they’d finish faster and cause less blocking. Well, as soon as we made the change, it led to an enormous spate of blocking because we started hitting SOS_SCHEDULER_YIELD waits. That is, even though an individual query was faster when it ran, each query spent so much more time waiting for 4 cores to be available that it ground the whole system to a halt. Fortunately, we were watching the system as that change was put in place, so we reverted it immediately. But it serves as a good reminder that even if something “shouldn’t” be there, it’s important for us to understand why it’s there before we think of replacing it.
And so the same goes with technical debt. I’ve written a lot over the years on how to clean up code, eliminate technical debt, and make a code base more understandable. Those are all laudable goals for any code base, I think, and I don’t want to diminish them in this discussion. But it’s important to temper those desires with a solid understanding of why the code is as it is before you aim to make any big changes.