We’re getting close to the home stretch in my Working Effectively with Legacy SQL series. Today’s post is all about obsolete SQL. Nothing kills a project more than oodles of unused code filling up the code base. When I started working with my current team, I was put in charge of approximately 800 stored procedures. Within two months, that number dropped to 650 simply because I removed obsolete stored procedures. I also eliminated approximately 25 functions and 30 views. Today’s code base has 19 functions (of which at least three are on my hit list) and 20 views, so we’re talking about a 50% decrease in number of objects with no loss in functionality.
This is one form of finding and removing obsolete SQL. Those 205 objects I deprecated means 205 objects I do not need to maintain and my successor(s) will never need to learn. We’ll talk about how to find obsolete objects in just a little bit. First, let’s talk about the other type of obsolete code: obsolete code within objects.
There are a few ways in which obsolete code manifests itself within an object. For the purposes of demonstration, let’s focus on stored procedures here, even though we could find the same within table-valued functions, views, and even tables. To enumerate various ways in which we can find obsolete SQL within a single stored procedure, I’m going to use the following example:
CREATE PROCEDURE dbo.TestProcedure @SomeValue INT = 25, --Code always passes in 25 @Person VARCHAR(30) = 'Bob' AS IF (@SomeValue <> 25) BEGIN THROW 50000, 'Value must be 25!', 1; END IF (@SomeValue = 20) BEGIN SELECT 'OK' AS Result; END IF (@SomeValue = 21) BEGIN SELECT 'OK' AS Result; END SELECT @Person AS Person, --'Something' AS Result @SomeValue AS Value; --IF (@Person = 'Melinda') --BEGIN -- DECLARE -- @var1 INT, -- @var2 INT, -- @var3 INT; ---- This code needs to run every time!!! -- INSERT INTO dbo.SomeTable -- ( -- val -- ) -- VALUES -- ( -- @SomeValue -- ); -- SELECT -- @var1 = v1, -- @var2 = v2, -- @var3 = v3 -- FROM Someplace -- WHERE -- Something = 29; --END IF (1 = 0) BEGIN THROW 50000, 'Boolean logic has been invalidated!', 1; END SELECT 1 FROM Something WHERE 1 = 0;
In this simple example, we have four separate sections of obsolete code. The first section starts on line 11 and goes through line 19. @SomeValue will never be 20 or 21 because we already check on line 6 if @SomeValue is something other than 25; if it is, we throw an exception and leave the procedure. You generally see this logical impossibility scenario in older procedures, when the first developer had special cases for values 20 and 21. Eventually, a requirement change made it so that the value must be equal to 25. Instead of cleaning up the procedure, the developer making this change (often a different developer than the one who initially wrote the procedure) is “being careful” and simply adds a case on top. This leads to unnecessary checks and a minor performance degradation, but more importantly, makes the code harder to read. When refactoring a procedure, be on the lookout for logical impossibilities and remove them.
The next bit of obsolete code is on line 23, where the Result attribute is commented out. I see this in a lot of code, where developers are afraid to remove a column because it might be useful in the future. The big problem here is, if there’s a bug in the procedure, I don’t know if your commented-out line is the bug or is orthogonal to the actual bug. For all I know, you were doing some testing and accidentally forgot to uncomment that line, and it takes me extra time to try to unravel the problem. Delete commented-out lines.
Lines 26-50 are an extended example of the previous problem. In this case, we have an entire segment of code commented out. For bonus effect, I added a comment on line 33 saying that this code section needs to run…and yet it won’t. So which is true, the comment or the commented-out code? Again, if you’re dealing with a bug in this procedure, commented-out code leads to more confusion than necessary.
Lines 52-61 are special cases of the logical impossibility. The first scenario has an impossible IF condition, and the second scenario has an impossible condition in the WHERE clause. Most of the time, you’ll see these when requirements change. Maybe “IF (1 = 0)” used to read “IF (@SomeVal > 23)” but that requirement changed over time. Again, this hurts a lot more than it helps; delete it.
So here’s the final version of the query, removing all obsolete code:
CREATE PROCEDURE dbo.TestProcedure @SomeValue INT = 25, --Code always passes in 25 @Person VARCHAR(30) = 'Bob' AS IF (@SomeValue <> 25) BEGIN THROW 50000, 'Value must be 25!', 1; END SELECT @Person AS Person, @SomeValue AS Value;
This version of the procedure is a lot more maintainable than the prior version. Yes, this is a toy example, but when you’re dealing with a 4000-line monstrosity, eliminating a few hundred lines of obsolete SQL still makes a pretty big difference.
Given that, why do people hoard code? It all comes down to risk aversion and fear of (or lack of) source control. Hypothetically, that commented-out code could become useful again someday, so if you need it again, it’s better to keep it around. Well, hypothetically, that newspaper from 1978 could come in handy again someday, but the likelihood of this being true is slim; “could come in handy” isn’t a good enough reason to hoard every copy of the newspaper going back 40 years, and it isn’t a good enough reason to keep commented code around.
But here’s the better point: even if you know you need it again someday, you still don’t need to comment out your code. Use source control! I understand that there are plenty of organizations which don’t use source control for their SQL code—I worked at one where the DBA team pushed for this but the web dev team (which wrote a lot of these procedures) didn’t want to use source control because reasons. These organizations are doing it wrong. There are zero reasons why you should not use source control for database code. If you believe there is a reason, either you are wrong or you are doing it wrong. You don’t need to use Git or Mercurial (although they’re nice); even if you’re using SVN or TFVC, that does the job. Every single source control tool has the ability to look back in history, so if you ever need that code again, you can get it. If your company doesn’t use source control, you can always get Git or Mercurial and use that locally—that’s one benefit to a distributed version control system.
So, moving on from obsolete code within a stored procedure, let’s look at finding obsolete stored procedures. This is a little bit harder of a problem, but we have some tools available to us to help out. Within SQL Server, we can look at the plan and procedure caches as indicators of a procedure being in use. These aren’t perfect, but if a procedure shows up on the list, at least we know it has been used recently enough to be in the procedure or plan cache.
Aside from that, check out source control. We use a tool called OpenGrok to search. They provide a URL-based interface for searching within a code base, and you can look for specific stored procedure calls in your data layer. Alternatively, you could try using a grep-like tool (e.g., egrep or Powershell’s Select-String) to search within a directory of code files for that procedure call. The same goes for Integration Services packages and Reporting Services reports—those are XML files and you can search inside them for procedure calls.
If you’re looking at scheduled tasks, you can check SQL Agent jobs pretty easily: msdb.dbo.sysjobs and msdb.dbo.sysjobsteps will contain information, so a quick search against those tables would return your answer. If you have a third-party scheduler, it probably has an interface for searches.
Naturally, objects can call other objects. In SQL Server, sys.sql_modules has a definition attribute which contains the definition of an object. You can search through those definitions to see if your stored procedure call exists. You may also have metadata tables which contain stored procedures to call, so querying those will help.
I’m being purposefully vague on this topic because I’m working on a tool to help people search their code bases for potentially obsolete code. The problem is that you can never absolutely prove that a procedure is not in use, but given enough checks, you can have a higher level of confidence and can get rid of a lot of unused procedures. In my above example, we dropped over 150 procedures and I needed to return one procedure which people occasionally run (manually) to fix data, but hadn’t been run in a few months. Don’t be afraid to research and drop unused procedures; it makes your life easier, and if you ever make a mistake, pull that procedure back out from source control.