Let’s hit the final topic of my Working Effectively with Legacy SQL series. Today, we discuss modularization. In object-oriented programming, modularization is a great idea. One of the “clean code” tenets is Don’t Repeat Yourself: if you have two pieces of code doing the same work, try to unify that code, isolating that particular functionality in its own part of the code base.
This sounds like a fantastic idea, but it turns out not to work quite as well in SQL Server. There are two reasons for this: first, T-SQL is not an object-oriented language; second, trying to modularize complex queries can confuse the query optimizer and give you terrible execution plans and performance. I wrote about this topic just about a year ago, so I’ll link to my explanations of why nested views and user-defined functions can be anti-patterns in T-SQL.
The gist of these two blog posts is that the most common forms of encapsulating data (views and user-defined functions) can come with a major performance loss, and so you should use them either sparingly or not at all. In the spirit of improving the lives of our maintenance developers, however, let’s think about some ways in which modularization can make sense.
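To make that performance concern concrete, here is a hypothetical example of the nesting pattern those posts warn about, along with the flattened query I would rather maintain. The view and table names here are illustrative, not from any real system:

```sql
-- Hypothetical nested views: a view built on top of another view.
CREATE VIEW dbo.vActiveCustomers AS
    SELECT c.CustomerID, c.CustomerName
    FROM dbo.Customers c
    WHERE c.IsActive = 1;
GO

CREATE VIEW dbo.vActiveCustomerOrders AS
    SELECT ac.CustomerID, ac.CustomerName, o.OrderID, o.OrderDate
    FROM dbo.vActiveCustomers ac
        INNER JOIN dbo.Orders o
            ON ac.CustomerID = o.CustomerID;
GO

-- The de-modularized equivalent: one query straight against the tables,
-- giving the optimizer the full picture in a single statement.
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.OrderDate
FROM dbo.Customers c
    INNER JOIN dbo.Orders o
        ON c.CustomerID = o.CustomerID
WHERE c.IsActive = 1;
```

Two levels of nesting is tame; the real trouble starts when views stack several layers deep and each layer drags in joins the final query never needed.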
The first way in which modularization can make sense is separating commonly-used, independent queries into their own procedures. Last time around, I talked about error and performance logging. I can move those INSERT statements into their own separate utility procedures and simply call them from my main procedures. The reason I would want to do this is that it makes future changes easier. Suppose that I need to insert a record into PerformanceLog today, but a year from now, I’m inserting it into a different table with a different column structure, or I’m firing off a Service Broker message, or maybe I’ve got dynamic logic to determine whether this record should even be inserted into the table. These are changes I don’t want to make in every single stored procedure, so it makes sense to wrap them up in a utility procedure.
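As a sketch of what such a utility procedure might look like, here is one possible wrapper around the PerformanceLog insert. The procedure name, parameters, and column list are all assumptions for illustration:

```sql
-- Hypothetical utility procedure wrapping the PerformanceLog insert.
-- If logging later moves to a different table, a Service Broker message,
-- or gains conditional logic, only this procedure needs to change.
CREATE PROCEDURE dbo.LogPerformance
    @ProcedureName SYSNAME,
    @StartTime DATETIME2(7),
    @EndTime DATETIME2(7)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.PerformanceLog
    (
        ProcedureName,
        StartTime,
        EndTime,
        RuntimeInMilliseconds
    )
    VALUES
    (
        @ProcedureName,
        @StartTime,
        @EndTime,
        DATEDIFF(MILLISECOND, @StartTime, @EndTime)
    );
END
GO
```

Each main procedure then calls something like `EXEC dbo.LogPerformance @ProcedureName = OBJECT_NAME(@@PROCID), @StartTime = @StartTime, @EndTime = SYSUTCDATETIME();` and stays blissfully ignorant of where the log record actually lands.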
Another example could be the SELECT statements you use to load certain tables. Suppose you have a batch insertion procedure which selects data from some table with certain filters and then inserts records into a separate table. Splitting that portion out into its own procedure can make sense on a couple of levels. First, it lets you troubleshoot the SELECT portion of the query independently of the INSERT portion. Second, if you also need to call the SELECT statement separately for reporting purposes, you can do so. Just remember that if you do decide to create a sub-procedure, you incur a little bit of overhead cost and make your overall design a little more complex, so save this for cases in which it really makes sense.
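One way to sketch that split, using made-up procedure and table names, is to put the SELECT in its own procedure and have the loader consume it via INSERT … EXEC:

```sql
-- Hypothetical sub-procedure: the SELECT lives on its own, so we can run
-- and troubleshoot it independently, or reuse it for reporting.
CREATE PROCEDURE dbo.GetPendingOrders
    @StartDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderID, o.CustomerID, o.OrderDate
    FROM dbo.Orders o
    WHERE o.OrderDate >= @StartDate
        AND o.Status = 'Pending';
END
GO

-- The batch insertion procedure consumes the sub-procedure's result set.
CREATE PROCEDURE dbo.LoadPendingOrders
    @StartDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.PendingOrderStaging (OrderID, CustomerID, OrderDate)
    EXEC dbo.GetPendingOrders @StartDate = @StartDate;
END
GO
```

Note that INSERT … EXEC carries its own restrictions (for example, it cannot be nested), which is part of the overhead cost mentioned above.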
A third way in which you can think of implementing DRY within the SQL Server world is getting rid of redundant procedures. If you have one procedure which returns { A, B } and you have a separate procedure which returns { A, B, C }, re-work the code to call the latter procedure and drop the former. I can see exceptions when you have a fast { A, B } (which uses a covering, non-clustered index) versus another procedure which has { A, B, …, Y, Z }, but if the difference is minor and the relevant index covers both queries, stick with one procedure. Also, get rid of any true duplicates. You might not expect to see any duplicate procedures, but they can pop up: two procedures with different names which return exactly the same result set. Maybe they returned different results at one point in time, or maybe the person who wrote the second procedure didn’t know about the first; whatever the reason, there is zero value in maintaining two versions of the same procedure. Sometimes you have to do some mental parsing to determine if two queries are exactly the same (and after you do that, reflect upon just how great the optimizer is, as it tries to do the same thing in a fraction of a second). If you do have duplicates, here is a good place to practice DRY. As a quick side note, this applies just the same to redundant indexes, so check those out as well.
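To help with the hunt, here is a rough first pass at finding candidate duplicates: group procedures by the column structure of their first result set. This is my own sketch, assuming SQL Server 2017 or later (for STRING_AGG); matching shapes are only candidates, and a human still has to read the definitions to confirm true duplication:

```sql
-- Find procedures whose first result set has the same column structure.
-- Same shape does not prove same logic, so treat matches as leads, not verdicts.
WITH ResultShapes AS
(
    SELECT
        p.object_id,
        OBJECT_SCHEMA_NAME(p.object_id) + '.' + p.name AS ProcedureName,
        (
            SELECT STRING_AGG(r.name + ' ' + r.system_type_name, ', ')
                WITHIN GROUP (ORDER BY r.column_ordinal)
            FROM sys.dm_exec_describe_first_result_set_for_object(p.object_id, 0) r
        ) AS ResultShape
    FROM sys.procedures p
)
SELECT
    rs.ResultShape,
    STRING_AGG(rs.ProcedureName, ', ') AS Procedures
FROM ResultShapes rs
WHERE rs.ResultShape IS NOT NULL
GROUP BY rs.ResultShape
HAVING COUNT(*) > 1;
```

From there, comparing the definitions in sys.sql_modules tells you whether you have a true duplicate or merely two procedures that happen to return the same columns.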
Wrapping this section up, there are relatively few ways in which we as T-SQL developers can practice the DRY principle. In many cases, it makes more sense to de-modularize code, moving out of nested views and away from table-valued functions to standard queries against tables. This is because we want to give the query optimizer a fighting chance at coming up with an execution plan which performs well; unlike in object-oriented languages like C# and Java, where the marginal cost of an extra object is negligible, the performance penalty for encapsulating code in T-SQL is fairly high.