Simplify!

When working with legacy code, the first and safest step (after testing and automating tests) is to format your code to make it readable.  With some procedures—especially the basic CRUD procedures—that will be enough right there and you can pat yourself on the back for a job well done.  But let’s suppose that we have some more complex procedures.  Our next step will be to simplify code.  I’m going to steal a bit from the Zen of Python:

Simple is better than complex.
Complex is better than complicated.

I love this couplet because it is both true and meaningful.  Most of the code I deal with on a daily basis is complicated:  cursors and while loops generate and execute dynamic SQL; chains of stored procedures use temp tables to hold state; unrepeatable (and thus unteastable) processes kick off other processes which kick off other processes which set up other, independent processes.  Some of this code was intentionally designed, but a lot of it evolved as business requirements changed over time.  Some developer made a particular decision based on certain business requirements and conditions at one point in time.  A later developer made a decision based on different business requirements and conditions at a later time.  A third developer created some plumbing code to connect the first and second developers’ work together, and a fourth developer changed bits and pieces of it.  Add all of this up and you have a system which nobody understands in its entirety, especially if developers 1-4 are gone and you’re stuck maintaining their work.

But we have an ace in the hole here:  tests and re-formatting.  Building out tests gives us a chance to make implementation changes without breaking our contract:  at the beginning of this process, we accept inputs { X, Y, Z }.  At the end of the process, we generate outputs { A, B, C }.  The important part about these types of tests is that we do not make any assertions about the internal details, about how we get from { X, Y, Z } to { A, B, C }.  This gives us flexibility to re-design.  Similarly, re-formatting procedures helps give us a better understanding of what exactly each procedure in the chain is doing, why it exists, and how it works.  During this time, you probably will have built some unit and integration tests around the individual procedures in the chain—the implementation details—but that’s okay.  In the end, we might get rid of some of these tests, but they help us understand what the code is doing in the meantime.

If you have tests in place and well-formatted code, you can start working your way from complicated to complex to simple.  The best way to do this is design.  I spent some time thinking about how to do design in an agile world, so that series might be of interest at this point.  Regardless, your prior work has already helped you in the quest for simplification, as you should have pretty solid requirements and you can spend time thinking about how best to refactor your code.

Because simplification is a complex topic, I’m going to break this discussion out into two parts.  The first part of this discussion will be simplifying code within a procedure, and the second part will be simplifying an overall design which covers a number of procedures.

Looking at a single procedure, there are a few things we can do to simplify code.  Here are the things I will cover in this section:

  • Using the APPLY operator to simplify calculations
  • Using static SQL unless dynamic SQL is necessary
  • Turning cursors and WHILE loops into set-based statements
  • Simplifying filters
  • Simplifying overall procedure design

Simplify Calculations Using The APPLY Operator

One of my favorite uses of the APPLY operator is simplifying calculations.  Let’s walk through a simplistic but complicated business example and try to turn it into something simple.

Let’s say that we have a fact table with a few measures and we need to calculate a few other measures from our base results.  Our Fact table has the following measures:  ReportDate, Clicks, OrderCount, OrderRevenue, ProductCost, and Cost.  Over a specific ReportDate range, we want to calculate ConversionRate (OrderCount / Clicks), NetMargin (OrderRevenue – [ProductCost + Cost]), and ReturnOnAdSpend (OrderRevenue / Cost).

This sounds pretty simple, but I’m going to throw a wrench into the plans:  first of all, Cost and Clicks can sometimes be 0, meaning that our ConversionRate and ReturnOnAdSpend calculations need safeguards against dividing by 0.  Second, ConversionRate can never be above 100%, so we need a hard limit there.  Third, let’s say that our fact table can have NULL values—it’s a terrible thing, but not as uncommon as I’d like it to be.  Fourth, Cost and Clicks don’t really exist—we have to use ClickSource1 and ClickSource2 as well as CostSource1 and CostSource2 based on whether a bit flag UseSource1 is true or false.  Again, this is bad fact design, but it’s something we often need to deal with.

So let’s gin up some sample data and get cracking:

CREATE TABLE #Fact
(
	ReportDate INT NOT NULL,
	UseSource1 BIT NOT NULL,
	ClickSource1 INT NULL,
	ClickSource2 INT NULL,
	CostSource1 DECIMAL(23, 4) NULL,
	CostSource2 DECIMAL(23, 4) NULL,
	OrderCount INT NULL,
	OrderRevenue DECIMAL(23, 4) NULL,
	ProductCost DECIMAL(23, 4) NULL
);

INSERT INTO dbo.#Fact
(
	ReportDate,
	UseSource1,
	ClickSource1,
	ClickSource2,
	CostSource1,
	CostSource2,
	OrderCount,
	OrderRevenue,
	ProductCost
)
VALUES
(20150101, 1, 25, NULL, 285.86, NULL, 18, 1349.56, 187.39),
(20150102, 0, 25, 6, 285.86, 8.36, 3, 98.72, 75.14),
(20150103, 1, 16, NULL, 28.38, NULL, 1, 9.99, 5.42),
(20150104, 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(20150105, 0, NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(20150106, 1, 108, NULL, 39.80, NULL, 12, 2475.02, 918.60),
(20150107, 0, NULL, 85, NULL, 85.00, 67, 428.77, 206.13);

Note that I have bad data of different sorts in this example, including irrelevant data (like in 20150102 for ClickSource1 and CostSource1) and missing data (20150104 and 20150105).

So now let’s get our query written.  Note that I want to roll up data across all report dates in the range 2015-01-01 through 2015-01-07, so I need aggregations here.  Here is the first go of this result set, which is well-formatted but complicated.

SELECT
	SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.ClickSource1, 0) ELSE ISNULL(f.ClickSource2, 0) END) AS Clicks,
	SUM(ISNULL(f.OrderCount, 0)) AS OrderCount,
	SUM(ISNULL(f.OrderRevenue, 0)) AS OrderRevenue,
	SUM(ISNULL(f.ProductCost, 0)) AS ProductCost,
	SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.CostSource1, 0) ELSE ISNULL(f.CostSource2, 0) END) AS Cost,
	CAST
	(
		CASE
			WHEN SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.ClickSource1, 0) ELSE ISNULL(f.ClickSource2, 0) END) = 0 THEN 0.0
			WHEN SUM(ISNULL(f.OrderCount, 0)) = 0 THEN 0.0
			WHEN SUM(ISNULL(f.OrderCount, 0)) > SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.ClickSource1, 0) ELSE ISNULL(f.ClickSource2, 0) END) THEN 1.0
			ELSE 1.0 * SUM(ISNULL(f.OrderCount, 0)) / SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.ClickSource1, 0) ELSE ISNULL(f.ClickSource2, 0) END)
		END AS DECIMAL(19, 4)
	) AS ConversionRate,
	SUM(ISNULL(f.OrderRevenue, 0) - (ISNULL(f.ProductCost, 0) + CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.CostSource1, 0) ELSE ISNULL(f.CostSource2, 0) END)) AS NetMargin,
	CASE
		WHEN SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.CostSource1, 0) ELSE ISNULL(f.CostSource2, 0) END) = 0 THEN
			CASE
				WHEN SUM(ISNULL(f.OrderRevenue, 0)) = 0 THEN 0.0
				ELSE 1.0
			END
		ELSE SUM(ISNULL(f.OrderRevenue, 0)) / SUM(CASE WHEN f.UseSource1 = 1 THEN ISNULL(f.CostSource1, 0) ELSE ISNULL(f.CostSource2, 0) END)
	END AS ReturnOnAdSpend
FROM dbo.#Fact f
WHERE
	f.ReportDate BETWEEN 20150101 AND 20150107;

This is not terrible, but I would consider it to be complicated. Let’s look at how the APPLY operator can simplify things. First of all, I want to migrate all of those ISNULL operations into a separate section and use aliases for ISNULL. It’s a small change but I think it really helps readability. Note that for this particular example, we don’t really benefit from having ISNULL around (because our aggregations already eliminate NULL values), but shifting ISNULL statements is something which I really want to show, given how often it appears in the real world.  Next up, I want to move the Clicks and Cost calculations out into their own section. That way, we can use aliased versions of Clicks and Cost to make our code a bit easier.

Here’s what we end up with as the simplified version:

SELECT
	SUM(us1.Clicks) AS Clicks,
	SUM(nonull.OrderCount) AS OrderCount,
	SUM(nonull.OrderRevenue) AS OrderRevenue,
	SUM(nonull.ProductCost) AS ProductCost,
	SUM(us1.Cost) AS Cost,
	CAST
	(
		CASE
			WHEN SUM(us1.Clicks) = 0 THEN 0.0
			WHEN SUM(nonull.OrderCount) = 0 THEN 0.0
			WHEN SUM(nonull.OrderCount) > SUM(us1.Clicks) THEN 1.0
			ELSE 1.0 * SUM(nonull.OrderCount) / SUM(us1.Clicks)
		END AS DECIMAL(19, 4)
	) AS ConversionRate,
	SUM(nonull.OrderRevenue - (nonull.ProductCost + us1.Cost)) AS NetMargin,
	CASE
		WHEN SUM(us1.Cost) = 0 THEN
			CASE
				WHEN SUM(nonull.OrderRevenue) = 0 THEN 0.0
				ELSE 1.0
			END
		ELSE SUM(nonull.OrderRevenue) / SUM(us1.Cost)
	END AS ReturnOnAdSpend
FROM dbo.#Fact f
	CROSS APPLY
	(
		SELECT
			ISNULL(ClickSource1, 0) AS ClickSource1,
			ISNULL(ClickSource2, 0) AS ClickSource2,
			ISNULL(CostSource1, 0) AS CostSource1,
			ISNULL(CostSource2, 0) AS CostSource2,
			ISNULL(OrderCount, 0) AS OrderCount,
			ISNULL(OrderRevenue, 0) AS OrderRevenue,
			ISNULL(ProductCost, 0) AS ProductCost
	) nonull
	OUTER APPLY
	(
		SELECT
			CASE WHEN f.UseSource1 = 1 THEN nonull.ClickSource1 ELSE nonull.ClickSource2 END AS Clicks,
			CASE WHEN f.UseSource1 = 1 THEN nonull.CostSource1 ELSE nonull.CostSource2 END AS Cost
	) us1
WHERE
	f.ReportDate BETWEEN 20150101 AND 20150107;

This new version has a few more lines of code, but we moved a lot of the “pink and blue” (functions and CASE statement cruft) to two APPLY operators, letting us focus more on the actual attributes in our SELECT statement.  The ConversionRate and ReturnOnAdSpend calculations are still a few lines, but most of that complexity is covering edge cases and is necessary, and with the simplified versions of Clicks and Cost in place, they’re a lot easier to see and understand immediately.

Note that the execution plans for these two queries are almost exactly the same. We haven’t affected our query results negatively, but we did make the code a lot more understandable.

APPLYCalculation

This type of operation is easy, gains a lot, and has extremely low risk of failure, especially if you have good tests around the procedure.

Static SQL Over Dynamic SQL

There are times when dynamic SQL is necessary.  Most commonly, these are scenarios in which you do not know in advance what SQL statement should be generated or you want to perform some action such as dynamic WHERE clauses, column selection, or sorting.  My recommendation here would be to make sure that you actually need the statement to be in dynamic SQL.  Too many times, I’ve seen a simple procedure wrapped in dynamic SQL for no apparent reason.  There are a few problems with this.  First, dynamic SQL is harder to read due to the lack of syntax highlighting.  Second, dynamic SQL does not get syntax checked like static SQL, so it’s easier for odd bugs and typos to sneak in.  Third, formatting dynamic SQL forces you to mess up your code in one of two ways:  either you mess up the static SQL formatting to get the dynamic SQL to look okay, or you mess up the dynamic SQL formatting to get the static SQL to look okay.  Here is a simple example of the former:

DECLARE
	@SomeVar INT = 6; -- input parameter

BEGIN TRY

	DECLARE
		@sql NVARCHAR(MAX) = N'
SELECT
	SomeColumn,
	SomeOtherColumn
FROM SomeTable
WHERE
	1 = 1' + CASE WHEN @SomeVar = 8 THEN N'
	AND Something = 38;';

	EXEC (@sql);
END TRY
BEGIN CATCH
	PRINT 'Error!';
END CATCH

In this case, my dynamic SQL is flush left even though the static SQL is indented one to two tabstops. This makes the code harder to read, especially when I could re-write the procedure using static SQL as:

DECLARE
	@SomeVar INT = 6; -- input parameter

BEGIN TRY
	SELECT
		SomeColumn,
		SomeOtherColumn
	FROM SomeTable
	WHERE
		Something = CASE WHEN @SomeVar = 8 THEN 38 ELSE Something END;
END TRY
BEGIN CATCH
	PRINT 'Error!';
END CATCH

This latter code is easier to read. As I mentioned, there are good cases for using dynamic SQL, but if this isn’t a good case, simplify the procedure and remove the dynamic SQL.

Turn Cursors And WHILE Loops Into Set-Based Operations

I’m going to keep this short because I think my target audience already gets this idea.  Don’t use cursors or WHILE loops unless you need them.  They’re terrible for performance and hard to comprehend quickly.  I’d rather have a nice set-based statement with a tally table than one or two WHILE loops or cursors.  These constructs also require you to think about a lot of temporary variables and how they interact.  Get rid of those loops and you can simplify your thought processes.

Simplify Filters

This one may seem a bit silly, but don’t write it off.  Code bases are full of “whoopsies” like the following.  In this case, it’s ripped straight from the headlines my code base.  This table has a create date and we want to get the number of records for a specific profile over the course of a given hour.  This is a slightly-anonymized version of the original code:

SELECT
	SomeTable.ProfileID,
	CASE
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 0 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 00:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 1 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 01:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 2 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 02:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 3 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 03:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 4 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 04:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 5 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 05:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 6 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 06:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 7 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 07:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 8 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 08:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 9 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 09:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 10 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 10:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 11 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 11:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 12 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 12:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 13 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 13:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 14 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 14:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 15 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 15:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 16 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 16:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 17 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 17:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 18 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 18:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 19 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 19:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 20 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 20:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 21 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 21:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 22 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 22:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 23 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 23:00:00' AS DATETIME)
	END AS CreateDateLTz,
	COUNT(1) AS RecordCount
FROM dbo.SomeTable
GROUP BY
	SomeTable.ProfileID,
	CASE
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 0 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 00:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 1 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 01:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 2 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 02:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 3 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 03:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 4 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 04:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 5 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 05:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 6 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 06:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 7 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 07:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 8 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 08:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 9 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 09:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 10 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 10:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 11 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 11:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 12 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 12:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 13 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 13:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 14 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 14:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 15 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 15:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 16 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 16:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 17 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 17:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 18 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 18:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 19 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 19:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 20 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 20:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 21 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 21:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 22 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 22:00:00' AS DATETIME)
		WHEN DATEPART(hh, SomeTable.CreateDateLTz) = 23 THEN CAST(CONVERT(CHAR(10), SomeTable.CreateDateLTz, 101) + ' 23:00:00' AS DATETIME)
	END;

This is long and replicates code.  It is complicated when it should be simple, so let’s make it simple:

SELECT
	SomeTable.ProfileID,
	DATEADD(HOUR, DATEDIFF(HOUR, 0, SomeTable.CreateDateLTz), 0) AS CreateDateLTzRoundedToHour,
	COUNT(1) AS RecordCount
FROM SearchAdvisor.SomeTable
GROUP BY
	SomeTable.ProfileID,
	DATEADD(HOUR, DATEDIFF(HOUR, 0, SomeTable.CreateDateLTz), 0);

The combination of DATEADD and DATEDIFF let me turn 52 lines of code into 2.  It’s arguably slightly less readable for a complete neophyte, but a quick one-line comment or an appropriate name (like my CreateDateLTzRoundedToHour) makes the result understandable.  It also makes the query more maintainable, as I’m less likely to make a mistake modifying this eight-line query than a sixty-line query with 48 duplicated statements.  It also has a nice aesthetic feel, getting rid of a wall of code.

Look for these types of things in your SELECT, WHERE, HAVING, GROUP BY, and ORDER BY clauses.  Also look for duplicated WHERE clause values and logically equivalent values.

Simplify Overall Procedure Design

This last section will be a quick grab bag of some procedure design tips that I’ve come up with over the years.  I recommend taking a considered approach towards each stored procedure you work on and try to find simpler ways of getting the same outputs given a set of inputs.

First, try to eliminate unnecessary temp tables.  Sometimes, temp tables will be introduced for performance reasons to avoid lazy spooling, spillovers on sorts due to bad estimates, etc.  But if your procedure has a dozen temp tables and you have temp tables loading temp tables which load other temp tables, I might suggest that these probably aren’t the solution to your performance problems, but may instead be the cause of performance problems.  Sometimes, people create temp tables because they’re thinking procedurally—first, the data need to be shaped this way, then this way, then this way, and finally this way.  They create temp tables for each step along the way and end up with a final result.  See if you can write a single query to get from beginning to end, and make sure that it performs well.  You may still need some temp tables in some cases, but make those decisions based on performance rather than mental workflow.

Second, eliminate unnecessary code.  You have a source control system (right?), so don’t leave that commented-out T-SQL from 8 years ago in your code.  Commented-out code does absolutely nothing for SQL Server and nothing for the programmer.

Third, make sure your variables and aliases are clear.  @Median is much clearer in intent than @m.  If your table is named StockOption, “so” makes for a better alias than “t8.”  Similarly, make sure your attribute names are clear and accurate.  One example which has bitten me in the past was an attribute named “Conversions” which actually listed conversion rates.  The number of conversions and the percentage rate of making a conversion are two totally different things and mis-naming can lead to confusion.  Finally, make sure your ORDER BY statement uses actual attribute or alias names; don’t use ordinals.

Fourth, take advantage of the language.  Let’s take an example in which we need to get a list of records fitting a specific condition, update those records, and then pass that list of records off to the next step in the procedure.  Most of the time, people will create a temp table like so:

BEGIN TRANSACTION

CREATE TABLE #Fact
(
	Id INT,
	SomeVal CHAR(1)
);

INSERT INTO dbo.#Fact
(
	Id,
	SomeVal
)
VALUES
(0,''),
(1, 'A'),
(2, 'B'),
(3, 'A'),
(4, '');

CREATE TABLE #tmp
(
	Id INT
);

INSERT INTO dbo.#tmp
(
	Id
)
SELECT
	Id
FROM #Fact
WHERE
	SomeVal = '';

UPDATE f
SET
	SomeVal = 'C'
FROM #Fact f
	INNER JOIN #tmp t
		ON f.Id = t.Id;

SELECT
	Id
FROM #tmp;

ROLLBACK TRANSACTION

Instead of using a temp table to update and then select values, I can use the OUTPUT clause to simplify this process.

BEGIN TRANSACTION

CREATE TABLE #Fact
(
	Id INT,
	SomeVal CHAR(1)
);

INSERT INTO dbo.#Fact
(
	Id,
	SomeVal
)
VALUES
(0,''),
(1, 'A'),
(2, 'B'),
(3, 'A'),
(4, '');

UPDATE f
SET
	SomeVal = 'C'
OUTPUT
	DELETED.Id
FROM #Fact f
WHERE
	f.SomeVal = '';

ROLLBACK TRANSACTION

Instead of creating a temp table, populating that temp table, updating the base table, and selecting from the temp table, I have a single update statement. What’s nice is that in addition to being more concise and elegant, the code is also faster:

SimplifyTempTable

I ran both versions of code against one another, first the old version and then the new.  The old version is responsible for 65% of the estimated cost, and only looking at the parts I changed, the new version is more than twice as efficient in terms of I/O and much better in terms of CPU.  This is a case in which simpler code is also more efficient, making it a win-win for you.

Summary

In this blog post, I covered a number of techniques we can use to simplify code, looking at several use cases based on actual code I’ve deal with.  There are a number of other techniques you can use to help simplify your codebase, but hopefully this ends up giving you some ideas on how to improve your own code base.

Stay tuned for tomorrow’s post, in which I look into how we can simplify entire processes and not just a single procedure.

Advertisements

One thought on “Simplify!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s