This is part nine in a series on window functions in SQL Server.
POC: the Rule of Thumb
If you’ve been around the block with window functions, you’ve probably heard of the POC indexing strategy: Partition by, Order by, Covering. In other words, with a query, focus on the columns in the
PARTITION BY clause (in order!), then the
ORDER BY clause (again, in order!), and finally other columns in the
SELECT clause to make the index covering (not in order! though it doesn’t hurt!).
Let’s suppose you have a really great query like this one:
WITH records AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY i.CustomerID ORDER BY i.OrderID) AS rownum FROM Sales.Invoices i ) SELECT * FROM records r WHERE r.rownum = 0; GO
You may be asking yourself, why would you filter where rownum = 0 if the minimum rownum value has to be 1? Because, that’s why! Well, really it’s so that we don’t take into consideration the time to write out rows. Anyhow, here’s the query plan:
This query took 58 milliseconds, including 304 milliseconds of CPU but burned 103,384 I/O along the way.
Let’s add an index starting with customer ID and including order ID, as those are our partition and order columns, respectively.
IF NOT EXISTS ( SELECT 1 FROM sys.indexes i WHERE i.name = N'IX_Sales_Invoices_WindowFunction' ) BEGIN CREATE INDEX [IX_Sales_Invoices_WindowFunction] ON Sales.Invoices ( CustomerID, OrderID ); END GO
From there, we can try the query again.
We still have an index scan but the plan is a bit simpler than before. We don’t need to perform a sort. This doesn’t look like much from the plan itself, but the benefits are pretty big: we’re down to 35 ms of duration single-threaded (so 35 ms of CPU) and just 1497 reads.
But do watch out: order matters. With this query, we’d expect exactly the same outcome. After all, indexes are doubly-linked, so I can read it from either direction.
WITH records AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY i.CustomerID ORDER BY i.OrderID DESC) AS rownum FROM Sales.Invoices i ) SELECT * FROM records r WHERE r.rownum = 0; GO
It turns out that, when you create an index to support a window function, you really need to get the order in order.
WPOC: Why Thumbs Should Not Rule
There’s a big exception to the POC rule of thumb: if you have a great filter, put your filters first. In the examples above, we scanned through the entire set of invoices. Suppose, however, we only need specific invoices.
WITH records AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY i.CustomerID ORDER BY i.OrderID) AS rownum FROM Sales.Invoices i WHERE i.TotalDryItems > 3 ) SELECT * FROM records r WHERE r.rownum = 0; GO
We’re back to 103,384 reads and a clustered index scan. Ideally, we filter on the
WHERE clause first and cut down the number of rows we need. If this is the type of query we run a lot, we should modify our index. Here’s our first try:
CREATE INDEX [IX_Sales_Invoices_WindowFunction] ON Sales.Invoices ( TotalDryItems, CustomerID, OrderID ) WITH(DROP_EXISTING = ON);
The query plan doesn’t necessarily look great, but it is an improvement: 8ms duration, 35ms of CPU, and 472 reads. If the intended query is just to include at least 3 dry items, we might even have a use case for a filtered index!
CREATE INDEX [IX_Sales_Invoices_WindowFunction] ON Sales.Invoices ( CustomerID, OrderID ) WHERE TotalDryItems > 3 WITH(DROP_EXISTING = ON);
41ms of duration on a non-parallel query and a mere 369 writes.
What about Multiple Windows?
Okay, Mr. Smart Guy, what happens when you have multiple windows? Well…you’re going to live with it. You’ll get one index to retrieve the data (making that leading
WHERE clause even more important for indexing) and then you’ll get lots of sorts and window operations. If you want all of the windows in a single query, there’s not too much more you can do. If performance is really bad, dump the results into a temp table and try simpler queries.
If you do need to scan the entire table, then creating indexes based on the POC pattern. But as soon as you need to narrow down your queries, WPOC it is.