This is part nine in a series on window functions in SQL Server.

POC: the Rule of Thumb

If you’ve been around the block with window functions, you’ve probably heard of the POC indexing strategy: Partition by, Order by, Covering. In other words, with a query, focus on the columns in the PARTITION BY clause (in order!), then the ORDER BY clause (again, in order!), and finally other columns in the SELECT clause to make the index covering (not in order! though it doesn’t hurt!).

Let’s suppose you have a really great query like this one:

WITH records AS
(
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY i.CustomerID
            ORDER BY i.OrderID) AS rownum
    FROM Sales.Invoices i
)
SELECT *
FROM records r
WHERE
    r.rownum = 0;
GO

You may be asking yourself, why would you filter where rownum = 0 if the minimum rownum value has to be 1? Because, that’s why! Well, really it’s so that we don’t take into consideration the time to write out rows. Anyhow, here’s the query plan:

A simple plan.

This query took 58 milliseconds, including 304 milliseconds of CPU but burned 103,384 I/O along the way.

Let’s add an index starting with customer ID and including order ID, as those are our partition and order columns, respectively.

IF NOT EXISTS
(
	SELECT 1
	FROM sys.indexes i
	WHERE
		i.name = N'IX_Sales_Invoices_WindowFunction'
)
BEGIN
	CREATE INDEX [IX_Sales_Invoices_WindowFunction] ON Sales.Invoices
	(
		CustomerID,
		OrderID
	);
END
GO

From there, we can try the query again.

Lookin’ good.

We still have an index scan but the plan is a bit simpler than before. We don’t need to perform a sort. This doesn’t look like much from the plan itself, but the benefits are pretty big: we’re down to 35 ms of duration single-threaded (so 35 ms of CPU) and just 1497 reads.

But do watch out: order matters. With this query, we’d expect exactly the same outcome. After all, indexes are doubly-linked, so I can read it from either direction.

WITH records AS
(
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY i.CustomerID
            ORDER BY i.OrderID DESC) AS rownum
    FROM Sales.Invoices i
)
SELECT *
FROM records r
WHERE
    r.rownum = 0;
GO

One sad sort later.

It turns out that, when you create an index to support a window function, you really need to get the order in order.

WPOC: Why Thumbs Should Not Rule

There’s a big exception to the POC rule of thumb: if you have a great filter, put your filters first. In the examples above, we scanned through the entire set of invoices. Suppose, however, we only need specific invoices.

WITH records AS
(
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY i.CustomerID
            ORDER BY i.OrderID) AS rownum
    FROM Sales.Invoices i
	WHERE
		i.TotalDryItems > 3
)
SELECT *
FROM records r
WHERE
    r.rownum = 0;
GO
Well, this is a step back.

We’re back to 103,384 reads and a clustered index scan. Ideally, we filter on the WHERE clause first and cut down the number of rows we need. If this is the type of query we run a lot, we should modify our index. Here’s our first try:

CREATE INDEX [IX_Sales_Invoices_WindowFunction] ON Sales.Invoices
(
	TotalDryItems,
	CustomerID,
	OrderID
) WITH(DROP_EXISTING = ON);
This is better.

The query plan doesn’t necessarily look great, but it is an improvement: 8ms duration, 35ms of CPU, and 472 reads. If the intended query is just to include at least 3 dry items, we might even have a use case for a filtered index!

CREATE INDEX [IX_Sales_Invoices_WindowFunction] ON Sales.Invoices
(
	CustomerID,
	OrderID
)
WHERE TotalDryItems > 3
WITH(DROP_EXISTING = ON);
Now we’re talking.

41ms of duration on a non-parallel query and a mere 369 writes.

What about Multiple Windows?

Okay, Mr. Smart Guy, what happens when you have multiple windows? Well…you’re going to live with it. You’ll get one index to retrieve the data (making that leading WHERE clause even more important for indexing) and then you’ll get lots of sorts and window operations. If you want all of the windows in a single query, there’s not too much more you can do. If performance is really bad, dump the results into a temp table and try simpler queries.

Conclusion

If you do need to scan the entire table, then creating indexes based on the POC pattern. But as soon as you need to narrow down your queries, WPOC it is.

2 thoughts on “A Slice of Time: Indexing and Window Functions

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s