A Slice of Time: Window Function Limitations

This is part eight in a series on window functions in SQL Server.

Not All Roses

So far, we’ve seen some really cool things we can do with window functions. That said, there are some limitations with these windows. Some of them are built into how window functions work and others are incomplete implementations of the ANSI SQL standard. Let’s go check it out.

Referencing a Window Function

When you create a window function, you can only reference it in the SELECT and ORDER BY clauses.

SELECT
	t.Col1,
	t.Col2,
	WINDOW_FUNCTION() OVER(
			PARTITION BY Col1
			ORDER BY Col2
			ROWS BETWEEN UNBOUNDED PRECEDING
				AND CURRENT ROW
		) AS wf
FROM dbo.Table t
WHERE wf > 5 -- NO!
GROUP BY wf -- NO!
HAVING wf > 10 -- NO!
ORDER BY wf; -- YES!

In this example above, we can’t reference wf in the WHERE, GROUP BY, or HAVING clauses, nor can we include it as a criterion in any join. The reason for this has to do with the processing order for SQL queries, which is:

FROM
JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY

Note that there are a couple more elements (e.g., TOP, OFFSET/FETCH), but they aren’t going to be relevant to our discussion. The window is defined in the SELECT clause, so we can only reference things lower in the list. Another way to think about it is that the WHERE, GROUP BY, HAVING, and JOIN clauses help define the window, and so the window cannot subsequently shape them.

To get around this, wrap the window function in a common table expression and you can perform subsequent filters and groupings.

WITH records AS
(
	SELECT
		t.Col1,
		t.Col2,
		WINDOW_FUNCTION() OVER(
				PARTITION BY Col1
				ORDER BY Col2
				ROWS BETWEEN UNBOUNDED PRECEDING
					AND CURRENT ROW
			) AS wf
	FROM dbo.Table t
)
SELECT r.Col1, r.Col2, r.wf
FROM records r
WHERE r.wf > 5
GROUP BY Col1, Col2, wf
HAVING wf > 10
ORDER BY wf;

Limited RANGE

Like a Nissan Leaf, SQL Server has some pretty harsh limits on range. We can perform the following RANGE operations:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RANGE BETWEEN CURRENT ROW AND CURRENT ROW

In other words, we get two options for the beginning of our frame and two options for the end of our frame. The problem is that the ANSI SQL standard is a lot richer than what we have in SQL Server today with the INTERVAL logic.

RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND INTERVAL '7' DAY FOLLOWING

In an earlier post, I mentioned the idea of periodic ranges for things like moving averages. Ideally, we’d be able to build RANGE frames like this and get true moving averages regardless of whether our three-month average includes three rows or thirty.

By contrast, the ROWS frame has a bit more flexibility. In addition to unbounded preceding/following and current row, you can also specify a certain number of rows preceding or following.

Windows and APPLY

Let’s go back to our use case around deleting duplicate rows.

DROP TABLE IF EXISTS #t1;

CREATE TABLE #t1
(
    Id INT IDENTITY(1,1) NOT NULL,
    EventDate DATE NOT NULL,
    NumberOfAttendees INT NOT NULL
);

-- Our first round of insertions
INSERT INTO #t1(EventDate, NumberOfAttendees) VALUES
('2021-04-01', 150),
('2021-04-08', 165),
('2021-04-15', 144),
('2021-04-22', 170),
('2021-04-29', 168),
('2021-05-06', 164),
('2021-05-13', 152),
('2021-05-20', 158),
('2021-05-27', 168),
('2021-06-03', 170),
('2021-06-10', 161),
('2021-06-17', 155);

-- Due to a bug in our system, we accidentally added some rows again.
INSERT INTO #t1(EventDate, NumberOfAttendees) VALUES
('2021-04-01', 150),
('2021-04-01', 150),
('2021-04-01', 150),
('2021-04-01', 150),
('2021-04-08', 165),
('2021-04-08', 165),
('2021-04-29', 168);

Then, suppose that you took my advice and learned all about the APPLY operator (perhaps even going through my Teachable course on the topic using the discount code WINDOW to save some cash). You probably remember that we used a common table expression to get the duplicate rows due to the fact that we want to filter on results of our window function.

WITH records AS
(
    SELECT
        t1.Id,
        t1.EventDate,
        t1.NumberOfAttendees,
        ROW_NUMBER() OVER (PARTITION BY t1.EventDate ORDER BY t1.Id) AS rownum
    FROM #t1 t1
)
SELECT *
FROM records
WHERE rownum > 1
ORDER BY
    EventDate,
    Id;

You then lament this fact and want to use the APPLY operator to do it without a common table expression. You write a beautiful T-SQL statement:

SELECT
	rn.*
FROM #t1 t1
	CROSS APPLY
	(
		SELECT
			t1.Id,
			t1.EventDate,
			t1.NumberOfAttendees,
			ROW_NUMBER() OVER (PARTITION BY t1.EventDate ORDER BY t1.Id) AS rownum
	) rn
WHERE rn.rownum > 1;

That right there is a beaut. Unfortunately, if you were to run this query, you’d get back zero rows. Every row will have a rownum value of 1. The reason is that CROSS APPLY executes the rn operation independently for every row in t1, and so the window function runs once per row, not once across the entire table. Therefore, when you have a single row in your window, ROW_NUMBER() won’t do much for you.

LAST_VALUE

The last thing I want to cover is a quick reminder that the LAST_VALUE() and FIRST_VALUE() functions have what may seem like strange behavior when you use them on ascending and descending ordering statements, respectively. This behavior does ultimately make sense once you consider how frames work, but it can catch people unawares.

Conclusion

Window functions are great, but it’s important to be aware of how the functionality works so that you don’t come in with a bad assumption and get the wrong results. It’s also important to know that SQL Server doesn’t implement the entire spec for window functions, and range intervals is one of the most useful and understandable missing items.

36 Chambers – The Legendary Journeys: Execution to the max!