This is part eight in a series on window functions in SQL Server.
Not All Roses
So far, we’ve seen some really cool things we can do with window functions. That said, there are some limitations with these windows. Some of them are built into how window functions work and others are incomplete implementations of the ANSI SQL standard. Let’s go check it out.
Referencing a Window Function
When you create a window function, you can only reference it in the
ORDER BY clauses.
SELECT t.Col1, t.Col2, WINDOW_FUNCTION() OVER( PARTITION BY Col1 ORDER BY Col2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS wf FROM dbo.Table t WHERE wf > 5 -- NO! GROUP BY wf -- NO! HAVING wf > 10 -- NO! ORDER BY wf; -- YES!
In this example above, we can’t reference
wf in the
GROUP BY, or
HAVING clauses, nor can we include it as a criterion in any join. The reason for this has to do with the processing order for SQL queries, which is:
- GROUP BY
- ORDER BY
Note that there are a couple more elements (e.g., TOP, OFFSET/FETCH), but they aren’t going to be relevant to our discussion. The window is defined in the
SELECT clause, so we can only reference things lower in the list. Another way to think about it is that the
JOIN clauses help define the window, and so the window cannot subsequently shape them.
To get around this, wrap the window function in a common table expression and you can perform subsequent filters and groupings.
WITH records AS ( SELECT t.Col1, t.Col2, WINDOW_FUNCTION() OVER( PARTITION BY Col1 ORDER BY Col2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS wf FROM dbo.Table t ) SELECT r.Col1, r.Col2, r.wf FROM records r WHERE r.wf > 5 GROUP BY Col1, Col2, wf HAVING wf > 10 ORDER BY wf;
Like a Nissan Leaf, SQL Server has some pretty harsh limits on range. We can perform the following
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING RANGE BETWEEN CURRENT ROW AND CURRENT ROW
In other words, we get two options for the beginning of our frame and two options for the end of our frame. The problem is that the ANSI SQL standard is a lot richer than what we have in SQL Server today with the
RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND INTERVAL '7' DAY FOLLOWING
In an earlier post, I mentioned the idea of periodic ranges for things like moving averages. Ideally, we’d be able to build
RANGE frames like this and get true moving averages regardless of whether our three-month average includes three rows or thirty.
By contrast, the
ROWS frame has a bit more flexibility. In addition to unbounded preceding/following and current row, you can also specify a certain number of rows preceding or following.
Windows and APPLY
Let’s go back to our use case around deleting duplicate rows.
DROP TABLE IF EXISTS #t1; CREATE TABLE #t1 ( Id INT IDENTITY(1,1) NOT NULL, EventDate DATE NOT NULL, NumberOfAttendees INT NOT NULL ); -- Our first round of insertions INSERT INTO #t1(EventDate, NumberOfAttendees) VALUES ('2021-04-01', 150), ('2021-04-08', 165), ('2021-04-15', 144), ('2021-04-22', 170), ('2021-04-29', 168), ('2021-05-06', 164), ('2021-05-13', 152), ('2021-05-20', 158), ('2021-05-27', 168), ('2021-06-03', 170), ('2021-06-10', 161), ('2021-06-17', 155); -- Due to a bug in our system, we accidentally added some rows again. INSERT INTO #t1(EventDate, NumberOfAttendees) VALUES ('2021-04-01', 150), ('2021-04-01', 150), ('2021-04-01', 150), ('2021-04-01', 150), ('2021-04-08', 165), ('2021-04-08', 165), ('2021-04-29', 168);
Then, suppose that you took my advice and learned all about the
APPLY operator (perhaps even going through my Teachable course on the topic using the discount code WINDOW to save some cash). You probably remember that we used a common table expression to get the duplicate rows due to the fact that we want to filter on results of our window function.
WITH records AS ( SELECT t1.Id, t1.EventDate, t1.NumberOfAttendees, ROW_NUMBER() OVER (PARTITION BY t1.EventDate ORDER BY t1.Id) AS rownum FROM #t1 t1 ) SELECT * FROM records WHERE rownum > 1 ORDER BY EventDate, Id;
You then lament this fact and want to use the
APPLY operator to do it without a common table expression. You write a beautiful T-SQL statement:
SELECT rn.* FROM #t1 t1 CROSS APPLY ( SELECT t1.Id, t1.EventDate, t1.NumberOfAttendees, ROW_NUMBER() OVER (PARTITION BY t1.EventDate ORDER BY t1.Id) AS rownum ) rn WHERE rn.rownum > 1;
That right there is a beaut. Unfortunately, if you were to run this query, you’d get back zero rows. Every row will have a rownum value of 1. The reason is that
CROSS APPLY executes the
rn operation independently for every row in
t1, and so the window function runs once per row, not once across the entire table. Therefore, when you have a single row in your window,
ROW_NUMBER() won’t do much for you.
The last thing I want to cover is a quick reminder that the
FIRST_VALUE() functions have what may seem like strange behavior when you use them on ascending and descending ordering statements, respectively. This behavior does ultimately make sense once you consider how frames work, but it can catch people unawares.
Window functions are great, but it’s important to be aware of how the functionality works so that you don’t come in with a bad assumption and get the wrong results. It’s also important to know that SQL Server doesn’t implement the entire spec for window functions, and range intervals is one of the most useful and understandable missing items.