This is part one in a series on window functions in SQL Server.
What is a Window Function?
A window function is, quite simply, a function applied over a window of data. If that doesn’t sound impressively reductionist, I don’t know what will. So let’s unpack this a little bit further.
A window function operates over a window of data. Here’s an example of a tumbling window.
What we see here is a set of events running over time. We have a window of a fixed size, perhaps one minute in length. We capture all of the events during this stretch and operate on the elements within each window.
What is a Window?
Another way to think about a window is that it is the context of operation for a function call. Suppose we have a dataset and we want to perform some analysis for each invoice ID.
The way that we define the context of our window is using the
OVER() clause in SQL.
What Constitutes a Window Function?
OVER() there are four things which are critical to a window function. They are:
- Use of the
OVER()clause, which defines the window we use.
PARTITION BYclause, which allows us to filter rows in a window function.
ORDER BYclause provides sorting when we need an ordered set, but it also provides meaning to dataframes when aggregating data, as we’ll see in future posts.
- A frame, using
RANGE, to filter within a partition.
Here’s a sample of what a window function looks like:
SELECT Col1, Col2, WINDOW_FUNCTION() OVER( PARTITION BY Col1 ORDER BY Col2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS wf FROM dbo.Table;
Our window function uses the
OVER() clause to define that we want to
PARTITION BY column 1, meaning that as column 1’s value changes, we create new windows. We sort within each window by column 2. Finally, we have a frame which includes all rows from the beginning of our window up to and including the current row.
Why Should I Care?
Window functions can solve a variety of use cases in an efficient manner. Over the course of this series, I’ll show you how we can use window functions to solve the following problems:
- Deleting duplicate rows in a table
- Calculating running totals
- Calculating moving averages
- Calculating percents of a total
- Getting the latest N items from a table
- Calculating the distinct number of customers who have purchased a product over time
In addition to these, we’ll cover a whole lot more.