*This is part seven in a series on window functions in SQL Server.*

## The Road So Far

To this point, we’ve looked at five classes of window function in SQL Server. I’ve given you a couple of solid use cases, but for the most part, we’ve focused on **what** the classes of window functions are. Now we’re going to talk about **why** you want to use them.

## What’s the Point?

Window functions help us solve a variety of problem types in a rather efficient way. Let’s cover several in turn.

### Deleting Duplicate Rows

Suppose you have a really great event system. You track events and numbers of attendees, and your app works really well: it even inserts data when you ask it to!

```
CREATE TABLE #t1
(
Id INT IDENTITY(1,1) NOT NULL,
EventDate DATE NOT NULL,
NumberOfAttendees INT NOT NULL
);
-- Our first round of insertions
INSERT INTO #t1(EventDate, NumberOfAttendees) VALUES
('2021-04-01', 150),
('2021-04-08', 165),
('2021-04-15', 144),
('2021-04-22', 170),
('2021-04-29', 168),
('2021-05-06', 164),
('2021-05-13', 152),
('2021-05-20', 158),
('2021-05-27', 168),
('2021-06-03', 170),
('2021-06-10', 161),
('2021-06-17', 155);
```

In fact, it’s so great that sometimes it inserts the same data multiple times, which I’m pretty sure is how you back up the data in a database.

```
INSERT INTO #t1(EventDate, NumberOfAttendees) VALUES
('2021-04-01', 150),
('2021-04-01', 150),
('2021-04-01', 150),
('2021-04-01', 150),
('2021-04-08', 165),
('2021-04-08', 165),
('2021-04-29', 168);
```

Some people might appreciate having 5 copies of their data for maximum redundancy, but it turns out that this makes it difficult for event organizers to understand what’s going on, and they keep getting weird results in the app because we can only have one event per day, just like in real life.

So how do we delete the bad rows? Well, we could give an intern a print-out copy of a spreadsheet containing all of the data in the table and then hand-write `DELETE`

statements based on the ID column. Actually, forget this whole window function thing—let’s just hire a whole bunch of interns. Problem solved!

…My producer is telling me that this is not a good enough solution to the problem and if I want to finish up this blog post, I have to solve it using window functions. So let’s do that so I can ~~go home~~ continue being at home.

The following query gets me a unique, ascending row number for each event date, sorted by the ID of the event. We’re going to call that `rownum`

because I am not clever with naming.

```
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
ROW_NUMBER() OVER (PARTITION BY t1.EventDate ORDER BY t1.Id) AS rownum
FROM #t1 t1
```

Once we have our row number in place, we can perform the kill shot:

```
WITH records AS
(
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
ROW_NUMBER() OVER (PARTITION BY t1.EventDate ORDER BY t1.Id) AS rownum
FROM #t1 t1
)
DELETE FROM records
WHERE
rownum > 1;
```

Doing this gets us back to having one record per event.

### Running Totals

Yes, we talked about this in a prior blog post. We’re talking about it again here because it’s just that important. Let’s say that we want to know how many attendees have come to our events over time. We can use a running total to get those results:

```
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
SUM(t1.NumberOfAttendees) OVER (
ORDER BY t1.EventDate
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS RunningTotal
FROM #t1 t1;
```

### Moving Average

Let’s say that we’re trying to talk advertisers into giving us all kinds of money to sponsor our events. We could show them attendee counts, but what if there’s a down week or a boom week? We don’t want them to think every week is like that. One good way to deal with ups and downs is to smooth things using moving averages. Here, we’re going to calculate the average number of attendees over a three-event period.

```
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
AVG(t1.NumberOfAttendees) OVER (
ORDER BY t1.EventDate
ROWS BETWEEN 2 PRECEDING
AND CURRENT ROW
) AS MovingAverage3
FROM #t1 t1;
```

The way this works is, if we have fewer than three weeks in a window, we take the average of however many weeks we do have. Thus, the first week has 150 attendees and a moving average of 150. The second week has 165 attendees and a moving average of (150+165)/2 = 157 (clip off the decimal because we’re doing integer math here). Once we get to the third week, we have (150+165+144)/3 = 153. For week four, we drop off the first entry because we only want the current row and the two preceding rows: (165+144+170)/3 = 159.

One quick point here is that for moving averages, we’re generally going to want to use `ROWS`

as the frame rather than `RANGE`

. Why? For sad reasons that I’ll talk about in a later post.

### Percentage of a Total and Cumulative Percent

Okay, we’ve seen some of the basics, but let’s get a little crazy.

```
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
SUM(t1.NumberOfAttendees) OVER (
ORDER BY t1.EventDate
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS RunningTotal,
SUM(t1.NumberOfAttendees) OVER () AS GrandTotal,
1.0 * t1.NumberOfAttendees / SUM(t1.NumberOfAttendees) OVER () AS PercentOfTotal,
1.0 * SUM(t1.NumberOfAttendees) OVER (
ORDER BY t1.EventDate
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) / SUM(t1.NumberOfAttendees) OVER () AS CumulativePercent
FROM #t1 t1;
```

Let’s talk about what we get in one query:

- A running total. We’ve seen these before so they’re old hat.
- A grand total. Saw this one as well in our aggregate functions post.
- The number of attendees in a given week divided by the grand total is the percent of the total. This is new and gives us the share of the whole which a single row represents.
- The running total divided by the grand total is the cumulative percentage of attendees. This is also the same as the running total of the
`PercentOfTotal`

column.

Not bad for a query’s work. Now, if I were putting this together in a real solution, I’d probably wrap the running total and grand total calculations in a CTE so I can reuse them more easily:

```
WITH records AS
(
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
SUM(t1.NumberOfAttendees) OVER (
ORDER BY t1.EventDate
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS RunningTotal,
SUM(t1.NumberOfAttendees) OVER () AS GrandTotal
FROM #t1 t1
)
SELECT
t1.Id,
t1.EventDate,
t1.NumberOfAttendees,
t1.RunningTotal,
t1.GrandTotal,
1.0 * t1.NumberOfAttendees / t1.GrandTotal AS PercentOfTotal,
1.0 * t1.RunningTotal / t1.GrandTotal AS CumulativePercent
FROM records t1;
```

### Get the Latest N Values

Suppose I want to find the latest 5 orders for each customer in the Wide World Importers database. Specifically, I want to limit this to customers who do not have ” Toys” in the name. Here’s how I can do this using a window function:

```
WITH customers AS
(
SELECT
c.CustomerID,
c.CustomerName
FROM Sales.Customers c
WHERE
CustomerName NOT LIKE N'% Toys%'
),
invoices AS
(
SELECT
i.CustomerID,
i.InvoiceID,
i.InvoiceDate,
i.TotalChillerItems,
i.TotalDryItems,
ROW_NUMBER() OVER (PARTITION BY i.CustomerID ORDER BY i.InvoiceDate DESC) AS rownum
FROM Sales.InvoicesSmall i
INNER JOIN customers c
ON i.CustomerID = c.CustomerID
)
SELECT *
FROM customers c
LEFT OUTER JOIN invoices i
ON c.CustomerID = i.CustomerID
AND rownum <= 5;
```

Before I go any further, though, let me point out that this is **not** necessarily the most efficient method for getting the N latest records. It can be the most efficient method in some cases, but the APPLY operator can be more efficient in cases where you get a small percentage of a large table. I go into it in a lot more detail in my Teachable course on the topic, and if you use the discount code WINDOW, you can get the course for $20 USD through the end of the year.

Stealth advertisement over.

### Distinct Customer Counts over Time

Have you ever needed to calculate the distinct number of customers on your site over time? No? Um…have you wanted to? I’ll take that awkward silence as a yes!

```
WITH records AS
(
SELECT
OrderDate,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate) = 1 THEN CustomerID
END AS DistinctCustomerID
FROM Sales.Orders o
)
SELECT DISTINCT
OrderDate,
COUNT(DistinctCustomerID) OVER (ORDER BY OrderDate) AS NumCustomers
FROM records r
ORDER BY
OrderDate;
```

We’re solving this problem with two separate window functions. The first window function determines the first order for a given customer. When that order is the first—that is, when the row number is 1—then we return the customer ID. Otherwise, there is an implicit NULL. This means that we’ll get back one row for each order in `Sales.Orders`

, but we’ll only have a single value per customer which is not NULL.

Then, we take that stream of mostly-NULLs and get the count of distinct customer IDs. Remember that `COUNT()`

ignores NULL values if you specify a column in the aggregate function. Then, we make `COUNT()`

into a window function ordered by `OrderDate`

with an implicit frame of `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`

, giving us the count of non-NULL values in the dataset up to the current date. Slap a `DISTINCT`

on here to collapse down the values and we have our answer.

### Build an Events System and Track Maximum Concurrent Operations

This one’s a doozy and I’ve written about it previously. By the way, that problem and solution looks a bit complex, but I’ve actually used it in real life to solve actual work-related problems. Specifically, it was around seeing if any customer had multiple concurrent upload operations going, as there was a bug in one product team’s code which could fire in that scenario, and so they wanted to see how many customers might have been affected. Fortunately, the answer was zero.

If you want a version for Wide World Importers, let’s suppose that we want to see the maximum number of concurrent orders that a customer has had in transit. Suppose that the `OrderDate`

represents when the customer places an order and `ExpectedDeliveryDate`

is the date the customer receives the order. We can unpivot orders data and figure out the maximum number of concurrent orders per customer using the following query:

```
WITH StartStopPoints AS
(
SELECT
o.CustomerID,
o.OrderDate AS TimeUTC,
1 AS IsStartingPoint,
ROW_NUMBER() OVER (PARTITION BY o.CustomerID ORDER BY o.OrderDate) AS StartOrdinal
FROM Sales.Orders o
UNION ALL
SELECT
o.CustomerID,
o.ExpectedDeliveryDate AS TimeUTC,
0 AS IsStartingPoint,
NULL AS StartOrdinal
FROM Sales.Orders o
),
StartStopOrder AS
(
SELECT
s.CustomerID,
s.TimeUTC,
s.IsStartingPoint,
s.StartOrdinal,
ROW_NUMBER() OVER (PARTITION BY s.CustomerID ORDER BY s.TimeUTC, s.IsStartingPoint) AS StartOrEndOrdinal
FROM StartStopPoints s
)
SELECT
s.CustomerID,
MAX(2 * s.StartOrdinal - s.StartOrEndOrdinal) AS MaxConcurrentOrders
FROM StartStopOrder s
WHERE
s.IsStartingPoint = 1
GROUP BY
s.CustomerID
ORDER BY
MaxConcurrentOrders DESC,
CustomerID ASC;
```

## Conclusion

Over the course of this post, we’ve looked at seven business use cases for window functions and we could easily come up with more. Window functions provide a huge amount of versatility when it comes to solving business problems. Furthermore, as we saw in the last two examples, clever combinations of window functions and reshaping of data can solve challenging business problems quite effectively.

## 2 thoughts on “A Slice of Time: Good Use Cases”