A Slice of Time: Offset Functions

This is part four in a series on window functions in SQL Server.

Offset Functions in SQL

Offset functions are another class of window function in SQL Server thanks to being part of the ANSI SQL standard. Think of a window of data, stretching over some number of rows. What we want to do is take the current row and then look back (or “lag” the data) or forward (“lead”).

There are four interesting offset window functions: LAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE(). All four of these offset functions require an ORDER BY clause because of the nature of these functions. LAG() and LEAD() take two parameters, one of which is required. We need to know the value to lag/lead, so that’s a mandatory parameter. In addition, we have an optional parameter which indicates how many steps forward or backward we want to look. Let’s see this in action with a query:

WITH records AS
(
    SELECT
        i.InvoiceDate,
        SUM(il.LineProfit) AS DailyProfit
    FROM Sales.InvoiceLinesSmall il
        INNER JOIN Sales.InvoicesSmall i
            ON il.InvoiceID = i.InvoiceID
    GROUP BY
        i.InvoiceDate
)
SELECT
    r.InvoiceDate,
    r.DailyProfit,
    LAG(r.DailyProfit) OVER (ORDER BY r.InvoiceDate) AS LagDefaultProfit,
    LAG(r.DailyProfit, 1) OVER (ORDER BY r.InvoiceDate) AS PriorDayProfit,
    LAG(r.DailyProfit, 2) OVER (ORDER BY r.InvoiceDate) AS PriorDay2Profit,
    LAG(r.DailyProfit, 7) OVER (ORDER BY r.InvoiceDate) AS PriorDay7Profit,
    LAG(r.DailyProfit, 14) OVER (ORDER BY r.InvoiceDate) AS PriorDay14Profit,
    LEAD(r.DailyProfit) OVER (ORDER BY r.InvoiceDate) AS LeadDefaultProfit,
    LEAD(r.DailyProfit, 1) OVER (ORDER BY r.InvoiceDate) AS NextDayProfit,
    LEAD(r.DailyProfit, 2) OVER (ORDER BY r.InvoiceDate) AS NextDay2Profit,
    LEAD(r.DailyProfit, 7) OVER (ORDER BY r.InvoiceDate) AS NextDay7Profit,
    LEAD(r.DailyProfit, 14) OVER (ORDER BY r.InvoiceDate) AS NextDay14Profit
FROM records r
ORDER BY
    r.InvoiceDate;

We start out with a common table expression, calculating the sum of line profit by invoice date. Then, we’ll try out a variety of LAG() and LEAD() options. The default, when you do not pass in a second parameter, is 1, so LagDefaultProfit will return the same results as PriorDayProfit.

Lots of columns to lag and to lead.

The LagDefaultProfit and PriorDayProfit columns are NULL for the first row because there is no prior record. One important thing to note is that, even if there is data for 2012-12-31 (or any prior date), it will not appear in the result set. For LAG(2), the first two records will be NULL, and then the third row will include the value of daily profit from the first row.

I don’t show it in this image, but LEAD() works just like the inverse of LAG(), where the final row for LEAD(1) will be NULL.

LAG() and LEAD() are very useful analytical functions when working with business reports. You might want to show the current day, prior day, and difference between the two. This is extremely easy to do: SELECT r.DailyProfit - LAG(r.DailyProfit) OVER (ORDER BY r.InvoiceDate). Without the LAG() function, we’d need to create some terrible self-join to accomplish the same, especially if we wanted to include multiple prior and future periods.

First and Last Values

The other two offset functions of note are FIRST_VALUE() and LAST_VALUE(). Not surprisingly, these show the first and last values in a window, respectively. These functions accept a single parameter representing the value whose first/last value you want. We also need to include an ORDER BY statement in the OVER() clause.

WITH records AS
(
    SELECT
        i.InvoiceDate,
        SUM(il.LineProfit) AS DailyProfit
    FROM Sales.InvoiceLinesSmall il
        INNER JOIN Sales.InvoicesSmall i
            ON il.InvoiceID = i.InvoiceID
    GROUP BY
        i.InvoiceDate
)
SELECT
    r.InvoiceDate,
    r.DailyProfit,
    FIRST_VALUE(r.DailyProfit) OVER (ORDER BY r.InvoiceDate) AS DefaultFirstValue,
    LAST_VALUE(r.DailyProfit) OVER (ORDER BY r.InvoiceDate) AS DefaultLastValue
FROM records r
ORDER BY
    r.InvoiceDate;

Once again, we’re aggregating data by date first, and then we’re going to look at the first and last values. Let’s see the results:

The last value doesn’t seem to be very last-valuey.

The DefaultFirstValue column is exactly what we’d expect: the first DailyProfit row in our window, or $14,346.50. The last row in my dataset is $119,456.90, which is definitely not any of the DefaultLastValue records we see. What gives? This is a bug in SQL Server, right?

A Reminder of Frame Defaults

Remember that the default frame for a window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If we think about that, the DefaultLastValue makes perfect sense. For the first row, our entire window is 2013-01-01 itself, meaning that its first daily profit is $14,346.50 and its last daily profit is $14,346.50. Then, move to the second row and our first is $14,346.50 and last is $80,550.35. This probably isn’t what we expect with LAST_VALUE() and therefore we need to change the frame. Let’s change that to range from the beginning of our window to the end of the window, not just up to the current row.

WITH records AS
(
    SELECT
        i.InvoiceDate,
        SUM(il.LineProfit) AS DailyProfit
    FROM Sales.InvoiceLinesSmall il
        INNER JOIN Sales.InvoicesSmall i
            ON il.InvoiceID = i.InvoiceID
    GROUP BY
        i.InvoiceDate
)
SELECT
    r.InvoiceDate,
    r.DailyProfit,
    FIRST_VALUE(r.DailyProfit) OVER (ORDER BY r.InvoiceDate) AS DefaultFirstValue,
    LAST_VALUE(r.DailyProfit) OVER (
        ORDER BY r.InvoiceDate
        RANGE BETWEEN UNBOUNDED PRECEDING
            AND UNBOUNDED FOLLOWING) AS RealLastValue
FROM records r
ORDER BY
    r.InvoiceDate;
Now our last value looks right.

And of course, this is not just limited to LAST_VALUE(). We’re ordering by r.InvoiceDate ascending inside the two window functions, so that’s why I didn’t need to do anything special for FIRST_VALUE() here. If you include inside FIRST_VALUE() an ORDER BY descending, you’ll have to remember the frame for it too.

A Quick Reminder of Partitions

Over the last couple of posts, I’ve pretty much ignored partitions, not because they’re uninteresting, but because I wanted to get across the basic functionality of different window functions. Here’s an example of partitioning by customer ID and performing the same checks for first and last value. This is a more realistic use case for the two functions:

WITH records AS
(
    SELECT
        i.InvoiceDate,
        i.CustomerID,
        SUM(il.LineProfit) AS DailyProfit
    FROM Sales.InvoiceLinesSmall il
        INNER JOIN Sales.InvoicesSmall i
            ON il.InvoiceID = i.InvoiceID
    GROUP BY
        i.InvoiceDate,
        i.CustomerID
)
SELECT
    r.InvoiceDate,
    r.CustomerID,
    r.DailyProfit,
    FIRST_VALUE(r.DailyProfit) OVER (
        PARTITION BY r.CustomerID
        ORDER BY r.InvoiceDate) AS DefaultFirstValue,
    LAST_VALUE(r.DailyProfit) OVER (
        PARTITION BY r.CustomerID
        ORDER BY r.InvoiceDate
        RANGE BETWEEN UNBOUNDED PRECEDING
            AND UNBOUNDED FOLLOWING) AS RealLastValue
FROM records r
ORDER BY
    r.CustomerID;
Splitting between customers 888 and 889.

Conclusion

Offset functions are another great addition to SQL Server 2012, as they make common business analytics queries significantly easier.