This is part ten in a series on window functions in SQL Server.
The Need for Speed
SQL Server typically operates in row mode, which means that an operator processes one row at a time. This sounds inefficient, but tends to work out pretty well in practice. However, something which may work out even better is to process more than one row at a time, especially when the number of rows gets to be fairly large. Enter batch mode.
Batch mode was introduced in SQL Server 2012 alongside non-clustered columnstore indexes. It became interesting in SQL Server 2016 and very interesting in SQL Server 2019. That’s because 2016 introduced writable clustered columnstore indexes and 2019 gives us batch mode outside of columnstore indexes.
Batch mode lets us work on up to 900 rows of data or 64KB of data at a time per operator. Each operator also evaluates metadata only once per batch versus once per row. Finally, a function operating on a batch can handle all up-to-900 rows at once.
You Must Be This Tall to Ride the Ride
Before we can use batch mode, we need to pass certain requirements. If you are using a columnstore index, the requirements are:
- Batch mode cost must be less than row mode cost
Not too onerous, eh? If you are using SQL Server 2019 and have a rowstore index, the requirements are:
- At least 2^17 rows (131,072) in at least one table in the query
- At least one operator which benefits from batch mode
- At least one input of a batch has at least 2^17 rows
- Batch mode cost is less than row mode cost
In other words, if I have a 1 billion row table but only get 1800 rows from it, I won’t be able to use batch mode on the rowstore index because I need at lest 131,072 rows to pass through an operator. This limits batch mode on rowstore to larger queries, and that’s on purpose: the idea is that we know how row mode on rowstore indexes operates and it does a really good job on smaller data sets. Batch mode might potentially be slower for whatever reason on tiny datasets, and so the optimizer has fairly stringent limts.
The Cheater’s Guide to Batch Mode
If you are using SQL Server 2016 or 2017 and don’t want to work with a columnstore index, you can still take advantage of them to allow for batch mode queries using your rowstore indexes. In particular, you create a filtered columnstore index on an impossible condition. See, the optimizer doesn’t actually check if the columnstore index is the one you use; it checks if the columnstore index exists and assumes that’s good enough for rowstore, because what kind of dummy would put a columnstore index on a table which was horribly unsuited for it?
Here’s an example of such a query:
CREATE NONCLUSTERED COLUMNSTORE INDEX [NCIS_MyTable] ON dbo.MyTable ( Col1, Col2, ... ) WHERE ID = -1;
Create that index and you’re in Batch Mode City, where the cars zoom past and everybody needs to wear sunglasses even at night.
Batches of Data
To show off batch mode, let’s use a table which might actually benefit from it. My
Warehouse.ColdRoomTemperature table in WideWorldImporters is about 40 million rows large. Let’s suppose that I want to run a window function which partitions by cold room sensor number. Here’s a query which does just that. Note that I do have an optimizer hint to force this into SQL Server 2017 mode, which guarantees no batch mode because the table does not have a columnstore index on it.
WITH records AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY c.ColdRoomSensorNumber ORDER BY c.ColdRoomTemperatureID) AS rownum FROM Warehouse.ColdRoomTemperatures_Archive c ) SELECT * FROM records r WHERE r.rownum = 0 OPTION(USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140')); GO
This query returns zero rows because it’s not possible to have a rownum value of 0, but it sure takes its sweet time doing so.
Let’s remove that option and run the query again.
This particular query took about 1/6th the time to run as the prior, with about 1/3 the total CPU and half the I/O. In practice, on a good query, you can cut the time down to about 1/3 of what it would run in row mode. It’s not an order of magnitude better, but it is a nice chunk of time savings.
Now let’s see how things look against the invoice lines table. Note that this is not the stock WideWorldImporters invoice lines table; I’ve expanded it out to 1.75 million rows so we can see some good examples of batch mode in process.
First up, here’s your daily reminder that just because you can have oodles of partitions doesn’t mean you should.
SELECT il.InvoiceLineID, il.InvoiceID, ROW_NUMBER() OVER (PARTITION BY il.InvoiceID ORDER BY il.InvoiceLineID) AS rownum1, ROW_NUMBER() OVER (PARTITION BY il.StockItemID ORDER BY il.InvoiceLineID) AS rownum2, ROW_NUMBER() OVER (PARTITION BY il.PackageTypeID ORDER BY il.InvoiceLineID) AS rownum3, ROW_NUMBER() OVER (PARTITION BY il.TaxAmount ORDER BY il.InvoiceLineID) AS rownum4, ROW_NUMBER() OVER (PARTITION BY il.LineProfit ORDER BY il.InvoiceLineID) AS rownum5 FROM Sales.InvoiceLines il;
If you have one partition, you’re more likely to get one window aggregate.
WITH records AS ( SELECT il.InvoiceLineID, il.InvoiceID, ROW_NUMBER() OVER (PARTITION BY il.InvoiceID ORDER BY il.InvoiceLineID) AS rownum, RANK() OVER (PARTITION BY il.InvoiceID ORDER BY il.InvoiceLineID) AS rk, DENSE_RANK() OVER (PARTITION BY il.InvoiceID ORDER BY il.InvoiceLineID) AS dr FROM Sales.InvoiceLines il ) SELECT * FROM records WHERE rownum = 0;
Window functions can use batch mode for quick gains with zero developer effort. But do understand that not all queries will be eligible for batch mode.