Fun With SQL Equality

T-SQL has some interesting concepts regarding string equality:

SELECT
	CASE
		WHEN N'String with spaces          ' = N'String with spaces' THEN 1
		ELSE 0
	END;

This query returns 1: trailing spaces are not enough to make a string unique.

SELECT
	CASE
		WHEN N'179³' = N'1793' THEN 1
		ELSE 0
	END;

This query also returns 1: subscripts and superscripts are not enough to make a string unique.

If you need to put a unique index on an NVARCHAR column, the best bet might be to add a SHA1 hash of the string. If you are okay with case-sensitive strings, HASHBYTES('SHA1', @YourString) does the trick.
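
One way to set that up (the table and column names here are just for illustration) is a persisted computed column holding the hash, with the unique index on that column:

CREATE TABLE dbo.SomeStrings
(
	SomeStringID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
	SomeString NVARCHAR(1000) NOT NULL,
	-- SHA1 of the raw bytes: case-, accent-, and trailing-space-sensitive.
	SomeStringHash AS CAST(HASHBYTES('SHA1', SomeString) AS BINARY(20)) PERSISTED
);

CREATE UNIQUE INDEX UX_SomeStrings_SomeStringHash
	ON dbo.SomeStrings (SomeStringHash);

Because the hash is over the raw bytes, strings which differ only by trailing spaces or by superscript digits get different hash values and the unique index treats them as distinct.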

Get Row Counts For Partitioned Tables

I’m adding this query to my SQL utility belt.  This query tells you, for a specific table, all of the partitions, the boundary value for each partition’s range, and the (estimated) number of records in each partition.

SELECT
	OBJECT_SCHEMA_NAME(o.object_id) AS SchemaName,
	o.name AS TableName,
	ddps.partition_number AS PartitionNumber,
	prv.value AS EndValue,
	ddps.row_count AS [RowCount]
FROM sys.indexes i
	INNER JOIN sys.objects o
		ON i.object_id = o.object_id
	INNER JOIN sys.dm_db_partition_stats AS ddps
		ON i.object_id = ddps.object_id
		AND i.index_id = ddps.index_id
	INNER JOIN sys.partition_schemes ps
		ON i.data_space_id = ps.data_space_id
	LEFT OUTER JOIN sys.partition_range_values prv
		ON ps.function_id = prv.function_id
		AND ddps.partition_number = prv.boundary_id
WHERE
	i.index_id < 2
	AND o.is_ms_shipped = 0
	AND o.name LIKE '%TABLENAME%'
ORDER BY
	SchemaName,
	TableName,
	PartitionNumber;

This query has a couple of good uses.  The first is that you can monitor a backfill with it.  In my scenario, I’m loading several years’ worth of data into a new fact table.  SSIS is loading data one day at a time so that I don’t leave a connection open so long that it gets killed.  Running this query shows me approximately how far along I am: I know the valid date range for data in this table, and I can watch the partitions as they fill up.

Another use for this is to check that your partitions are not skewed too heavily.  For example, my partitions are pretty tightly defined:  all of them (except for the current partition) are within 15-20% of the average, meaning that they’re pretty close to a uniform distribution.  That really helps in cases in which the optimizer can eliminate partitions: when it can throw away 47 out of 48 partitions, it throws away about 47/48ths of the data, regardless of which partition remains.  I would consider that a good partitioning layout.  At the opposite extreme, where almost all data is stored in a single partition, you get zero benefit from partitioning but still have to deal with the DBA overhead involved.  That table might be a candidate for re-thinking its partitioning strategy or eliminating partitioning altogether.

Thoughts On In-Memory OLTP

A Steve Jones editorial from a few weeks ago got me thinking about In-Memory OLTP.  We implemented this in our data warehouse about a month ago with mixed success.  Here are my notes from that forum post, as well as a couple of areas where I figured I could expand upon my comments there.

We use memory-optimized objects in two places: one staging table to load data warehouse fact tables, and various memory-optimized table-valued parameters as part of warehouse loading (streaming certain dimension rows from .NET code into our warehouse).
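
For a sense of what those TVPs look like, here is a minimal sketch of a memory-optimized table type; the type name, columns, and bucket count are invented for the example:

CREATE TYPE dbo.DimAccountTVP AS TABLE
(
	AccountKey INT NOT NULL,
	AccountName NVARCHAR(100) NOT NULL,
	-- Memory-optimized table types must have at least one index.
	INDEX IX_DimAccountTVP_AccountKey
		NONCLUSTERED HASH (AccountKey) WITH (BUCKET_COUNT = 131072)
)
WITH (MEMORY_OPTIMIZED = ON);

The .NET code passes this as a structured parameter the same way it would an on-disk table type.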

My original plan was to use memory-optimized queue tables in our OLTP system to feed memory-optimized TVPs into memory-optimized staging tables, and possibly convert a couple of our more heavily-read OLTP tables to become memory-optimized in the process. That fell apart for a few reasons.

The best (i.e., most nefarious) problem that we’ve had with In-Memory OLTP is XTP_CHECKPOINT issues. This happened even before we had a single memory-optimized object; we just had the filegroup. The first time this happened in production, we suffered a 10-hour downtime on a warehouse node and had to re-queue a lot of data because the transaction log had filled up and we couldn’t do a thing due to this XTP_CHECKPOINT.  We ended up recovering to shortly before the database failure and didn’t lose any data, but needless to say, causing a database failure before even a single memory-optimized object was created was…problematic.  We opened a support case with Microsoft, and that case is still open.  As best as we can tell, if the transaction log fills up and autogrows at any point in time, the XTP checkpoint mechanism stops and never recovers, meaning that the In-Memory checkpoint never actually happens and your log continues to fill up until you take the database offline and force it to reconcile the log.  Our operations team has been on the lookout for these situations since then and has prevented another disaster, but it’s one more thing that the team needs to consider.
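
If you want to watch for the same situation, the log reuse wait shows up in sys.databases, so a check along these lines (a simple sketch) will flag a database whose log is stuck behind the In-Memory checkpoint:

SELECT
	name,
	log_reuse_wait_desc
FROM sys.databases
WHERE log_reuse_wait_desc = N'XTP_CHECKPOINT';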

Aside from that, the hard limit of roughly 8,000 bytes per row and the lack of LOB support meant that some of the places where I might have wanted to use a memory-optimized TVP (like TVPs for some of the larger dimensions) wouldn’t work out.

We ended up not pushing the memory-optimized filegroup out to production on our OLTP instances because you cannot take a database snapshot if the database has a memory-optimized filegroup, and we use database snapshots for some system processes. On the OLAP side, we did implement memory-optimized objects and saw significant performance improvements. The TVPs were significantly faster than their on-disk counterparts, and our staging table (which previously had been one of the top sources of deadlocks) zooms. Unfortunately, even with these performance benefits, the limitations have been too strict for adoption across our platform. Even if I got the go-ahead to move memory-optimized objects to our production OLTP system (and not just our OLAP system), there would have been another problem preventing general adoption: within an explicit transaction, you cannot access memory-optimized tables as well as tables in other databases on the same instance. We use replication to push static data around in our environment, and all of that static data goes into one database on each instance. Many of the queries whose performance I wanted to improve join to this static data database, and I did not want to put all of the in-memory work in a separate transaction from the static data work.
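
As a concrete illustration of that last limitation, this is the shape of query that fails on 2014; the table and database names are invented for the example:

BEGIN TRANSACTION;

	-- dbo.FactStaging stands in for a memory-optimized staging table and
	-- StaticData for a second user database on the same instance.  The
	-- SNAPSHOT hint keeps the explicit transaction valid against the
	-- memory-optimized table; the cross-database join is what fails.
	SELECT
		s.AccountKey,
		sd.AccountName
	FROM dbo.FactStaging s WITH (SNAPSHOT)
		INNER JOIN StaticData.dbo.Account sd
			ON s.AccountKey = sd.AccountKey;

COMMIT TRANSACTION;

The cross-database access raises an error stating that a transaction which accesses memory-optimized tables cannot access more than one user database.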

I’m still hopeful that 2016 will solve at least some of these problems and make memory-optimized objects practicable, but to be honest, even coming in knowing most of the code-writing limitations of memory-optimized objects and natively compiled stored procedures (of which we have zero), I think In-Memory OLTP really burned me, and even if 2016’s version is a lot better, I’d have to think long and hard before building a project around it.

Upcoming Events

Over the next six weeks, I will be speaking four times.  All of these talks are free and open to the public, although SQL Saturdays do charge for lunch if you choose to purchase it.

TriNUG Data SIG

On Wednesday, June 17th, I will speak at the TriNUG Data SIG (link forthcoming).  My topic is Working Effectively with Legacy SQL.

SQL Saturday Chattanooga

I will speak at SQL Saturday Chattanooga on Saturday, June 27th.  My session begins at 4:15 PM, and the topic is Much Ado About Hadoop.

SQL Saturday Columbus

I will speak at SQL Saturday Columbus on Saturday, July 11th.  My session begins at 1 PM and I’m also giving my Hadoop talk there.

TriNUG F#/Analytics Group

On Tuesday, July 28th, I will give a modified version of my Hadoop talk to the F#/Analytics group.

Why We Can’t Have Good Things: In-Memory OLTP

I recently implemented the first memory-optimized tables at our company.  My original plan was to create a few memory-optimized staging tables on our transactional systems, one memory-optimized staging table on the warehouse side, and several memory-optimized TVPs to help with shuttling data around.
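
A memory-optimized staging table along these lines is what I have in mind (the name and columns here are invented for illustration); SCHEMA_ONLY durability fits staging work because the rows don't need to survive a service restart:

CREATE TABLE dbo.FactSalesStaging
(
	SaleDate DATE NOT NULL,
	AccountKey INT NOT NULL,
	Amount DECIMAL(19, 4) NOT NULL,
	-- Memory-optimized tables in 2014 define their indexes inline.
	INDEX IX_FactSalesStaging_SaleDate
		NONCLUSTERED HASH (SaleDate) WITH (BUCKET_COUNT = 65536)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

The trade-off is that SCHEMA_ONLY data disappears on restart, which is fine for a staging table that gets reloaded anyway.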

This plan quickly fell apart.  Here’s a quick summary of the problems I ran into during the sprint.  I knew about many of them, but there were a couple of fascinating new issues I had not yet seen.

  1. Memory-optimized tables are limited to 8,000 bytes per row.  Some of my dimensions have LOB data (a memory-optimized no-no) and other features which are not allowed.  That meant that a couple of my TVPs were going to stay on-disk.
  2. I created my memory-optimized queue tables for staging, and we use triggers to load these queue tables.  This isn’t a great thing overall (I’m eventually going to replace the triggers with Service Broker messages), but it’s what I have to deal with.  Anyhow, I ran into a very interesting error message in stored procedures which eventually fire triggers that insert into the memory-optimized tables:  “A user transaction that accesses memory optimized tables or natively compiled procedures cannot access more than one user database or databases model and msdb, and it cannot write to master.”  I got this message because our transactional procedures often join to a replicated database to get static data that we need, and that cross-database access triggers the error.  The way around this problem is to use separate transactions, but I don’t really want to do that.  This ruled out using memory-optimized tables on the OLTP systems.
  3. On one of our instances with a very small, fixed-size transaction log, we found a very interesting problem:  the transaction log was not truncating due to XTP_CHECKPOINT, and because the log was full, we could not make any alterations.  Oh, and “any alterations” includes expanding the log size…  There is an open Connect item which basically says that the message may or may not be a lie.  One person mentioned that taking a full backup fixed the problem, but in our case, we had to restart the database engine service and re-size the log.
  4. As a side note, you cannot take database snapshots if you have a memory-optimized filegroup.  You also cannot drop the memory-optimized filegroup unless you drop the entire database.  In practice, this means that once you create a memory-optimized filegroup, you cannot get rid of it, and you’ve limited your available DBA options (a sketch of adding such a filegroup follows below).
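
For context, adding the filegroup looks something like this (the database name and container path are placeholders), and once it runs there is no undoing it short of dropping the database:

ALTER DATABASE SomeWarehouse
	ADD FILEGROUP InMemoryOLTP CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE SomeWarehouse
	ADD FILE
	(
		NAME = N'InMemoryOLTP_Container',
		-- The "file" is actually a filestream container directory.
		FILENAME = N'D:\Data\InMemoryOLTP_Container'
	)
	TO FILEGROUP InMemoryOLTP;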

All of this has made In-Memory OLTP in SQL Server 2014 less than ideal for a real production environment.  I’m hoping that 2016 has significant enough improvements that we can expand our use of In-Memory OLTP, but for now, our initial scenario has collapsed down to a few memory-optimized TVPs and one staging table on our warehouse instances.