T-SQL Anti-Patterns: Linked Server Joins

Yesterday’s anti-pattern, non-SARGable ORs in joins, was all about writing code which the SQL Server database engine can optimize.  Today’s anti-pattern is in that vein, but this time we’re looking at cross-server queries.

Linked servers are a powerful feature within SQL Server, and I don’t want to throw them out entirely.  What I do want to do, however, is limit how often they are used in regular, business-critical code.  The reason is that, prior to SQL Server 2012 SP1, table statistics were not available to linked server users without sysadmin, db_owner, or db_ddladmin rights.  The number of cases in which I would want a linked server role with sysadmin, db_owner, or db_ddladmin rights is very small, and I would definitely not want to give those rights to a web application or anything else user-facing.  Without those table statistics, SQL Server would make poor choices with linked server queries, and so joins against remote tables would get very expensive.

Even with those statistics, cross-server joins are going to be expensive.  The reason is that in order to join row sets from two servers, you need to pass all of one server’s row set over to the other server, perform the join operation, and then (potentially) pass the full result set back to the first server.  If you’re joining two tables with a billion rows, that could be extremely expensive, even if you have great indexes on both sides.

So how do we get around this?  Here are a few potential methods:

  1. Get rid of the linked servers altogether and use another method, such as replication or Always-On Availability Groups, to propagate the data you need from your remote server.  This makes a lot of sense if you’re looking at relatively static lookup data.
  2. For truly static data, you might keep non-replicated copies on both servers.  Replication can be a pain to troubleshoot and manage and AGs require 2012 Enterprise Edition so they can be very expensive.  Thus, if the data never changes or only changes once a year, it might make more sense to create the yearly change script and execute it on multiple servers.
  3. If you absolutely need the remote data, try to re-write queries to avoid joining local tables to remote tables.  Create a temporary table on your local server, pull in as few rows from the remote server as possible, and then join your local tables to the temp table you created.  You still take the linked server hit, but there’s a much smaller chance of that query blowing up.