An interesting question showed up on Ask SQL Server Central last week, and I wanted to blog my answer, mostly so I can show pretty pictures and get credit for actually having a DBA-related post once in a while…

The poster had a relatively simple problem: a view was returning an error which read “Invalid length parameter passed to the LEFT or SUBSTRING function” even though the operation was actually correct. Fortunately, the poster put together a nice setup, so I can steal it here and display the queries formatted as I tend to like them:

CREATE TABLE #contact (personID int, email varchar(150), type varchar(20))
CREATE TABLE #t (personID int, type varchar(20))

INSERT INTO #contact (personID, email, type) SELECT 1, 'student1@school.com', 'student'
UNION SELECT 2, 'teacher2@school.com', 'teacher'
UNION SELECT 3, '', 'teacher'
UNION SELECT 4, null, 'teacher'

INSERT INTO #t(personID, type)
SELECT 1, 'student'
UNION SELECT 2, 'teacher'
UNION SELECT 3, 'teacher'
UNION SELECT 4, 'teacher'

--This works fine....

SELECT
	*
FROM
(
	SELECT
		COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
	FROM
		#contact c
		INNER JOIN #t t ON c.personID=t.personID
	WHERE
		c.type='Student'
) subQuery
WHERE
	subQuery.UserName <> '';

--This generates an error....

SELECT
	*
FROM
(
	SELECT
		COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
	FROM
		#contact c
		INNER JOIN #t t ON c.personID=t.personID
	WHERE
		t.type='Student'
) subQuery
WHERE
	subQuery.UserName <> '';

What’s interesting here is that the two queries are basically the same. The only difference is that the first, working query filters where c.type is equal to Student, and the second, non-working query filters where t.type is equal to Student. The reason the query fails is because some of the #contact records don’t actually have @ signs in them, and so charindex returns -1 and you’re passing a -2 into the substring function, which causes you to see the error message.

But wait, our query clearly says only to get the rows that are marked as Student, so this should just be getting valid rows (i.e., rows in which there is a valid e-mail address). Thus, everything should be unicorns and pixie dust…right? Well, let’s dig into the inner workings a bit and come up with the entire answer.

Investigating the Execution Plans

Let’s generate some estimated execution plans. There are two reasons that I’m looking at an estimated plan here. The first one is that the estimated and actual plans will both contain all of the necessary information for solving this problem. More importantly, we can’t get the broken query’s actual execution plan: it fails before completion, and so no plan gets generated for us to see. Anyhow, to generate an execution plan, press Ctrl-L. This will give you an Execution plan tab with the different activities we’re performing. The important part here is the last two sets of plans, because they relate to the two queries that we’re interested in. The estimated execution plans for these two queries look the same:

Estimated query plans

This indicates that there’s nothing really wacky. What we have is a scan of the #contact table, followed by a computed column (which gives us the UserName column), a scan of #t, a hash match, and finally a SELECT to give us the result set. It’s a fairly standard query plan. But execution plans are like snitches: they aren’t forthright with information; you need to rough them up a bit to get the good stuff. So let’s do just that.

Interrogating the Execution Plans

What we’re looking for is the reason we’re getting bad rows even though our query clearly says that we want to ignore them. The obvious place to look is the #contact table scan.

A table scan on the #contact table

Let’s start with the working plan first, to see what’s right. Then we can compare that to the other plan and see where the difference lies. In SQL Server Management Studio, click on the #contact Table Scan icon and press F4 to go to the Properties window. This gives us a whole host of information. A lot of it is useful, but for this problem, we’re looking for the Predicate.

The table scan's Properties window, with special emphasis on the Predicate

Click on the predicate text to see the […] symbol. Click that button and you’ll get a window with the full predicate. It should look something like this:

#contact.[type] as [c].[type]='Student' AND CASE WHEN substring(#contact.[email] as [c].[email],(1),charindex('@',#contact.[email] as [c].[email])-(1)) IS NOT NULL THEN substring(#contact.[email] as [c].[email],(1),charindex('@',#contact.[email] as [c].[email])-(1)) ELSE '' END<>''

This looks pretty familiar. The first part of the statement is a filter on the type, showing us only students. The second part is a messier version of subQuery.UserName. We want to get the usernames which are not blank, and username is defined as

COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '')

So the predicate makes some sense here. So now let’s look at the estimated plan of our other query. Remember that the pictures and costs looked the same in SQL Server Management Studio, but that doesn’t mean that everything is the same. Once again, we’ll go to the Predicate for our #contact table.

A query with a 'bad' predicate

In this case, the predicate is different. Instead of checking where Type = “Student” and UserName is not blank, we’re only checking cases in which UserName is blank.

CASE WHEN substring(#contact.[email] as [c].[email],(1),charindex('@',#contact.[email] as [c].[email])-(1)) IS NOT NULL THEN substring(#contact.[email] as [c].[email],(1),charindex('@',#contact.[email] as [c].[email])-(1)) ELSE '' END<>''

So what happened to that check for type? Well, if you look at the second query, you can see that the check for type is now on #t. You can verify this by checking out the Predicate for the #t Table Scan:

#t's predicate has a filter on Student

From this, you can clearly see the filter on #t, winnowing down the list to students alone.

Putting It All Together

Okay, so how does that help us? Well, let’s look back at the execution plans again. In both cases, we have a hash match join between #contact and #type, as well as a filter on #contact that eliminates blank usernames. The primary difference between these two query plans is where the type filter goes: on #t or on #contact. This is vital to us because the optimizer wants us to filter out blank usernames during the scan, rather than after the join is complete. In normal circumstances, this is a great thing: any row you throw out during the scanning portion is a row that you don’t need to join or transform or perform some costly activity against. Unfortunately, in this case, the filter actually depends upon another filter being run first: the filter to get rid of contacts with the wrong type. That works out in this case with both on the same #contact table, but once you move the filter over to the other table, you no longer guarantee that the first filter runs and we get only valid rows back.

So given that, here’s a thought experiment: what if we kept the filter on #t but forced a particular join order, making sure that #t goes first? Then we could throw out all of the non-joined rows and not have to worry about our problem, right?

Well, one way to force join order is to specify a join hint. Let’s make this a HASH JOIN (to match our other queries) but hash joining #t to #contact, and see if that gets us anywhere:

SELECT
	*
FROM
(
	SELECT
		COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
	FROM
		#t t
		INNER HASH JOIN #contact c ON c.personID=t.personID
	WHERE
		t.type='Student'
) subQuery
WHERE
	subQuery.UserName <> '';

Using a join hint to force the table join order

The execution plan doesn’t look very different (we just move the Compute Scalar from before the join to after the join), and the end result is the same:

Msg 537, Level 16, State 2, Line 2
Invalid length parameter passed to the LEFT or SUBSTRING function.

So that doesn’t help at all.  How about if we force a non-clustered index on #contact and join that way?

create index ix_test on #contact(PersonId, Type, Email);

Running the breaking query again, we find out that this does fix the problem. So what changed? Well, let’s look at the execution plan:

Adding an index helped here

In this case, when we go to the Properties of the Index Seek item, we still have our Predicate column, but it has been joined by a Seek Predicates column. The seek predicate joins personIDs together from #contact and #t. What is important here is that the seek predicate gets executed first, before the normal predicate. We know this is the case because we don’t get the error. SQL Server is able to optimize the join by throwing out any rows which don’t join, and the expected gain from doing this is greater than our previous method of scanning all of the rows and then throwing them away based on the predicate.

Conclusions

In this short article, we were able to investigate an “intermittent” problem and figure out the root cause.  The ‘best’ fix in this case is to move the entire predicate to the #contact table.  That way, we prevent this problem from happening in the future if some data set makes it more valuable to read from the #contact table first and then join to #t rather than the other way around.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s