How To Troubleshoot Performance Problems

Not too long ago, I had an interesting discussion about how to troubleshoot performance problems in large procedures.  By “large procedure,” I’m talking about something which can be thousands of lines of code, or a procedure which might have dozens of potential choke points.  This is not an exhaustive article, but instead gives an idea of thought processes and acts as a starting point.

The first thing I want to do is format the code.  I'm going to use a formatting tool; that way, I know that I will be able to understand the code more quickly and be less likely to make a mistake due to improper tabbing, multiple columns on a single line, etc.  This obviously won't improve any performance problems, and I might need to maintain the historical formatting when I make my changes, but I'll at least make the changes locally so I can understand the code.

Once I have formatted code, the next step is to do a quick code review and look for certain patterns.  Any cursors or WHILE loops I see become automatically suspect.  I’m also on the lookout for Common Table Expression chains (especially ones which aggregate data), temp table chains (taking data from one temp table into another into another, etc.), and user-defined function calls.  Looking for cursors and WHILE loops is obvious:  procedural code tends to perform poorly in SQL Server, so those sections of code can act as performance choke points.  Common Table Expression chains can lead to lazy spooling, in which we read a subset of data over and over, and that subset of data can end up in tempdb, which is the single biggest point of contention on a typical SQL Server.  Temp table chains follow a similar logic:  we’re using tempdb over and over here, which might lead to a contention problem, or possibly a problem in which we are not filtering the base data sets well enough.  Finally, I look at user-defined functions because they tend to perform significantly less efficiently than equivalent queries without functions.  These are the types of things I’m looking for when I do my initial review, and when I find them, I mark their locations.
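To make the cursor/WHILE loop point concrete, here is a minimal sketch; the dbo.Orders table and its columns are invented purely for illustration.  The loop issues one update per row, while the set-based version does the same work in a single statement, which is generally what I'm aiming for when I rewrite these sections.

    -- Row-by-row pattern that would get flagged in the initial review:
    DECLARE @OrderID INT = (SELECT MIN(OrderID) FROM dbo.Orders);
    WHILE @OrderID IS NOT NULL
    BEGIN
        UPDATE dbo.Orders
        SET TaxAmount = SubTotal * 0.06
        WHERE OrderID = @OrderID;

        SET @OrderID = (SELECT MIN(OrderID) FROM dbo.Orders WHERE OrderID > @OrderID);
    END

    -- Set-based rewrite:  one statement, no loop.
    UPDATE dbo.Orders
    SET TaxAmount = SubTotal * 0.06;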

Next, I want to take a look at an execution plan.  Ideally, I want to use SQL Sentry Plan Explorer to run the query and collect all of the relevant plan metrics, including CPU and I/O per operation.  Because this is a very large procedure, I'd expect a fairly large number of independent operations, and I want to focus my efforts on the highest-cost offenders.  Once I narrow down to the one or two calls which are most expensive, I'm going to look for a few specific things.  First, I want to look for lazy spooling, as that tends to be a performance killer.  When I see lazy spooling, I almost always want to re-write the relevant query in such a way as to remove the spooling.  I also want to look for key lookups, which tell me that indexing might need to be updated to handle this query.  That could mean one of three things:  adding a new index, modifying an existing index, or modifying the query to use an existing index (for example, if I know a query should use a particular filtered index but it's using a different index which doesn't have all relevant columns).  Key lookups don't hurt very much when we're only looking at a few rows in an infrequently-called procedure, but high volume or high frequency can easily harm an otherwise-acceptable procedure.  I also want to see if I can find index scans where I expect seeks, or the other way around.  If I know that I am retrieving less than 0.5% of the data in a non-trivial table, I want a seek.  If I know I'm retrieving more than 20% of the data, I want a scan.  If I'm seeing the wrong type of operation, that could cause performance to suffer.
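As a hedged example of fixing a key lookup, here's a minimal sketch assuming a hypothetical dbo.Orders table with an existing IX_Orders_CustomerID index that covers only CustomerID.  Rebuilding it with the looked-up columns as included columns lets the seek cover the query.

    -- Assumes IX_Orders_CustomerID already exists on CustomerID alone;
    -- DROP_EXISTING rebuilds it with the columns the key lookup was fetching.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, TotalDue)
        WITH (DROP_EXISTING = ON);

    -- The query the index now covers.  OrderID is assumed to be the clustered
    -- index key, so it rides along in the nonclustered index automatically.
    DECLARE @CustomerID INT = 42;  -- sample value for illustration

    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;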

The next step for execution plans is looking at join types.  If I see a scenario in which I expect a nested loops join but am getting a hash match, I might end up spilling over to tempdb, which harms performance.  I also want to look for sort operations, as those tend to be relatively slow to begin with, and they get much worse if they spill to tempdb.  Really, tempdb is a common performance killer, so it's not surprising to see some of the most expensive, performance-killing operations involve tempdb in some fashion.
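One quick way to test a join-type hunch, sketched below with made-up table names, is to force the alternative join with a hint, compare the plans and runtimes, and then pursue the real fix (usually statistics or indexing) rather than shipping the hint.

    -- Diagnostic only:  compare the optimizer's hash match against a forced nested loops join.
    SELECT o.OrderID, o.OrderDate, c.CustomerName
    FROM dbo.Orders o
        INNER JOIN dbo.Customers c
            ON o.CustomerID = c.CustomerID
    WHERE o.OrderDate >= '20150901'
    OPTION (LOOP JOIN);

    -- If the forced plan wins, stale statistics are a common culprit, so refresh them:
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;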

After looking at the execution plan, I want to go back to my initial code walkthrough, because execution plans can sometimes lie.  Execution plans lie most often when dealing with cursors and functions, because they tend to show you one call rather than all calls, meaning that they severely under-count how much CPU and I/O function calls and cursors/WHILE loops really use.  This means that I want to look at ways of removing those cursors and functions, even if the execution plan tells me that they aren't very expensive.  If performance doesn't change, I can always put them back in where they were, but inlining a table-valued function can improve the performance of a query by 3-9x, and scalar UDFs can potentially improve by much more (if they're in the WHERE clause and make separate I/O operations, like querying from some other table).
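To show what that rewrite looks like, here is a minimal before-and-after sketch; the function and table names are invented for the example.  The multi-statement version materializes its results into a table variable and hides its cost from the plan, while the inline version expands into the calling query so the optimizer can see the real work.

    -- Multi-statement TVF:  the plan shows a cheap-looking scan of @Results,
    -- not the work happening inside the function.
    CREATE FUNCTION dbo.GetCustomerOrders (@CustomerID INT)
    RETURNS @Results TABLE (OrderID INT, OrderDate DATE, TotalDue MONEY)
    AS
    BEGIN
        INSERT INTO @Results (OrderID, OrderDate, TotalDue)
        SELECT OrderID, OrderDate, TotalDue
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID;

        RETURN;
    END;
    GO

    -- Inline TVF:  a single RETURN of a SELECT, which the optimizer folds into the caller.
    CREATE FUNCTION dbo.GetCustomerOrders_Inline (@CustomerID INT)
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT OrderID, OrderDate, TotalDue
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID
    );
    GO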

The easiest way to make these changes is to simplify the procedure for testing purposes.  That means hardcoding parameters or incoming data sets and stubbing out child procedure calls, as sketched below.  Obviously this technique can be somewhat limiting, but it does allow you to focus on the critical section of code without needing to immerse yourself in other points of complexity.  Take care of one critical section at a time, create unit tests to ensure that refactoring doesn't break existing calls, and focus on the pain points.  That's the method I use to tackle a very large problem.
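Here's what that stubbing looks like in practice, again with invented names:  the parameters become local variables with representative values, the incoming data set becomes a temp table, and the child procedure call is commented out until the critical section is sorted.

    -- Hardcoded stand-ins for the procedure's parameters:
    DECLARE @CustomerID INT = 42,
            @StartDate DATE = '20150901';

    -- Stand-in for the incoming data set the procedure normally receives:
    CREATE TABLE #IncomingOrders (OrderID INT, OrderDate DATE, TotalDue MONEY);

    INSERT INTO #IncomingOrders (OrderID, OrderDate, TotalDue)
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
        AND OrderDate >= @StartDate;

    -- Child procedure call stubbed out while working on this critical section:
    -- EXEC dbo.ProcessOrderAdjustments @CustomerID = @CustomerID;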

What do you do?  I'm interested in hearing how other people's methods may differ from mine, as I consider this a hard problem and am always up for learning more.

Buffalo At 2-1

With a satisfying demolition of the Miami Dolphins, the Bills are now at 2-1.  Here are a few thoughts on the team thus far:
– Tyrod Taylor has been as good as the Bills could hope. He did well enough in week one, came back from a disappointing start to end up with a respectable but not great game against the Patriots, and then dominated the Dolphins. I’m not jumping to conclusions with him, but his start is definitely exciting for Bills fans, and good enough for the Bills brass to trade away Matt Cassel.
– The defense has been very good. They suffered against the Patriots, but to be honest, Tom Brady hit iddqd before the season began. They have shut down Andrew Luck and Ryan Tannehill.
– Karlos Williams is out-performing LeSean McCoy. This is partially because Williams is doing pretty well, but McCoy has struggled. There are rumors that McCoy has a hamstring problem, but whatever it is, he’s not moving the ball like he needs to.
– Charles Clay has 2 TDs, but he has not been the impact player you want given his salary. Percy Harvin, however, has been fantastic in his role.

The Bills could realistically come in at 5-3 or maybe 6-2 when they get to their bye week.  The chances of them winning the division are minimal, but if Taylor continues to throw like he has, the Bills have a legitimate shot at making the playoffs for the first time this millennium.

Get To The Greenways

This year, I’ve been doing a lot of rides with Cycling for Health, a Meetup group based out of Raleigh.  Most of our rides are on the greenways in Wake County.  I wanted to write this post to spread the word ever so slightly and get more people out walking, jogging, and cycling on these greenways, as they are one of Raleigh’s best-kept secrets.  The trail map is relatively easy to find, but it can be overwhelming, so let me give you three of my favorite rides.  I’m not a fantastic cyclist by any means, so none of these are extremely hard.

  1. The Neuse River trail runs from Falls Lake Dam (point 1 in the map) down to Clayton (down by point 26 in the bottom-right corner).  It is 34 miles end-to-end, meaning that you can get nearly 70 miles of cycling in with just one round trip.  It's also about as flat as a Raleigh trail is going to get, with just a few minor hills here and there.  We love to start at Anderson Point Park (spot 41) and pick a direction.  Because Anderson Point is right in the middle, you can get a 30-35 mile trip, great for a relaxing Sunday.
  2. The Walnut Creek trail starts at Anderson Point (41) and goes west.  This trail gets a little hillier here and there, and there’s a fantastic stopping point in the middle at the Walnut Creek Wetland Center (74).  You can continue on the Walnut Creek trail out to Lake Raleigh (spot 12) and then Lake Johnson (14).  The downside to Lake Johnson is that it’s a pretty crowded area, so if you’re cycling, you’ll have to go slowly and watch out for people.
  3. Bent Creek has a semi-hidden trail head (spot 11).  Parking is on the street, so spots can be limited, but the ride is great.  Within a couple of miles, you're down at Shelley Lake, and then on the Mine Creek Trail.  From there, you can take the House Creek trail (which is very hilly) or Crabtree Creek.

You could also cycle on country roads, but honestly, I try to avoid that as much as possible.  Drivers in this area are terrible and unless you’re riding someplace practically deserted or in downtown (where the speed limits are 25), it’s just not worth the risk.  If you’re in the area and haven’t been on the greenways before, get to it!  If it helps, join up with a group (like CFH) and learn the trails that way.  There are other trails in Durham, Cary, and outlying areas, but I’m most familiar with the Raleigh set.

Next Year’s Talks

This year, I’ve focused on two talks:  one on the APPLY operator and one introducing Hadoop.  I had a chance to give my Working Effectively with Legacy SQL talk as well, but only once.  I’d love to give my Legacy SQL talk more frequently, but beyond that, here are my plans:

New Abstracts

I have a few talks sketched out, and some of them build on my prior talks.  I’m pretty excited about all three of them and hope I’ll be able to give all of them next year.

  • Data Migration Using BIML — Many companies scale out their databases horizontally, partitioning data by customer (or some other identifier) into separate databases or instances.  Sometimes, it becomes important to move that data from one database to another.  SQL Server Integration Services is a fantastic tool for ETL, but creating and maintaining dozens or hundreds of data flows by hand is exhaustingly tedious.  Instead of doing it the hard way, use the BI Markup Language (BIML) to automate package development.  This talk will show how we can use metadata tables and BIML to auto-create SSIS packages to migrate data from one database to another, handling foreign key constraints along the way.
  • Peanut Butter & Chocolate:  Integrating Hadoop with SQL Server — So you jumped on the bandwagon and set up a Hadoop cluster…but now what?  Your database developers and app developers know how to integrate with and develop against SQL Server, but the Hadoop world is a completely different experience.  This talk will help bridge the gap between SQL Server and Hadoop, using tools such as SQL Server Integration Services and Sqoop to migrate data between a Hadoop cluster and a SQL Server instance, as well as PolyBase to integrate the two like never before.
  • Big Data, Small Data, and Everything In Between — The breadth of options available to data professionals today is staggering:  between relational, object, graph, and document databases, we have a wide array of storage options, as well as a huge number of access and analysis tools.  This talk will walk you through data platform, data analysis, and data processing options.  We’ll discuss when to use (and when not to use) Spark, R, Storm, and SSAS, as well as a range of other tools and technologies.  By the end of this talk, you may not have MongoDB or Neo4J, but you will know where they might fit.

Cleaning Up Old Talks

In addition to creating new talks, I want to clean up my Tally Tables and In-Memory OLTP talks a bit.  I love Tally Tables, but I think I need to come up with a larger variety of problems and tighten up the discussion; the last time I gave this talk, I was not at all happy with it.

As for In-Memory OLTP, it’s my experience that very few people actually use it (and I’d love to talk to some people at PASS Summit to see if my experience is the same as that of others).  At my company, we do use it and have encountered a couple of major bugs in the product, so I want to include those in my talk and spend some time making sure I really explain the pros and cons as well as I can.  SQL Server 2016 will introduce new features for In-Memory OLTP, so I’ll update the talk to include those as well.

Upcoming Events And Current Happenings

I haven’t posted to this blog lately.  I am remiss in not doing so, but I have been looking at a jam-packed schedule.  Here’s a quick summary of the big events.

Speaking and Conferences

Right now, I’m in Louisville, Kentucky for Derbycon.  This is year #5 of the event, and my 4th appearance.  Right after that, I’m going to be in Pittsburgh, Pennsylvania on October 3rd for SQL Saturday Pittsburgh.  At that event, I’ll give my talk on the APPLY operator bright and early.  The week after that, I’m going to host SQL Saturday Raleigh.  I won’t present there, but I’ll be running around all day, as I’m the volunteer coordinator and the event coordinator.  Then, the following week, I’m going to SQL Saturday Charlotte.  This time, I’m giving my Hadoop talk, starting at 11:30 AM.

That will be my last SQL Saturday until at least December, but October has a couple of big events left in store for me.  On Monday, October 26th, I’m going to be in Seattle, Washington for FreeCon, put on by Brent Ozar and Kendra Little.  I’m seriously excited about this, as it’s a small group of really passionate data people getting together and talking about ways to do what we do even better.  After that, I’m spending the rest of the week at PASS Summit, which is spring break for database people.

That’s my last event for a while, but it’ll probably take me a few weeks to recover.

SQL Saturday Raleigh

As I mentioned, I’m heading up SQL Saturday Raleigh.  I’m fortunate to have a great team of people working with me, but this is a huge time sink.  I knew it was a lot of time last year, but this year, I have so many other things going on that it’s hard to juggle everything.  Nevertheless, I think we’re going to put on a great event and I don’t plan to lose any hair because of it…

TriPASS

The next bit of big news is that I’m now the chapter leader for TriPASS, the Raleigh-Durham chapter of PASS.  I’m going to be filling some big, big shoes and I know it’s a long-term commitment.  When I started speaking to groups, I decided that I wanted to do as much as I could for the community, and I think that this is a good way of doing that.

If you happen to be in the Raleigh-Durham area on the third Tuesday of the month, we’d love to have you out.  We’re also looking for speakers.

Consulting

I’ve picked up my first consulting client.  When I started Catallaxy Services, I wanted to grow the company into a full-time consulting company focusing on small businesses.  At this point, I’m not positive that this is the path I really want to take, but I will admit that it’s fun to look at a new code base and tackle some new problems every once in a while.

If you’re interested in talking about some light consulting work, my details are on the company site.

The triumphant return (?) to UD, or Ten Years Later

While Kevin is in Louisville, I have been in Dayton. I was invited to give a lecture on Russia’s fleet performance in World War I, and I enjoyed the experience. I also got to stick around for the Phi Alpha Theta induction; it was gratifying to see they had a ceremony and food and everything (but mostly food). Campus has changed a lot — it’s much greener, with some ugly buildings being removed. Dayton has assimilated the NCR Building into the university as well (which is for National Cash Register, not the New California Republic). I got to stay at a hotel for free and even had a patio that I never used.

Tomorrow (that is, Friday), I’m driving up to Columbus in a rental car to visit with a friend of mine from high school, then it’s off to visit other family and friends in the Cleveland area before I return on Monday.

History rarely pays well, but sometimes the perks ain’t bad.

At Derbycon

Today marks the end of my Derbycon training.  This year, I ended up taking the basic Android hacking training.  I learned a good bit about the Android permissions model and we started to look into reverse engineering code from APK files.

Tomorrow will be the first day of talks.  Derbycon is a great conference, and one of the reasons it’s so great is that they pack in 10-12 hours of training each day.  Tomorrow starts at 8:30 AM and Irongeek will even be live streaming the opening sessions.  After that, sessions go on until 8 PM, and Saturday’s basically the same.

Clarification on stats in Madden NFL 16 and one possible bug

One thing that has not, sadly, changed from 15 is that it’s practically impossible to get stats from your defensive players commensurate with the top players in the NFL. While the offensive stats are realistic, the defensive ones are not — specifically things like number of tackles. The reason? The stats are generated based on 15 minute quarters with no accelerated clock. Thus, the #1 tackling linebacker in the NFL has twice as many downs played as Karlos Dansby. The fact my offense is pretty good at sucking up the clock probably doesn’t help either. Dansby should lead the NFL in tackles for loss with 15, over double those of J.J. Watt, but sadly the game does not recognize Dansby. It won’t hurt your team in a significant way, of course. They’ll still develop, but they probably won’t win many awards. I don’t know if EA will ever introduce proper stat scaling; it strikes me as fairly easy to fix, but what do I know? I’m no programmer.

There is one somewhat irritating bug I have discovered: 99% of players categorically refuse to re-sign during the season, no matter how much you offer them. I offered a backup guard three times what he was asking (which was 1.5 million a year for three years, but hey, he wanted the minimum…) and he refused. Some research on the interwebz suggests that in the offseason, players are more reasonable and will sign as they are supposed to. One thing I do find troubling is that Tashaun Gipson won’t even talk to me at this point in the season, although his contract is up and as a 91 free safety, I would dearly like to keep him in Cleveland. He doesn’t appear in my re-sign list at all.

I will keep you, loyal readers, apprised as I play through the season. I am off to a 5-0 start, but I face the Broncos next, the first truly elite offense I’ve run into.

Scouting in Madden 16

As we all know, EA has a compulsion to tweak scouting every season. This year, it’s finally solid, I think. It’s simple: you get 175 points every week, starting in Week 2 (maybe Week 3). For a total of 30 points (5/10/15), you unlock a prospect’s top three skills. Once you get those, the game tells you the round his talent suggests he should be drafted in. It does not include physical skills; you get reports on those at the combine at the end of the season. So, for example, I’m looking at tight ends in the draft. I found a fourth-rounder who fits my scheme and has B or B+ grades in catching, catch in traffic, and spectacular catch. The game regraded him (but only for me), saying that he is a third-round talent.

I’ll have more when I finish my first season.

One other thought: I think this is the year All-Pro is finally too hard for me (or, perhaps, too hard for the Browns). I won my first game on All-Pro, but it was ugly (lots of turnovers). I dropped down to pro and, while they’ve all been competitive, the turnovers are much lower and it’s far less frustrating. I don’t even want to think about All-Madden.

One other other thought: The stats are much more realistic, on the whole, in Madden than they have been in years past. I think they are finally scaling the season to your chosen time scale (i.e. 10 minute quarters for me). No longer do they expect you to get 150 tackles with every defensive player and the head coach’s golden retriever. The goals are better as a result. Sometimes the dynamic drive system is a little annoying: “You need to get three consecutive passes to your #4 receiver or you’ll cause a singularity!” (slight exaggeration) However, I think it’s a great system, and does force me to consider my offense carefully.