The Marginal DBA

You Have A Performance Problem; What Do You Do?

Brent Ozar had a blog post about what hardware you can purchase with the price of two cores of Enterprise Edition and argues that you should probably spend some more money on hardware.  Gianluca Sartori has a blog post along similar lines.  By contrast, Andre Kamman had an entire SQLBits talk about not throwing hardware at the problem.  And finally, Jen McCown argues that you should spend the money on Enterprise Edition.  When you have a performance problem, who’s right?

The answer is, all of them.  And to head off any comments, my first paragraph is mostly a setup; ask any of these authors and I’m sure they’ll tell you that “it depends” is the right answer.  The purpose of this post is to dig a bit deeper and discuss when each of these points is the correct answer, looking at things from an economist’s perspective.

Detour Into Basic Economics

Before I get too deep into application of economic principles to the world of database administration, let’s cover these principles really quick so we’re all on the same page.

Scarcity

Your data center is only so big.  Your rack can only hold so many servers.  Your credit card has a limit, and so does your bank account.  At some point, your budget runs out.  And even if your budget is phenomenal, there are technological limits to how big a server can be, or even how big a cluster of servers can be.  No matter how much cash, how many top-tier experts, and how much time you have available, there is always a limit.

I’ve personally never had to worry about hitting the 128 TB RAM limit in Windows Server 2012 R2.  Even if you had a server capable of using 128 TB of RAM, I’m sure it’d be so expensive that there’s no way I’d ever be allowed near it.

This, in short, is scarcity:  you will always run out of resources before you run out of problems.

Opportunity Cost

In a world of scarcity, we can figure out the true cost of something:  what is the next-best alternative to this?  For example, let’s say we have the option to buy $100K worth of hardware.  What other alternatives exist?  We could bring in a consultant and pay that person $100K to tune queries.  We could send the current staff out for $100K worth of training.  We could hire a new person at $100K (although this is a stream of income rather than a one-time payment, so it’s a bit trickier of an answer).  Or we could buy $100K worth of software to help solve our problem.  Dollar values here simply help focus the mind; even without currencies, we can still understand the basic concept:  the actual cost of a good or service is what you forego when you decide to make the trade to obtain that good or service.

Understanding opportunity cost is critical to making good decisions.  It also helps lead into the third major topic today.

Marginal Utility

One of the toughest problems of pre-Marginal Revolution economics was the diamond paradox.  The short version of the diamond paradox is as follows:  water is more valuable than diamonds, in the sense that a person can live with water but no diamonds, but not vice versa.  So why is water so inexpensive, yet diamonds are so expensive?

The answer is in marginal utility.  The idea behind marginal utility is that your valuation of a thing changes as you obtain more of that thing.  The first glass of water for a thirsty man in the desert is worth quite a bit; the 900th glass of water, not so much.  In hardware terms, going from 8 GB of RAM to 16 GB of RAM is worth more to you than going from 192 GB to 200 GB.  The 8-16 jump allows you to do quite a bit that you might not have been able to do before; the jump from 192 to 200, even though it is the same total RAM difference, opens many fewer doors.

Detour’s Over; Back To The Main Point

Armed with these principles of economics, let’s dive into the problem.  To keep things simple, I’m going to think about three competing areas for our IT budgets:

  1. Purchasing newer, better, or more hardware
  2. Paying people to write better code or tune application performance
  3. Purchasing Enterprise Edition licenses rather than Standard Edition (or APS appliances rather than Enterprise Edition)

Applying the scarcity principle first, we know we can’t afford all of these things.  If your company can afford all of the items on the list, then this whole topic’s moot.  But honestly, I’ve never seen a company whose budget was so unlimited that they could keep hiring more and more people to fix code while buying more and more hardware and getting more and more expensive software.  At some point, you hit a limit and need to make the hard decision.

Here’s a simplistic interpretation of those limits:

LinearConstraints1.png

In this example, I have a set of hardware, queries, and SQL Server versions that I allocate to OLTP processing and OLAP processing.  I have three constraints (which I’ve made linear for the sake of simplicity):  hardware, application code, and SQL Server versions.  The idea behind this two-dimensional linear constraint picture is that you are able to pick any point on the X-Y plane which is less than or equal to ALL constraints.  In other words, you can pick any spot in the purple-lined section of the image.

Taking this scenario, the application is tuned reasonably well and the version of SQL Server isn’t going to help us much; we simply don’t have powerful enough hardware.  This might be something like trying to run a production SQL Server instance on 8 GB of RAM, or running on 5400 RPM hard drives.  Even if we hire people to tune queries like mad and push the red boundary out further, it doesn’t matter:  we will still have the same bottlenecks.

By contrast, in this case, if we purchase new hardware, we can shift the purple curve out.  Let’s say we get some fancy new hardware which solves the bottleneck.  Now our constraint problem might look something like the following:

LinearConstraints2.png

We still need to pick a spot somewhere in the purple-lined section, but notice that our constraint is no longer hardware.  In fact, we have two potential constraints:  version limitations and application limitations.  The answer to “what do we need to do?” just got a bit more difficult.  If we are in a situation in which we lean heavily toward OLTP activity, our next bottleneck is the application:  now it’s time to rewrite queries to perform a bit better.  By contrast, if we’re in a big warehouse environment, we will want to upgrade our servers to take advantage of OLAP features we can’t get in Standard Edition (e.g., clustered columnstore indexes or partitioning).

Can We Make This Practical?

In the real world, you aren’t going to create a linear programming problem with defined constraints…probably.  Nevertheless, the important part of the above section is the principle:  solve your tightest constraint.

The Easy Stuff

Here’s a bit of low-hanging fruit:

  1. If you’re running production SQL Server instances with less than 32 GB of RAM, you almost certainly have a hardware constraint.  Buy that RAM.  Going back to Brent’s post, you can buy a brand new server for $7K, but if you have a server that can hold at least 128 GB of RAM, it looks like you can buy that RAM for about a grand (and I’m guessing you can probably get it for less; that’s retail price).
  2. If you haven’t done any query tuning, you can probably find two or three costly queries and tune them easily.  They may not be the biggest, nastiest queries, but they help you reduce server load.  Similarly, find two or three missing indexes and add them.
  3. If you query relatively small sets of static data, put up a Redis server and cache those results.  A client of mine did that and went from about 85 million SQL Server queries per day to 60,000.  Even if your queries are perfectly optimized, that’s an incredible difference.

The nice thing about low-hanging fruit is that the opportunity cost is extremely low.  Spend a grand for RAM, 3-5 hours tuning queries, and a few hundred dollars a month on Premium Redis caching.  For a company of any reasonable size, those are rounding errors.

The Hard Stuff

Once you get beyond the low-hanging fruit, you start to see scarcity creeping back in.  You’ve got reasonable hardware specs, but maybe that Pure Storage array will help out…or maybe you need to bring in top-notch consultants to optimize queries in a way you could never do…or maybe it’s time to buy Enterprise Edition and take advantage of those sweet, sweet features like In-Memory OLTP.

Finding the right answer here is a lot harder of a problem.  One place to start looking is wait stats.  SQL Server keeps track of reasons why queries are waiting, be they hardware-related (disk, IO, CPU), scheduling related, network bandwidth related, or even because your application can’t handle how fast SQL Server’s pushing data across to it.  There’s a bit of interpretation involved with understanding wait stats, but Paul Randal is putting together the definitive resource for understanding wait stats.

The rest of my advice comes down to experience and testing.  If you’re in an environment in which you can run tests on hardware before purchase, that’s an easy way to tell if hardware is your bottleneck.  One scenario I had a few years back involved a case in which we got to play with a solid state disk array back when those were brand new.  We ran our production SQL Server instances on them for a week or so and saw almost no performance gain.  The reason was that our servers had 8 GB of RAM and 1-2 VCPUs, so faster disk simply exacerbated CPU and memory issues.  By contrast, at my current company, we had a chance to play with some newer servers with more powerful CPUs than what we have in production and saw a nice performance boost because we’re CPU-limited.

The nice thing is that my graphics above aren’t quite accurate in one big sense:  application tweaks and hardware work in parallel, meaning that buying new hardware can also push out the application tweak curve a bit and vice versa.  This means that, in many cases, it’s better to find the cheaper answer, be that hardware or sinking hours into application development and tuning.

Emphasizing Sunk Costs

There’s one last thing to discuss here:  the concept of sunk costs.  If you want a long-winded, rambling discussion of sunk costs, here’s one I wrote while in grad school.  For the short version, “sunk costs are sunk.”

Why is it that a company is willing to pay you $X a year to tune queries but isn’t willing to pay 0.1 * $X to get hardware that would end up solving the constraint issue above so much more easily?  The answer is that, from an accounting standpoint, the new hardware is a new cost, whereas your salary is already factored into their budget.  From the standpoint of the in-house accountant, your salary is a fixed cost (which they have to pay regardless of whether you’re spending 80 hours a week tuning queries like a boss or 5 hours a week of work & 35 hours of Facebook + Pokemon Go…at least until they fire you…) and from the accountant’s perspective, your effort doesn’t get shifted around; you’re still a DBA or database developer or whatever it is you do.  So your salary is a sunk cost.

I should mention a huge, huge point here:  salaries are sunk costs in the short term.  In the long term, salaries are neither fixed nor sunk costs.  In the long term, your manager can re-assign you to a different department (meaning that your salary is not sunk), and your salary can change or even go away if it makes sense from the company’s perspective (meaning that salary is a variable cost rather than a fixed cost).  But for our purposes, we’re interested in the short-term ramifications here, as “long-term” could mean a year or even several years.

Also, there is one big exception to the rule that your salary is a sunk cost:  if you get paid hourly, your wages become a variable cost, and it’s a lot easier to sell the company on trading variable costs for fixed costs:  “If you buy this hardware for $X, you don’t have to pay me 3 * $X to tune queries.”  That’s an actual cost reduction, whereas in the salary case, you’re still arguing for a net increase in company cost incurred.

So what can you do in that case?  Selling the company on “reduced cost” doesn’t cut it so much for salaried employees because you generally aren’t really reducing costs from the company’s standpoint.  Instead, you have to sell on opportunity cost:  if you weren’t spending 15 hours a week trying to tune queries to get this ETL process to work adequately, you could focus on Project X, which could net the company $Y in revenue (where $Y is an expected revenue return conditional upon the probability of success of Project X).  If $Y is substantially higher than the cost of hardware, you now have a solid business case you can take to higher-ups to get newer, better hardware.

Similarly, if you’re spending all of your time on application development and the company’s throwing barrels of cash at vendors for new hardware, you could make the case that getting some time to tune queries might allow the company to save money on net by deferring the cost of hardware purchases.  This is an easier case to make because, again, your salary is a sunk cost (unless you’re a wage-earning employee), so the opportunity cost comes down to what the next-best alternative is with your time.

Conclusion

For the most part, I can’t tell you whether you’re better off buying more hardware, tuning those queries, upgrading to Enterprise Edition, or doing something entirely different (at least unless you’re willing to shovel cash in my direction and have me take a look…).  What I hoped to do in this blog post was to give you some basic economics tools, letting you apply them to your own situation.  With these concepts in place, they’ll give you ammunition when going to management to ease your biggest constraint.  Managers and accountants are going to be a bit more amenable to arguments around opportunity cost, and your pleas are less likely to fall on deaf ears when you realize that sunk costs shouldn’t affect future behavior.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s