SQL Server ML Services Fills The Plan Cache

UPDATED 2018-03-13:  SQL Server 2017 CU4 fixes this issue.  See below.

We call SQL Server ML Services a lot.  As in building hundreds of thousands of times a day to build models.   It turns out that doing this has a negative effect:  ML Services plans end up staying in the plan cache and don’t get removed.  Here’s how our plan cache looks:

Plan Cache

A plan cache.  Precipitous drops are predicated by service restarts.

What happens is that things work fine for a while, until our plan cache hits about 70 GB, after which point we start getting RESOURCE_SEMAPHORE waits on some of our queries and the available space for buffer pool drops to single-digit gigabytes.

This is a problem on SQL Server 2016 and SQL Server 2017.  It’s very unlikely to affect most people, as most people don’t do crazy stuff at this scale.  But hey, what’s the fun in  having a server of my own if I can’t bring it to its knees every once in a while?

Non-Solutions

The first thing that you might do here is try to run something like DBCC FREEPROCCACHE or maybe DBCC FREESYSTEMCACHE('SQL Plans') WITH MARK_IN_USE_FOR_REMOVAL; but neither of those did anything. It appears that R/ML Services plans are not marked for removal and will not clear, no matter how many times you try to flush the cache.

Workaround

For now, the workaround I have is to restart the SQL Server service occasionally. You can see that I have done it twice in the above screenshot. Our application is resilient to short database downtimes, so this isn’t a bad workaround for us; it’s just a little bit of an annoyance.

One thing to keep in mind if you are in this scenario is that if you are running ML Services hundreds of thousands of times a day, your ExtensibilityData folders might have a lot of cruft which may prevent the Launchpad service from starting as expected. I’ve had to delete all folders in \MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData\MSSQLSERVER01 after stopping the SQL Server service and before restarting it. The Launchpad service automatically does this, but if you have a huge number of folders in there, the service can time out trying to delete all of them.  In my experience at least, the other folders didn’t have enough sub-folders inside to make it worth deleting, but that may just be an artifact of how we use ML Services.

Solution

I have worked with Microsoft on the issue and they’re going to release a patch in a future SQL Server 2017 CU to fix this issue.  I’m not sure about SQL Server 2016 and also don’t know exactly when this patch will ship, but it’s working through the pipeline and I’m happy for that.

UPDATE:  2018-03-13

Microsoft has released SQL Server 2017 CU4, which fixes this buffer pool issue.  After the patch, my SQL plan cache has not grown beyond 2 GB after 4 days, whereas prior to the patch, it’d be in the 50-60 GB range by then.

Advertisements

Quick Thoughts On 50 SQL Saturdays

This post is a bit late, as I actually blew past 50 SQL Saturdays earlier in the year, but now that the year is over, I wanted to reflect just a little bit on why I enjoy speaking at SQL Saturdays.

I’ve spoken at 57 of them so far and want to break 75 in 2018.  Here’s the year-by-year breakdown:

  • 2013:  1
  • 2014:  3
  • 2015:  9
  • 2016:  22
  • 2017:  22

I love that the institution of SQL Saturday is popular enough that someone can attend 20+ events a year all all around the world.  In my case, the vast majority of my travel is inside the United States, but I get to see parts of the country that I otherwise couldn’t (or wouldn’t think to visit), and I like that.

If you’ve thought about speaking at a SQL Saturday, I highly recommend giving it a shot.  You don’t have to be a great speaker, and definitely don’t need to be a natural speaker in order to present.  That’s exactly the situation I was in back in 2013, when I gave my first SQL Saturday presentation.  That talk had a total of four attendees and it wasn’t a polished talk at all—which is a nice way of saying that even my recollection of the talk was that it wasn’t very good…though at least I still had all 4 attendees at the end of the talk!  But like with everything else, you get better through practice and training.

You also don’t need to criss-cross the country; start with a local conference if there are any, or a nearby regional conference if you can get away with it.  If you speak at one a year, you’re still getting good experience presenting and helping share your knowledge with the community, as well as picking up additional information and potentially making great contacts and friends.

And if you happen to be in the southeast, you should submit for the Raleigh, North Carolina SQL Saturday on April 14th.

Presentation Review, 2017

It’s been another busy year for me presenting. Over the course of 2017, I gave a total of 50 talks at 42 events.  It’s been a lot of fun getting to travel around the world, hitting places as far apart as Vienna, Austria and Sydney, Australia.  I’m hoping to keep up this pace for next year as well.

Now, to look at the goals I had set for the year.  As a quick reminder of my 2017 goals:

  1. Speak at 20 SQL Saturdays and 10 user groups
  2. Speak at 2 paid conferences
  3. Give 6 webinars
  4. Do a full-length, pictures-only talk

I wasn’t quite as successful this year as I was last year.  Let’s see how I did:

Speak at 20 SQL Saturdays and 10 user groups

I ended up speaking at 22 SQL Saturdays this year (and would have been 23, had I not gotten sick the day before SQL Saturday Pittsburgh), so I beat that part of the goal.  As far as user groups go, I barely eeked out speaking at 10 distinct user groups.  So a big green checkmark for this goal.

Speak at 2 paid conferences

Another goal I ended up hitting.  I spoke at NDC Sydney in August, as well as the Mid-Atlantic SQL Server Conference and IT/Dev Connections in October.

Give 6 webinars

Missed it by that much.  I ended up doing 5 webinars this year.  I was pushing for them earlier in the year but slacked off in the middle and that made all the difference.

Do a full-length, pictures-only talk

I haven’t gotten that far yet.  I did end up doing a 10-minute talk which was dominated with pictures, but that’s not quite good enough to count.

On the plus side, I’ve been focusing on more graphics-heavy talks, so at least I’m moving toward a better equilibrium on that front.

Overall Thoughts

I know I made my 2017 goals pretty ambitious, so I’m happy that I was able to do two of them.  Doing more webinars is high on my agenda, particularly now that I have a decent recording setup.  And that will also let me create more videos of my talks.

OWASP Top 10 For 2017

I have been following the OWASP Top 10 for 2017 for a while and have decided to create a talk on the topic.  It was interesting watching the discussion on the OWASP GitHub repo, and now that the list appears to be settled, I can publish my talk.

This talk is meant to provide an overview of the OWASP Top 10 from a .NET developer’s perspective.  To support a 60-minute talk, I’m providing a large number of links to additional resources, as I know I can’t go in depth on any single topic.  This blog post will serve as my Links and Further Information for the talk, so here goes:

General Links

Injection

Authentication and Session Management

Sensitive Data Exposure

XML External Entity Injection

Building A Genetic Algorithm

This is part of a series entitled Genetics In Action.

So far, we have learned a little bit about evolutionary algorithms and taken a look at just enough high school biology to review the basics of genetics.  Today, I want to look at one particular type of evolutionary algorithm called a genetic algorithm.

Although John Holland did not invent the concept of genetic algorithms, he is the man most responsible for popularizing and developing the concept.  Holland’s Adaptation in Natural and Artificial Systems is a classic of the field and ties together the genetic metaphor.  I highly recommend this book if you’re interested in the topic.

Digging Into The Algorithm

In the last post, we looked at how the genetic metaphor ties development concepts to biological concepts, and today we will move beyond that high-level description and cover the specifics of how genetic algorithms work.  We will start by looking at the simplest type of genetic algorithm:  a single chromosome with a fixed number of genes, each of which has two alleles.  In computer science terms, think about a fixed-length array of boolean values.

GA

Two sample chromosomes

In the image above, we can see two sample chromosomes, each of which has eight genes.  We want to build up a population of these chromosomes at random—one of the interesting parts of the genetic algorithm is that it (usually) doesn’t matter where in the total search space we begin, so starting at a bunch of random points is just as good as anything else.

When it comes to choosing the size of the population, there aren’t too many hard-and-fast rules.  I have read recommendations that you should have at least 2 * N, where N is the number of genes that each chromosome has.  If you’re looking at 10-30 genes, I’ve typically had good luck with a population size of somewhere between 100 and 500.  You’ll find out that there is a maximum interesting population size, after which you don’t really get any benefit:  you won’t converge on a solution any faster, and it will take longer to do all of the processing.

Once we have our population, the next step is to score each organism in the population.  To score an organism, we apply a fitness function.  In computer science terms, this is usually an actual function, where we use the organism’s chromosomal representation as the inputs and generate and return a score for each chromosome.

Fitness

Scoring each organism

In the image above, we have defined a score for each organism.  This score is typically one of two things:  either it is the distance from an ideal point, or it is a valuation.  In the first case, think of a set of (x, y) coordinates.  We want to define a chromosome that, given an x coordinate, will generate its appropriate y coordinate.  We will calculate some distance between the predicted y and the actual y (for example, we could calculate the Root Mean Square Deviation), where a perfect match has a deviation score of 0.  On the other side, suppose that we can produce N separate products.  Each product has a cost and a price for sale.  Our genetic algorithm might describe the combination of goods we create, and the score would be the net margin (revenue minus cost) of those products.  In that case, a higher number is better for us.

It’s A Generational Thing

Now that we have the basic idea of a fitness score behind us, let’s go to the next step:  making babies.

Parents

We use a complicated matrix with dozens of measures to find the best fit

Now I am showing the entire population, which has four members.  Each member of the population has its own score, and we will use those scores to help us figure out the next generation of organisms.  The mechanism I am showing in the picture above is the simplest mechanism for genetic algorithms, which is the roulette wheel selection.  Basically, take the fitness values for each member of the population and you get a total score—that’s the 508 above.  Figure out each member’s percentage of the score, and you have a set of values which sums up to 1.  Pick a random number between 0 and 1, and wherever you land on the cumulative distribution function, you have your choice of parent.  Note that you draw with replacement, meaning that you can pull the same organism more than once.

To motivate this example, let’s suppose that red owns numbers from 0 up to .335, blue owns numbers from .335 up to .587, green owns .587 until .815, and yellow owns .815 through 1.0.  Our first random number drawn is .008, so the lucky winner is red.  Then, we draw a second parent and pulled .661, which happens to be squarely in green territory.  We now have our two parents.

Crossover

Now that we have our two parents, we are going to generate two children.  I need to introduce a new concept:  crossover.

Crossover

When a mommy chromosome loves a daddy chromosome very much…

Crossover is the recombination of a segment of a chromosome.  In the example above, we are switching genes 3-5 from each of the parents for each of the children (though child #2 is shy and is hiding off-camera).

This action is part of the genius behind genetic algorithms.  We’ve already taken some of the fittest organisms (in a large population, we’re going to pull successful organisms much more frequently than unfit organisms, and there are other pulling techniques which bias toward successful chromosomes even more than roulette wheel), and by recombining slices of their genes, we are able to test the waters with new combinations to see if we can find something even more successful.

Of course, there’s no guarantee that the new combination will be more successful than its parents were, so we have a concept known as the crossover percentage.  That is, we only perform crossover a certain percentage of the time.  In practice, this is often anywhere from 60% to 90%.  If we don’t perform crossover, then the two chromosomes just mosey on into the next step of the process.  But if we do roll the dice and land on the ol’ chop-and-swap, then we have two more RNG rounds to play.

The first of these bonus random number pulls determines where we start the chop, and the second pull determines where we stop.  In the picture above, we start at gene 3 and end at gene 5, inclusive.  In genetic algorithms, we typically have fixed-size chromosomes (though that’s not always true!) and therefore symmetrical swaps.

Turtle Power

The last step in our genetic factory floor is mutation.  One problem with survival-of-the-fittest is that, especially in later generations, we might run low on genetic diversity.  At an extreme, we end up with a population full of exactly the same chromosome, so no matter how you slice them, you get no novel patterns.  If we’ve reached the global maximum, that’s acceptable, but what if we ended up at only a local maximum and can’t jump over to the global max?  That’s where mutation comes into play.

Mutation

Not pictured:  martial artist rat or antagonistic, anthropomorphic pig & rhino combo

Mutation works by modifying a particular gene’s value.  For each gene in each new chromosome, mutate with a probability p.  Usually p is somewhere between 0.001% and 1%, though I’ve read papers that argue in certain circumstances, you might need a mutation rate of 20% to 30%.  Those would be cases with very flat local minima/maxima where you can get trapped in that flat spot and need a kick out.

If you want a fitting metaphor for flat spots, I had an old Toyota Corolla with an old starter.  I’d be able to start the car up successfully several times in a row, but then I’d hit a dead spot in the flywheel and it just wouldn’t start.  Eventually, my dad bought me the Official Starter Repair Kit:  a crowbar.  His advice was to apply sufficient percussive force until I nudged the starter out of its dead spot, and then the car could start successfully.  Mutation provides benefits in much the same way.  And just like my beater bar, mutation is not a technique you want to rely upon constantly.  At the extreme, mutation is just random search, losing all of the important information that a genetic algorithm learns during the process.

Finishing The Process

At this point, we have a finished organism.

NewOrganism

All grown up and ready to take on the world

We do this for each slot in the population and then repeat the process:  score, choose, cross over (or not), mutate.  We have a few potential end conditions.  Some of them are:

  1. Stop after a fixed number of generations
  2. Stop after we reach a known ideal solution (e.g., 0 distance from the actual values)
  3. Stop after we get close enough to a known ideal solution
  4. Stop after we have stasis for a certain number of generations, where we have not improved the fitness score for the best agent in a while
  5. Stop after a certain amount of time

There are other stop conditions as well, but these are the most common.

Conclusion

Today we covered the basics of genetic algorithms and how the process works.  Tomorrow, we’ll look at using a genetic algorithm library to solve different types of problems.

The Basics Of Genetics

This is part of a series entitled Genetics In Action.

In today’s post, I want to cover some extremely basic genetic concepts.  We use these concepts in genetic algorithms and genetic programming, and understanding the biological explanations will help us apply the metaphor to software development.

Evolutionary Biology In 400 Words

We have a population of self-contained entities called organisms.  Each organism is entirely independent from other organisms, in the sense that one organism in our population does not depend upon another organism for survival.  In evolutionary algorithms, organisms are our candidate solutions to problems:  we have a number of solutions in our population, and each solution is independent of the other solutions (although they will interact in complex ways, as we’ll talk about later).

Each organism has at least one chromosome.  The purpose of a chromosome is to carry genes.  In evolutionary algorithms, we often simplify the problem by saying that an organism has one chromosome, and it can become easy to conflate organisms and chromosomes for that reason.

Digging deeper, each gene has a number of alternate forms, called alleles.  Combining genes and alleles gives us DNA sequences called genotypes.  A genotype is a possible genetic structure.  Suppose that we have 32 genes, each of which can have 2 alleles.  This would give us 2^32 possible combinations, or a total of 4,294,967,296 possible genotypes.

Genotypes determine phenotypes.  Phenotypes are observable physical characteristics.  Classic examples of phenotypes include eye color, hair color, height, size, and beak structure.  But it’s important to understand that there is no 1:1 correspondence between a genotype and a phenotype; some number of genotypes can end up generating the same phenotype.  Phenotypes depend upon a certain combination of alleles but not the entire genotype.

Also, this goes the other way as well:  a phenotype may only appear when a particular combination of alleles hold a particular value.  Close may count in horseshoes and hand grenades, but it doesn’t count with genetics:  if there are seven alleles which combine to cause a phenotype and only six are set, you will not see the physical characteristics.  Think of this like a password:  unlike in TV and movies, if you have 18 of the 19 characters in a password correct, you shouldn’t get any indicator that you’re ever-so-close.  If we’re using an evolutionary process to find that password (which probably isn’t a good use for an evolutionary algorithm), we won’t get positive feedback until we get it absolutely correct.

The last aspect of genetics that I want to cover today is an environmental niche.  A niche is a set of features which certain phenotypes can exploit.  Think about life on the Arctic:  features like thick fur and layers of fat can help animals survive in these frigid climes.

Applying These To Evolutionary Algorithms

To apply biological concepts to evolutionary algorithms, I’m going to flop around a little bit and start at the end.

First, think about an environmental niche.  The programming concept we’re describing here is a fitness function.  An example of a fitness function may be a set of (x, y) coordinates, and our goal is to find something which maps as closely as possible to those coordinates.

HillClimbingProblem

An example of a fitness function

To solve the fitness function, we want to build up a population of organisms.  Our organisms are actually candidate solutions.  In the example above, these would be mathematical functions which fit the fitness function to a greater or lesser extent.

For the sake of simplicity, each organism will have one chromosome.  In its simplest form, a chromosome is an array which contains a set of elements.  Those elements are akin to genes in our biological nomenclature.  Each element has a set of alleles, that is, possible values.  Therefore, the genotype is the particular combination of elements in the chromosomal array.

A trivial example of a chromosome is a binary array:  [ 0, 0, 1, 0, 1, 1, 0, 1 ] would be a sample chromosome.  Here we have 8 genes, each with 2 alleles, for a total of 256 genotypes.  We apply this to the fitness function and get back a score, and that score tells us how fit the organism is.

In the next post, we’ll go deeper into this metaphor, diving into genetic algorithms.  I’ll explain more about fitness functions and explain what makes these evolutionary algorithms rather than biological algorithms.

Evolutionary Algorithms: The Basics

This is part of a series entitled Genetics In Action.

I’m in the process of creating a new talk, entitled Genetics In Action:  Evolutionary Algorithms.  I have no idea where, outside of work, I’m going to give this talk; still, it’s a topic I love and maybe I’ll sneak it in at some conference somewhere.

Here’s my abstract:

Evolutionary algorithms help us solve optimization problems where we know what the answer is, but don’t know how to get that answer.  In this talk, we will look at how different evolutionary algorithms apply the concepts of genetics to discover formulas and patterns.  Specifically, we will look at genetic algorithms and genetic programming, digging into how they work and solving a number of problems with each.  We will also include a crash course on basic genetics, just in case high school biology isn’t fresh in your mind.

With that in mind, the next several posts will relate to evolutionary algorithms, how they work, and a few of the examples I’m going to use in my demos.

What Makes For A Good Solution?

Revamping a blog post from a decade ago, good solutions to problems tend to have seven positive qualities.  They are:

  1. Correctness – Abstracting away from imprecisions in measurement, the goal is to find the a priori correct solution.
  2. Consistency – Do the same problem, get the same answer.
  3. Justifiability – Cause implies effect; results come from premises through the application of logical rules.
  4. Certainty – Chance should ideally not be involved in the solution.
  5. Orderliness – The process is a logical, step-by-step process in which each step leads to the next.
  6. Parsimony – Given several possible solutions which have the same explanatory power, the simplest is usually the best.
  7. Decisiveness – Once a solution is found, the problem is solved.  In particular, there is a well-defined solution concept.

These are principles that John Holland talks about in his groundbreaking work, Adaptation in Natural and Artificial Systems.  They are seven properties that I think we can all agree are good things to have, and we’d expect them to be important in coming up with solutions to problems.

Evolutionary algorithms fail every one of these criteria.  Despite that, they are very useful tools for solving problems.

What Evolutionary Algorithms Are Out There?

There are four major classes of evolutionary algorithms.  They are:

  1. Genetic algorithms
  2. Genetic programming
  3. Evolutionary programming
  4. Gene expression programming

For this series, I’m going to focus on the two that I’m most familiar with:  genetic algorithms and genetic programming.

What Is An Evolutionary Algorithm?

Evolutionary algorithms are programming techniques which take advantage of biological metaphors to solve a particular class of problem.  At a high level, the algorithmic mechanisms aren’t that difficult to understand, although in practice, you can go significantly deeper than what I’m going to cover in this series.

All of my code for this series will be in R.  There are evolutionary algorithm libraries in a number of languages, including .NET and Java.

What Kinds Of Problems Does An Evolutionary Algorithm Solve?

There is a particular class of problems that evolutionary algorithms are great at solving.  These are problems with a number of characteristics:

  1. They include very large search spaces.  Let’s suppose you can set the values of variables x1…x20, and each variable can be an integer between 0 and 9.  One way of finding the best solution would be to loop through each potential solution, starting from (0, 0, 0, …, 0, 0, 0) up through (9, 9, 9, …, 9, 9, 9).  There’s just one problem:  doing this would take 10^20 trials.  10^20 is 100,000,000,000,000,000,000.  That’s a pretty big number.
  2. There is a known way to score answers.  Evolutionary algorithms are supervised learning algorithms; in order to evolve a solution, we need an end goal.
  3. The solution is too complex to do by hand.  If you can write it out yourself (or calculate it easily), you don’t need a complex algorithm!
  4. We expect that we have the tools necessary to come up with a solution.  As we’ll see, evolutionary algorithms only work if we provide the right building blocks.
  5. The environment is regularly-changing, meaning the appropriate solution changes regularly.  This is not a requirement, but in this scenario, evolutionary algorithms perform much better than many of their static counterparts.
  6. The type of problem is a “hill-climbing” problem, where fitness is approximately continuous.  They can handle “wavy” fitness functions, so we definitely can deal with multiple peaked fitness functions.  But if the fitness function is overly discrete (mostly made up of non-continuous jumps in the fitness function), the likelihood of success goes down.

Okay, So What Kinds Of Problems Does An Evolutionary Algorithm Solve?

Hey, wait, I already answered that!  Oh, fine, you want a list of specific examples…  Very well.  The short answer is that evolutionary algorithms are most useful for solving NP-Hard problems, where we don’t have a one-size-fits-all solution.

  • Optimization Problems – Finding the minimum or maximum value of a mathematical function, circuit layout, antenna design, the traveling salesman problem, the knapsack problem.
  • Machine Learning Problems – Finding optimal weights for neural nets, generating rules for classification engines, training robotic sensors.
  • Economic Models – Portfolio bidding, game theory.
  • Ecological Models – Host-parasite co-evolution, symbiosis, resource flow.

These are particular examples of cases in which we see the characteristics I described above.

Why Not Use Evolutionary Algorithms For Everything?

Evolutionary algorithms aren’t suitable for all types of problems.  If you don’t have a way of programmatically judging correctness (e.g., if you’re performing an unsupervised learning task such as clustering), then evolutionary algorithms simply won’t work.  Also, evolutionary algorithms share a set of negative tendencies:

  • There is no guarantee that an evolutionary algorithm will find a solution, even if there really is one.
  • There is no guarantee of good performance or that the algorithm will finish in a reasonable time.
  • The answers may differ each time you run the algorithm.
  • It is often hard to tell when the algorithm should stop:  I’ve seen cases where there’s stasis for 40-50 generations and then a sudden breakthrough.
  • You can get stuck in local minima/maxima with evolutionary algorithms (though EAs are usually better about breaking free of these than other techniques like simulated annealing).
  • There is no guarantee that the solution your algorithm will provide will be in the easiest-to-understand form.  There may be ways to simplify the formula or decision tree further, as evolutionary algorithms do not prevent things like contradictory branches or vestigial code.  The “best” solution is equivalent to the solution which comes the closest to our desired solution, and usually the size of the evolved result is irrelevant in that regard.

Despite all of these issues, evolutionary algorithms are very powerful techniques.  Over the course of the next few blog posts, I’m going to go into more detail.  The next post will provide a crash course on genetics.  After that, we’ll focus on genetic algorithms and then genetic programming.