The Cost Of Synchronous Mirroring

About a month ago, I started dealing with a customer’s performance issues.  When I checked the wait stats using Glenn Berry’s fantastic set of DMV queries, I noticed that 99% of the waits were mirroring-related.  In other words, 99% of the time SQL Server spent waiting came from the primary instance waiting for the secondary instance to synchronize changes.

The reason the mirroring waits were that high is that my customer is using Standard Edition of SQL Server.  Unfortunately, Standard Edition only allows synchronous mirroring.  Now, I know that mirroring is deprecated, but my customer didn’t, and until SQL Server 2016 comes out and we get asynchronous (or synchronous) availability groups, they don’t have many high-availability options.

Because this customer was having performance problems, we ended up breaking the mirror.  We did this after I discussed their Recovery Time Objective and Recovery Point Objective (that is, how long they can afford to be down and how much data they can afford to lose), and it turned out that synchronous mirroring just wasn’t necessary given the company’s business model and its RTO/RPO requirements.  Instead, I bumped up backup frequency, and I have a medium-term plan to introduce log shipping to reduce recovery time in the event of failure.

But let’s say that this option wasn’t available to me.  Here are other things you can do to improve mirroring performance:

  1. Switch to asynchronous mode.  If you’re using Enterprise Edition, you can switch mirroring to asynchronous mode, which improves performance considerably.  Of course, this comes at the risk of data loss in the event of failure—a transaction can commit on the primary node before it commits on the secondary, so in the event of primary failure immediately after a commit, it’s possible that the secondary doesn’t have that transaction.  If you need your secondary to be synchronous, this isn’t an option.
  2. Improve storage and network subsystems.  In my customer’s case, they’re using a decent NAS.  They’re a small company and don’t need SANs with racks full of SSDs or on-board flash storage, and there’s no way they could afford that.  But if they needed synchronous mirroring, getting those writes to the secondary more quickly would help performance.
  3. Review mirroring.  In an interesting blog post on mirroring, Graham Kent looks at the kind of information he wants when troubleshooting problems with database mirroring, and also points us to Microsoft guidance on the topic.  It’s possible that my customer could have tweaked mirroring somehow to keep it going.

In the end, after shutting off mirroring, we saw a significant performance improvement.  It wasn’t enough and I still needed to modify some code, but this at least helped them through the immediate crisis.  They lost the benefit of having mirrored instances—knowing that if one instance goes down, another can come up very quickly to take over—but because the RTO/RPO requirements were fairly loose, we decided that we could sacrifice this level of security in order to obtain sufficient performance.

Visualizing Map Data In R: Heat Maps

Not too long ago, I looked at using dplyr and tidyr to clean up data, and I ended that post with an ugly-looking plot on top of a map of the Raleigh area.  Today, we’re going to look at using a heat map to understand crime data in Raleigh a little more easily.

Let’s Heat It Up

I found a good example of heat maps in R, and I’d now like to apply the technique to our Raleigh example.  Let’s start by rebuilding our data set:


library(dplyr)   # filter
library(tidyr)   # separate, extract_numeric
library(ggmap)   # get_map, ggmap

# Load the Raleigh police incident data and fix up the column types.
raleighcrime <- read.csv('/opt/data/Police.csv')
raleighcrime$INCIDENT.DATE <- as.Date(raleighcrime$INC.DATETIME, format="%m/%d/%Y %I:%M:%S %p")
raleighcrime$BEAT <- as.factor(raleighcrime$BEAT)
raleighcrime$LOCATION <- as.character(raleighcrime$LOCATION)

# Keep only incidents with a location, split that field into latitude and
# longitude, and round the coordinates to three decimal places.
rcf <- filter(raleighcrime, LOCATION != "")
rcf <- separate(rcf, LOCATION, c("LATITUDE", "LONGITUDE"), sep = ",")
rcf$LATITUDE <- round(extract_numeric(rcf$LATITUDE), 3)
rcf$LONGITUDE <- round(extract_numeric(rcf$LONGITUDE), 3)

# Grab a Google road map centered on the mean coordinates of the incidents.
raleighmap <- get_map(location = c(lon = mean(rcf$LONGITUDE), lat = mean(rcf$LATITUDE)), zoom = 11, maptype = "roadmap", scale = 2)

At this point, we’re going to overlay the rcf data frame on top of raleighmap.  I picked the zoom level for raleighmap through trial and error, so be willing to try out different levels here.
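For instance, trying a tighter view is just a matter of changing the zoom argument and re-pulling the map.  The zoom = 12 value and the raleighmap_closer name below are purely illustrative choices, not something from my final script:

# Same call as above, just one zoom level closer in (zoom = 12 is an example value).
raleighmap_closer <- get_map(location = c(lon = mean(rcf$LONGITUDE), lat = mean(rcf$LATITUDE)), zoom = 12, maptype = "roadmap", scale = 2)
ggmap(raleighmap_closer, extent = "device")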

Now that I have a map, let’s make it look nice:

# Base map, then contour lines, then translucent density polygons on top.
ggmap(raleighmap, extent = "device") +
  geom_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE), size = 0.3) +
  stat_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE, fill = ..level.., alpha = ..level..), size = 0.01, bins = 16, geom = "polygon") +
  scale_fill_gradient(low = "green", high = "red") +  # low density = green, high density = red
  scale_alpha(range = c(0, 0.3), guide = FALSE)  # keep the fill translucent and hide its legend

We’ve got some new code here, so let’s dig into it.  Before I describe the function calls, let’s look at the map:


This looks a lot better than the previous map, and having already worked with this crime data set in a different format, I think it gives a pretty decent view of the data.  We already know ggmap from the last post; it displays a Google Maps map.


The next command is geom_density2d.  It builds those blue contour lines based on density.  I think contour lines work well here because this crime data set does follow a contour pattern:  crime tends to radiate out, with high-crime areas sitting near other high-crime areas and dissipating with distance.  Anyhow, geom_density2d doesn’t have many interesting parameters:  we assign a data set, create an aesthetic binding latitude and longitude, and set the line size thin enough not to overwhelm the map.
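If you want to see that contour layer on its own before adding the filled density, a stripped-down version of the plot (reusing the raleighmap and rcf objects we built above) looks like this:

# Base map plus the contour lines only.
ggmap(raleighmap, extent = "device") +
  geom_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE), size = 0.3)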


By itself, geom_density2d draws some blue lines, which is cool.  But stat_density2d lets us build density maps, like so:


Contours were nice, but this really helps us see high-crime areas more clearly.  Our stat_density2d has two new terms in the aesthetic:  fill and alpha.  Both of these are marked as “..level..”  Here’s the quick explanation:  ..level.. is a variable that stat_density2d computes itself (the density level of each contour band), so both the fill color and the transparency scale with how concentrated the incidents are in an area.

We have three parameters that we’ve set:  size, bins, and geom.  To show bins in action, I bumped the number up to 60 and rebuilt the plot.  What we end up with is a much busier-looking map:
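The only thing that changes for that busier version is the bins argument.  Here is a minimal sketch of the rebuilt plot, again reusing raleighmap and rcf and leaving off the color and alpha scales we discuss below:

# Same layers as before, but with bins = 60 instead of 16 in stat_density2d.
ggmap(raleighmap, extent = "device") +
  geom_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE), size = 0.3) +
  stat_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE, fill = ..level.., alpha = ..level..), size = 0.01, bins = 60, geom = "polygon")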


What we can learn from this is that more bins isn’t necessarily a good thing, as this just makes our crime map look noisier.  We still see the highest-crime points but somebody trying to pick out the most important details of this crime map has to look harder.

As for geom, the help isn’t really that helpful:  “The geometric object to use display the data.”  I’ve confirmed that you can use polygon, tile, and density2d, but I don’t know the full list.
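As a sketch of what the tile variant could look like:  in my experience the tile geom is usually paired with contour = FALSE and the computed ..density.. variable rather than ..level.., so treat the exact aesthetics here as my assumption rather than something from the original plot:

# Density drawn as a grid of tiles; contour = FALSE makes the stat expose ..density.. instead of ..level.. (my assumption of the usual pairing).
ggmap(raleighmap, extent = "device") +
  stat_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE, fill = ..density..), geom = "tile", contour = FALSE, alpha = 0.5)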


In our above map, the color range is blacks and blues.  It looks pretty nice, but there’s a color scheme which makes a bit more sense:  red is worse.  That’s what scale_fill_gradient does here:  we move from green (at the low end) to red (at the high end).


This makes more intuitive sense:  the redder the area, the higher the crime rate.


The big difference between the picture immediately above and the final version is a pass through scale_alpha.  This tones down the image a bit and brings the contour lines back into focus.
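Trying a different range is a one-argument change; for example, the 0.5 version I mention below differs from my final plot only in the last line, with everything else staying the same as the code above:

# Identical plot, but letting the density polygons go up to 50% opacity.
ggmap(raleighmap, extent = "device") +
  geom_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE), size = 0.3) +
  stat_density2d(data = rcf, aes(x = LONGITUDE, y = LATITUDE, fill = ..level.., alpha = ..level..), size = 0.01, bins = 16, geom = "polygon") +
  scale_fill_gradient(low = "green", high = "red") +
  scale_alpha(range = c(0, 0.5), guide = FALSE)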

Here are three versions with different alpha ratios:

0-0.3 (my final map):

The difference here is a bit more subtle than some of the other transformations, but playing around with a few different alpha levels helps you get a feel for the effect.  I don’t like how much alpha level 0.5 bleeds.  Alpha level 0.3 is a very understated result, and I think it looks best of the three, although ideally I’d like just a little more color in that plot.


Building graphs and maps is all about making decisions that explain information concisely.  R has some fantastic methods for doing this, one of which is the heat map.  When overlaid on top of a real-life map, a heat map brings our data to life.

Watson Personality Insight & Tone Analyzer

Here’s a fun pair of tools out of IBM’s Watson project:  the Personality Insights and Tone Analyzer.

Personality Insights

For Personality Insights, I decided to put in two separate blog posts.  The first blog post is my listing of three essential concepts in economics, written back in 2007.  Because that’s only 3101 words and I needed 3500, I also added in a second blog post on the silliness of the “marketplace of ideas” concept, which bumped me up to 4479 words.  Watson tells me, based on these two:

You are shrewd and skeptical.

You are philosophical: you are open to and intrigued by new ideas and love to explore them. You are empathetic: you feel what others feel and are compassionate towards them. And you are imaginative: you have a wild imagination.

You are motivated to seek out experiences that provide a strong feeling of efficiency.

You are relatively unconcerned with tradition: you care more about making your own path than following what others have done. You consider independence to guide a large part of what you do: you like to set your own goals to decide how to best achieve them.

I take shrewd and skeptical as compliments, but disagree on empathy and lack of concern with tradition…although 2007 me probably was more libertarian and less conservative than 2015 me.

Speaking of 2015 me, I decided to put in a medley of recent, technical posts, including Presentation Redundancy, Warehousing on the Cheap, and How to Troubleshoot Performance Problems.  Watson says:

You are skeptical, inner-directed and excitable.

You are independent: you have a strong desire to have time to yourself. You are self-conscious: you are sensitive about what others might be thinking about you. And you are unconcerned with art: you are less concerned with artistic or creative activities than most people who participated in our surveys.

Experiences that give a sense of prestige hold some appeal to you.

You are relatively unconcerned with tradition: you care more about making your own path than following what others have done. You consider achieving success to guide a large part of what you do: you seek out opportunities to improve yourself and demonstrate that you are a capable person.

Again, Watson says I don’t care about tradition.  Aside from that, I find myself agreeing with this version of the result more than with the prior one.

By contrast, here’s my blogging partner in crime talking about Madden defense, Madden offense, and Madden, Madden, Madden, Madden, Madden (you’d think he was hooked or something).  This gives Watson 4410 words with which to work, and it comes up with:

You are skeptical, somewhat indirect and unconventional.

You are unconcerned with art: you are less concerned with artistic or creative activities than most people who participated in our surveys. You are intermittent: you have a hard time sticking with difficult tasks for a long period of time. And you are proud: you hold yourself in high regard, satisfied with who you are.

Your choices are driven by a desire for organization.

You consider achieving success to guide a large part of what you do: you seek out opportunities to improve yourself and demonstrate that you are a capable person. You don’t find tradition to be particularly motivating for you: you care more about making your own path than following what others have done.

I’d call this a relatively fair reading.  Of course, Watson personality tests share the same basic flaw as horoscopes:  people go in wanting to see certain things, and they’re willing to ignore the parts that don’t fit in order to absorb the flattery.  You, as a smart, beautiful, and talented reader, clearly know what I mean.

Tone Analyzer

After seeing what a machine learning tool says about me, I decided to put the marketplace of ideas blog post through the Tone Analyzer.  Compared to a business email (the default template for this demo), my blog post is much more analytical and confident.  What’s interesting is that the Tone Analyzer told me that my writing was simultaneously confident and tentative, as well as angry, cheerful, and negative.  I suppose “angry, cheerful, and negative” does tend to describe my writing style fairly well…


On the word count side, the analyzer picked up significantly more “social tone” than anything else:


I admit that I don’t fully understand the significance of this, but I do think it’s cool that there’s a machine learning algorithm which parses text and analyzes word choices for tone and writer traits.  Check it out with some of your own writing if you’d like.

Spinach And Databases

The pseudonymous Phil Factor explains the necessity of constraining data, preventing as much as possible the entry of bad data:

A few thoughts on the video:

  1. When reading Phil’s work, I had formed an impression of him.  Phil on video is pretty close to that impression, though I can’t say his appearance is exactly what I expected.
  2. This video sounds like a Phil Factor essay read aloud.  I enjoyed it a lot.
  3. Phil mentions that the iron content in spinach was off due to a misplaced decimal point.  It looks like that decimal point story has itself been debunked.  Nevertheless, I approve of his disapproval of that ghastly vegetable.

Scheduling Strategies

Recently, I’ve decided to change the way I schedule blog posts.  For a very long time, I would schedule blog posts for 5 PM Eastern, with the idea being that my employer would know I wasn’t blogging on the job.  I also would sprinkle in posts whenever I could on whatever topics I wanted.

After going to the FreeCon, I realized that I had been doing this wrong all along.  That realization led me to a new scheduling strategy, which consists of a few basic components:

  • Blog posts are scheduled for 9 or 10 AM Eastern.  This is because I want there to be new material when people are at work, wondering what to do.  If I have new material available when they’re bored, they’re more likely to come back.
  • My most technical posts will be scheduled to run Tuesday-Thursday.  This is for a similar reason:  people need to fight fires on Monday and probably are taking it easy on Friday, so peak boredom tends to happen Tuesday, Wednesday, and Thursday.
  • Posts are more long-form, broken up with headers.  You might have noticed this with some of the more technical posts lately:  I’m trying to include more images, more headers, and more resources.  My recent post on R and the housing market is a model for where I want to go, not just in content but also in style.  These posts take longer to write, and it’s okay for me to have fewer of them.
  • Non-technical “fun” posts will tend to show up on weekends.  I do more than talk about SQL Server, R, and Hadoop…unlikely as that may seem…
  • I want to have my stories queued up a week in advance.  Blogging every day means that I need more story ideas, more topics.  I might hit a wall and run out of topic ideas, but for right now, I’m good into December.

Next up, after all of this, I might start advertising the blog a bit more.  Tony auto-tweets when he has a blog post, and I may start doing the same.  I resisted this measure for a long time but one tweet about a new blog post isn’t going to overwhelm anybody’s timeline.

uMatrix

Not too long ago, I was using ScriptSafe to selectively block Javascript on webpages.  Back in about June, it started breaking Google searches and I had to abandon it, which makes sense because it looks like ScriptSafe itself has been abandoned.  Since then, I’ve found my new Javascript blocker of choice:  uMatrix.  uMatrix is definitely more advanced and fine-grained than ScriptSafe, but I think it also delivers on its premise a bit better.

One of the things uMatrix does out of the box is allow first-party Javascript to run.  This means that fewer sites are broken by default.  You also get the chance to enable or disable Javascript by domain, which means that I can enable scripts when I’m using GMail but don’t need to open myself up when I’m on an external site.  The matrix concept took a little time to get used to, but I don’t think I want to go back.  uMatrix can also block images, CSS files, plugins, scripts, and iframes per domain, meaning that you can let a third-party domain’s images through while blocking its scripts.  That isn’t as useful as it sounds, but it’s a nice concept.

Also nice is that there’s support in Firefox and Chrome, meaning that I can get the same experience across both browsers.  No IE/Edge support, though.

Curated SQL Is Live

Curated SQL is now live.  I’ve been putting links in for the past week, and I’m slowly starting to advertise the site.  My goal is to build it out further over the next several weeks, improving the Twitter account, trying to draw in a regular viewership, and showing the value of curating technical posts across the world of SQL Server blogs.

As part of this, I’m actively expanding my blogroll.  So far, I have somewhere in the neighborhood of 80-90 SQL Server-specific blogs, and I’ll keep expanding the list.  I may also put out some feelers for a collaborator or two, but I definitely want to give the site a distinctive feel before opening it up in that direction.