EDIT 2016-07-21 17:30:00 — It turns out that my distance calculation query was a little bit off. I was accidentally filtering out SQL Saturdays which did not have any 600-mile collisions in the next week but did have a 600-mile collision in the current week. With this in place, there are now 38 pair collisions out of 209 combos. I’ve updated the commentary portion to match these corrected numbers, but they don’t fundamentally alter the point. I also found that I mistakenly mis-coded Kansas City’s state in 2015; it has been corrected in the data file. Moral of this story: don’t blog too early in the morning?
A couple of days ago, Argenis Fernandez put up a notice that PASS is changing the rules for SQL Saturdays:
The first change we are going to make is to the amount of funding support we provide organizers. Currently any organizer, whatever the size of their SQLSaturday event, receives $500. Starting January 1, 2017, we will be reducing this amount to $250 and targeting those cities/locations that really need the financial support to grow, rather than well-established events that are better capable of attracting sponsors and financial support. When assessing those events that require financial support, PASS will be meeting with event organizers and completing a review of past performance.
Which brings me to the second change: implementing a new 600-mile radius and concurrent event timing restriction to reduce competition between individual SQLSaturday events in Canada/USA. The event timing restriction means that events within the 600-mile radius cannot occur on the same weekend or one weekend prior or after. This will help to deliver a more consistent and satisfying experience for sponsors, which will reduce sponsor exhaustion and assist with increasing overall ROI. The existing 400-mile radius restriction for all other regions will remain.
I don’t mind the first rule change at all, and from what I’ve seen, I don’t think anybody does. That $250 might be the difference between breaking even and going slightly in the hole, but if you know about it beforehand, it’s not that hard to squeeze $250 out of a budget or find $250 in sponsorships somewhere. As a result, it saves PASS somewhere between $25-50K a year and most events won’t notice a difference.
The second change, however, has been…controversial. Steve Jones has a well-written argument against the rule change, particularly in the way this was announced.
How Much Does This Change?
I wanted to check out current SQL Saturdays for 2015 and 2016 and see how many would have been affected by the new rule. So let’s take a look!
Getting The Data
I decided to go with a simplistic approach. Rather than looking up the specific addresses of each SQL Saturday, I went by city name. The way I figure it, if I’m 10-20 miles off, it won’t make a huge difference. I also decided to limit my focus to the US and Canada, as those are the areas affected by the rule change.
I coded each event with city, state (or province), and event date. You can get my results here in CSV format.
Geocoding The Data
My CSV doesn’t include latitude and longitude, so I’m going to load the data into R, get that latitude and longitude information, and then put it into SQL Server. First create the SQL Server table:
```sql
CREATE TABLE dbo.SQLSaturdays
(
    City VARCHAR(40),
    State CHAR(2),
    EventDate DATE,
    Latitude DECIMAL(8, 4),
    Longitude DECIMAL(8, 4),
    GeoPoint AS GEOGRAPHY::Point(Latitude, Longitude, 4326)
);
GO
```
And here’s the R script I threw together:
```r
install.packages("ggmap")
install.packages("RODBC")
library(ggmap)
library(RODBC)

sqlsat <- read.csv("C:\\Temp\\2015_2016_SQLSaturdays.csv")
sqlsat$City <- as.character(sqlsat$City)
sqlsat$State <- as.character(sqlsat$State)
sqlsat$Date <- as.Date(sqlsat$Date, format="%m/%d/%Y")
sqlsat$GeoLookup <- paste(sqlsat$City, sqlsat$State, sep = " ")

# Perform Google Maps geocoding. Google Maps provides 1000 geocoding hits per day for free.
# Our data set is only 106 observations, so we can do that without issue.
sqlsat <- cbind(sqlsat, t(sapply(sqlsat$GeoLookup, geocode, USE.NAMES=FALSE)))
sqlsat

conn <- odbcDriverConnect("Driver=SQL Server;Server=.;Initial Catalog=Scratch;Provider=SQLNCLI11.1;Integrated Security=SSPI")

# Via http://stackoverflow.com/questions/14334840/how-to-insert-a-dataframe-into-a-sql-server-table
# Not a great practice, but for 106 rows, it'll do.
values <- paste("('",sqlsat$City,"','",sqlsat$State,"','",sqlsat$Date,"',",sqlsat$lat,",",sqlsat$lon,")", sep="", collapse=",")
cmd <- paste("INSERT INTO Scratch.dbo.SQLSaturdays(City, State, EventDate, Latitude, Longitude) VALUES ", values)
result <- sqlQuery(conn, cmd, as.is=TRUE)
close(conn)
```
There are a couple of not-so-great practices here (particularly around the way I inserted data into SQL Server), but it does the job, especially when you only have 106 rows. Also, if you want to play along at home, you’ll probably want to change the connection and database names.
From there, I ran a T-SQL query to do the following:
- Turn latitude and longitude into GEOGRAPHY points
- Find SQL Saturday pairs which occur within one week of one another
- Calculate the distance in miles between these two city pairs
- Return only city pairs which are less than 600 miles apart
Here’s the script:
```sql
WITH sqlsats AS
(
    SELECT
        City,
        State,
        EventDate,
        Latitude,
        Longitude,
        GeoPoint,
        DENSE_RANK() OVER (ORDER BY EventDate) dr
    FROM dbo.SQLSaturdays s
),
chronoprox AS
(
    SELECT
        s.City,
        s.State,
        s.EventDate,
        s.Latitude,
        s.Longitude,
        s.GeoPoint,
        sNext.City AS NextCity,
        sNext.State AS NextState,
        sNext.EventDate AS NextEventDate,
        sNext.GeoPoint AS NextGeoPoint
    FROM sqlsats s
        LEFT OUTER JOIN sqlsats sNext
            ON (s.dr = sNext.dr - 1 OR s.dr = sNext.dr)
                AND s.EventDate >= DATEADD(DAY, -7, sNext.EventDate)
                AND NOT (s.City = sNext.City AND s.State = sNext.State)
),
geoprox AS
(
    SELECT
        cp.City,
        cp.State,
        cp.EventDate,
        cp.Latitude,
        cp.Longitude,
        cp.GeoPoint,
        cp.NextCity,
        cp.NextState,
        cp.NextEventDate,
        cp.NextGeoPoint,
        cp.GeoPoint.STDistance(cp.NextGeoPoint) / 1609.344 AS DistanceInMiles
    FROM chronoprox cp
    WHERE cp.NextGeoPoint IS NOT NULL
)
SELECT
    gp.City,
    gp.State,
    gp.EventDate,
    gp.NextCity,
    gp.NextState,
    gp.NextEventDate,
    gp.DistanceInMiles
FROM geoprox gp
WHERE gp.DistanceInMiles < 600
ORDER BY gp.NextEventDate;
```
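As a sanity check on `STDistance`, you can approximate the great-circle distance between two points in just about any language using the haversine formula. Here’s a minimal Python sketch; the Raleigh and Charlotte coordinates are approximate city-center values I’m plugging in for illustration, not the geocoded values from the data set, and haversine assumes a spherical Earth while `STDistance` uses an ellipsoid, so the numbers will differ slightly:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points (spherical Earth)."""
    earth_radius_miles = 3958.8  # mean Earth radius; an approximation vs. STDistance's ellipsoid
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_miles * asin(sqrt(a))

# Approximate city centers (illustrative only, not the geocoded values from the CSV)
raleigh = (35.7796, -78.6382)
charlotte = (35.2271, -80.8431)
print(round(haversine_miles(*raleigh, *charlotte)))  # roughly 130 miles
```

Either way you calculate it, Raleigh and Charlotte sit comfortably inside a 600-mile radius of one another.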
The end result is that there were 38 city pairs with a distance of less than 600 miles in 2015 and 2016. 23 of these pairings took place in 2015, and 15 in 2016 (including scheduled events which have not yet happened).
10 of the 38 city-pairs were in the run-up to PASS Summit 2015, and Raleigh & Charlotte were right in the middle of that.
As Grand Poobah of the Raleigh SQL Saturday, this does affect me, as we have some larger events (Atlanta, Orlando, Washington, Charlotte) in our region, as well as smaller institutions (Richmond, Spartanburg). Finding a good date has become a bit harder (in my case, because I don’t want to jump claim on any of those events’ dates), but I also don’t want to oversell that difficulty: out of 209 potential collisions, we only saw 38. And that’s factoring in current concerns people have about the total number of available dates, such as how you would never have a SQL Saturday around Thanksgiving or Christmas or Labor Day or…
I also don’t want to undersell the marginal increase in difficulty for smaller SQL Saturdays to keep going—it’s hard to find a venue even in the best of circumstances (trust me, I know!), and feeling like you’re going to be locked out of a good portion of the year reduces your options even further. My original concern was mostly around midwestern events, as they’re going to have the most geographical overlap, but if you check out the pairs, the southeast is actually the hardest-hit region:
```sql
--I took the results from the previous query and put them into a temp table called #tmp.
WITH records AS
(
    SELECT City, State FROM #tmp
    UNION ALL
    SELECT NextCity, NextState FROM #tmp
)
SELECT
    City,
    State,
    COUNT(1) AS Collisions
FROM records
GROUP BY City, State
ORDER BY Collisions DESC;
Kansas City tops the list with 6 collisions, followed by Raleigh with 5 (go us!). Also near the top of the list, Spartanburg, Dallas, Columbus (GA), and Orlando have 4 apiece. Note that the total number of collisions adds up to 76 because I include both “sides” of the 38 collisions. This does affect 34 separate events, though.
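Counting both “sides” is just an unpivot-and-group operation, which is what the UNION ALL above does. For anyone who prefers to see the logic outside of SQL, here’s the same idea as a Python sketch; the pairs here are made up for illustration, not the real result set:

```python
from collections import Counter

# Hypothetical pair collisions (CityA, CityB); the real list comes from the T-SQL query above.
pairs = [
    ("Kansas City", "Omaha"),
    ("Kansas City", "Dallas"),
    ("Raleigh", "Charlotte"),
    ("Raleigh", "Spartanburg"),
]

# Each city gets credit for every collision it appears in, regardless of which "side" it's on,
# so the counts sum to exactly twice the number of pairs.
collisions = Counter()
for a, b in pairs:
    collisions[a] += 1
    collisions[b] += 1

print(collisions.most_common())
```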
While I have the data, how about we play around with a couple of alternatives?
- If we shrink the collision radius to 500 miles, we’re down to 30 collisions, including 14 in 2016.
- At 400 miles (the international limit), we have 18 collision pairs, including 7 in 2016.
- At 700 miles, we’re up to 54 collisions, including 20 in 2016. As a side note, at 700 miles, we would have had collisions in all 9 of the cities which had SQL Saturdays in the first three weeks of October 2015 (Columbus GA, Pittsburgh, KC, Orlando, Raleigh, Minneapolis, Boston, Charlotte, Dallas). At 600 miles, Boston slips off the list.
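Trying out a different radius only means changing the final `WHERE gp.DistanceInMiles < 600` filter; in Python terms, it’s a one-line threshold over the computed pair distances. A sketch with placeholder distances (the real numbers come from the `DistanceInMiles` column):

```python
# Hypothetical (distance-in-miles) values for time-adjacent pairs; stand-ins for the query output.
pair_distances = [129.8, 385.0, 412.5, 590.1, 640.0, 710.3]

def collisions_at(radius_miles, distances):
    """Number of pairs that would collide under a given radius restriction."""
    return sum(1 for d in distances if d < radius_miles)

# Shrinking or growing the radius just moves the cutoff along the sorted distances.
for radius in (400, 500, 600, 700):
    print(radius, collisions_at(radius, pair_distances))
```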
Final Thoughts (For Now)
I’m still on the fence about this decision. As a speaker, I like the fact that there are so many events on the east coast and in the midwest, and I’d hate to see the number of opportunities to visit other cities and help people learn shrink as a result of this change. Looking at the data, I think we’ll lose some events on the margin. There is some opportunity for growth in less desirable times of the year (think July), but the problem is that if you’re already running a smaller event, picking a bad time of year will guarantee that not many people will be able to show up.
But at the same time, I’ve heard from several events that I’ve attended that sponsorships are drying up this year. If that’s the case across the board, then we might have reached the event limit, particularly in areas like the southeast which have a large number of events.