Getting Started With Spark

If you're interested in learning more about Apache Spark, there are a couple of options available to you.  About a month ago, I covered installing Spark on a Windows machine.  You can also use the Hortonworks Data Platform and run Spark from it.  Today's option, though, does not require a powerful machine; instead, it uses…

Pluralsight Reviews: Building Blocks of Biml

Today's Pluralsight course review is Stacia Misner Varga's Building Blocks of Biml. The course is just over 4 hours long, and it is completely worth watching.  I'd given my Data Migration With Biml talk a few times before watching this course, but afterward, I decided to revamp the code entirely.  From this…

Learning More About Polybase DMVs

Today, we're going to continue our Polybase training regimen and hopefully get one step closer to beating the query that stopped us before.  This time around, we're going to look at some of the information SQL Server offers us on external tables to see if there's something that might be helpful. DMVs Everywhere SQL Server…

Where Do Polybase Stats Live?

After the first Polybase statistics post, we left a bit defeated.  I want to find an efficient query which doesn't pull all of the rows back into SQL Server, thereby defeating one of the major benefits of Polybase:  that I can move large amounts of data to Hadoop and use that relatively less expensive cluster…

Pluralsight Review: DBCC Commands

I'm taking a quick break from my Polybase series to pick back up on an older series:  Pluralsight course reviews. This review is for Erin Stellato's SQL Server:  Understanding and Using DBCC Commands.  It was a great introduction to various DBCC commands, although Erin points out a few times that she's skipping the corruption-checking commands:  DBCC…

Polybase Statistics, Round One

Yesterday, we built the SecondBasemen table.  That's a really small table, so we'd expect pretty much any queries to run quickly.  So let's scale it up just a little bit. Looking At (Some) Flights If you grabbed the flight data already, you're in good shape.  Otherwise, at least get the 1987 data file, which is…

Connecting To Hadoop

Yesterday, we loaded some data sets into Hadoop.  Today, I'm going to show how we can link everything together.  By the end of this blog post, we're going to have a functional external table which reads from our Second Basemen data set. Prep Work We're going to need to do a little bit of prep…

Let’s Get Some Data!

Yesterday, I showed how to install and configure Polybase.  Today, we're going to do two things:  get a small data set (for Polybase testing) and get a large data set (for some real testing).  Tomorrow, we're going to use Polybase (finally!) to grab some data. Get Data! We are going to build a data set of…

Configuring Polybase

Update 2016-11-08:  It turns out that my yarn-site.xml configuration settings were not correct for the version of Hadoop I'm using (Hortonworks Data Platform 2.4).  I have a new post which corrects this. So, it's been a while since I promised I would start working on a Polybase series.  Life has gotten the better of me over the past few…

This War of Mine: The Review You Can Use [TM]

This War of Mine is the hardest PC game I've ever played. I do not mean that it's difficult, although it is. I mean that some of the most heart-wrenching moments I've ever experienced in gaming occurred while I was playing it. This War of Mine has a fairly simple premise: you control a group…