In the last post, we dug into how the MapReduce process works. Today, I want to spend a little bit of time talking about forcing MapReduce operations and disabling external pushdown. What Can We Force? Before we get into the mechanics behind forcing pushdown, it's important to note that predicate pushdown is somewhat limited in…
Running MapReduce Polybase Queries
Previously, we looked at basic queries which do not perform MapReduce operations. Today's post is going to look at queries which do perform MapReduce operations and we'll see how they differ. MapReduce Queries In order for us to be able to perform a MapReduce operation, we need the external data source to be set up…
Running Basic Polybase Queries On Hadoop
Following the Polybase series, we have created an external table and we can now use it like any other table. For example: In this query, dbo.SecondBasemen is an external table and TopSalaryByAge is a table in SQL Server. I haven't found any T-SQL syntax which does not work, and although I haven't tried everything, I've tried…
Let’s Build A Hadoop Cluster, Part 3
Last time around, we installed Ubuntu and Docker on our Hadoop cluster-to-be. Now we strike and install Hadoop. Caochong My project of choice for installing a Hadoop cluster using Docker is Weiqing Yang's caochong. It's pretty easy to install, so let's get started. I'm going to assume that you have a user account and have…
Let’s Build A Hadoop Cluster, Part 2
In part 1 of this series, we bought some hardware. After patiently(?) waiting for it, we have the hardware and installed Ubuntu, so let's keep going. Docker? I hardly even know her! Hadoop on Docker is a relatively new thing. Thanks to Randy Gelhausen's work, the Hortonworks Data Platform 2.5 sandbox now uses Docker. That has negative…
Let’s Build A Hadoop Cluster, Part 1
I'm taking a short break from my Polybase series to start a series on setting up a Hadoop cluster you can put in a laptop bag. For today's post, I'm going to walk through the hardware. My idea for my on-the-go cluster hardware comes from an Allan Hirt blog post. After seeing how powerful the…
Curated SQL At One Year
Curated SQL is just over a year old now. I started work on it during PASS Summit 2015, and I'm happy with the results so far. By The Numbers I'm up to 2026 posts on Curated SQL. My three largest categories are Administration, Cloud, and Hadoop. I used to call "Cloud" Azure, but kept running…
Loading Into Columnstore: Avoid Trickle Loads
I'm going to tell this one in story format, so here's the short of it up front: tl;dr --- Clustered columnstore indexes don't like the combination of wipe-and-replace with multi-threaded trickle loaders. Avoid that pattern. The Setup In the olden days, we had a large fact table with a standard clustered index and some standard non-clustered indexes.…
R For The DBA: Graphing Rowcounts
Something I am trying to harp upon is that R isn't just a language for data analysts; it makes sense for DBAs to learn the language as well. Here's a really simple example. The Setup I have a client data warehouse which holds daily rollups of revenue and cost for customers. We've had some issues with…
Creating External Tables
At this point, we've got Polybase configured and have created external data sources and file formats (and by the way, if you haven't been keeping up, I am tracking my entire Polybase series there). Now we want to create an external table to do something with all of this. As usual, my first stop is MSDN, which…