More Polybase With Tar

Last week, I looked at flat file compression formats and noted that when I tarred a file, I lost the top row.  Today, I'm going to run a short test in which I try to tar several files and see what happens. The Setup: I have four copies of my Second Basemen data set, each saved…
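One plausible reason a tarred file loses its top row is that a tar archive puts a 512-byte header in front of each member's data. A reader that splits the stream on newlines without understanding tar would therefore see that header glued onto the first record. I'm offering this as an assumption, not as a confirmed finding about Polybase. The sketch below uses Python's standard `tarfile` module with hypothetical file names (`a.csv`, `b.csv`) and made-up rows to show where that first "line" ends up.

```python
import io
import tarfile

TAR_BLOCK = 512  # tar headers and member data are laid out in 512-byte blocks


def build_tar(files: dict[str, bytes]) -> bytes:
    """Pack several in-memory files into one uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def naive_first_line(raw: bytes) -> bytes:
    """What a line-oriented reader that ignores the tar structure sees first."""
    return raw.split(b"\n", 1)[0]


def member_contents(raw: bytes) -> dict[str, bytes]:
    """Read each member back properly, using tarfile to skip the headers."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


if __name__ == "__main__":
    # Hypothetical data: two tiny CSV files standing in for the real data set.
    raw = build_tar({"a.csv": b"1,Alpha\n2,Beta\n", "b.csv": b"3,Gamma\n"})
    first = naive_first_line(raw)
    print(len(first))         # header bytes plus the first record
    print(first[TAR_BLOCK:])  # the first record starts right after the header
    print(member_contents(raw))
```

A tar-aware reader recovers every row, while the naive reader's first line is mostly header bytes, which a delimited-text parser would most likely reject.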

Flat File Compression In Polybase

For today's post, I want to look at how well Polybase on Hadoop can handle different compression formats.  I'm going to use the same compression codec (org.apache.hadoop.io.compress.DefaultCodec) for each.  Each file will contain the same small, 777-record data set of second basemen, and I will run two queries: one a simple SELECT *, and one which…
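To make "different compression formats" concrete, here is a sketch that produces gzip, bzip2, and zlib ("deflate") versions of the same CSV bytes with Python's standard library and checks that each one round-trips. My understanding is that Hadoop's DefaultCodec writes zlib-format data with a `.deflate` extension; treat that pairing as an assumption to verify against your Hadoop version. The rows generated here are made-up stand-ins, not the real second basemen file.

```python
import bz2
import gzip
import zlib

# Each entry maps a file extension to its (compress, decompress) functions.
CODECS = {
    ".gz": (gzip.compress, gzip.decompress),
    ".bz2": (bz2.compress, bz2.decompress),
    # Assumption: zlib format corresponds to what DefaultCodec writes as .deflate.
    ".deflate": (zlib.compress, zlib.decompress),
}


def make_sample_csv(rows: int) -> bytes:
    """Hypothetical stand-in data, not the real second basemen data set."""
    lines = [f"{i},Player{i},{1900 + i % 100}" for i in range(1, rows + 1)]
    return ("\n".join(lines) + "\n").encode("utf-8")


def compress_all(data: bytes) -> dict[str, bytes]:
    """Compress the same bytes once per format."""
    return {ext: compress(data) for ext, (compress, _) in CODECS.items()}


def round_trips(data: bytes) -> bool:
    """True if every compressed copy decompresses back to the original bytes."""
    return all(CODECS[ext][1](blob) == data for ext, blob in compress_all(data).items())


if __name__ == "__main__":
    data = make_sample_csv(777)
    for ext, blob in compress_all(data).items():
        print(f"{ext:9} {len(blob):6} bytes (raw: {len(data)})")
    print("round trip ok:", round_trips(data))
```

Files like these are what you would drop into HDFS before pointing an external table at them.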

External File Formats

In today's post, we'll spend some time looking at the different external file formats available to Polybase.  My goal in this post is to cover each of the file formats and give some idea of when and why we might want to use each format.  Let's start with the MSDN page.  On this page, we…

External Data Sources: Hadoop

Today's post will look in some detail at the first of several external data sources.  First, let's check out MSDN. There are a half-dozen options, but for today, we're focusing on a Hadoop cluster. Using This Knowledge: We've got Polybase configured, and we know which Hadoop source we're using (Hortonworks HDP 2.4 in my case),…

Proper Configuration Is Key

A while back, I had a post on configuring Polybase in which I copied yarn-site.xml values from my HDP sandbox into Polybase.  That led to pushdown errors which vexed me for a while.  At PASS Summit, I had a chance to talk to Murshed Zaman, author of a blog post on common Polybase errors.  He walked…
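Because the trouble came from copied yarn-site.xml values, one way to catch such mismatches is to diff the properties in two copies of the file. The sketch below assumes the standard Hadoop configuration layout (a `<configuration>` root containing `<property>` elements with `<name>` and `<value>` children). The values in the sample are hypothetical placeholders, not the settings that caused my pushdown errors.

```python
import xml.etree.ElementTree as ET


def read_site_xml(text: str) -> dict[str, str]:
    """Map each <property>'s <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(text)
    props: dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_props(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (value_in_a, value_in_b)} for every property whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Hypothetical placeholder values, not real HDP or Polybase settings.
    cluster = """<configuration>
      <property><name>yarn.application.classpath</name><value>/cluster/classpath/*</value></property>
    </configuration>"""
    polybase = """<configuration>
      <property><name>yarn.application.classpath</name><value>/copied/classpath/*</value></property>
    </configuration>"""
    print(diff_props(read_site_xml(cluster), read_site_xml(polybase)))
```

A property that exists on only one side shows up with `None` for the missing value, which makes copy-and-paste omissions easy to spot.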

Let’s Install Polybase

The inevitable first post in any technical series is the installation post.  We're going to set up a new instance of SQL Server 2016 and install Polybase.  Installing Polybase is straightforward if you follow the installation guide on MSDN. Grab That Install Disc! First up, we want to install a new SQL Server instance. The…

Why Not Polybase?

My last post covered reasons why I think Polybase is big.  Today's post will cover reasons why I'm potentially wrong and what I plan to do about it. Motivation: Here's my motivation in two tweets: "@feaselkl Because it’s crazy rare in the wild. #sqlsummit" (Brent Ozar, @BrentO, October 27, 2016). The Problems: So let's…

Why Polybase?

As I mentioned earlier this week, I'm restarting my series on Polybase, digging deeper than before.  Today's post will be non-technical but will explain the motivation behind it. Keynote Love: Polybase was featured in both PASS Summit 2016 keynotes.  You can watch them on PASStv (keynote day 1, keynote day 2). If you don't want to…

SSIS And HDInsight

Not too long ago, I decided to try connecting to HDInsight using SSIS.  One of the Integration Services updates for SQL Server 2016 is Hadoop support.  This Hadoop support comes via the Hadoop Connection Manager, which allows for two connection methods:  WebHCat and WebHDFS.  My preference is WebHDFS, but it appears that we cannot use WebHDFS…
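For context on the WebHDFS option, WebHDFS is a REST interface whose URLs take the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs with Python's `urllib.parse`. The host, port, path, and user name are hypothetical, and it makes no claim about how HDInsight or the SSIS Hadoop Connection Manager exposes the endpoint.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                user: Optional[str] = None, https: bool = False) -> str:
    """Build <scheme>://<host>:<port>/webhdfs/v1/<path>?op=<OP>[&user.name=<user>]."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths are absolute, e.g. /tmp/data.csv")
    params = {"op": op}
    if user is not None:
        params["user.name"] = user  # simple (non-Kerberos) authentication
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, port, and path.
    print(webhdfs_url("namenode.example", 50070, "/tmp/second basemen.csv", "OPEN", user="hdfs"))
```

Operations such as `OPEN`, `LISTSTATUS`, and `GETFILESTATUS` are part of the WebHDFS REST API; whether a given cluster lets you reach that endpoint directly is a separate question.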

Latency Versus Throughput

While working on my Kafka series, I was looking at ways of improving performance when using the RdKafka .NET provider.  One consistent message I kept running into is that the default configuration leads to relatively high latency.  This has been true for a while, and there's even a question about it in the FAQ.  This is certainly not something limited…
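The tension is easy to see in a toy model of producer batching. This is my own simplification, not how librdkafka is implemented. Messages wait until a batch window closes, so a longer window means fewer requests (good for throughput) but longer waits (bad for latency). In librdkafka the comparable setting is `queue.buffering.max.ms`. The arrival times used below are invented.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    requests: int       # number of batches sent
    avg_wait_ms: float  # mean time a message sat in the queue before its batch was sent


def simulate(arrivals_ms: list[float], linger_ms: float) -> BatchStats:
    """Toy model: a batch opens at the first queued message and is sent linger_ms later."""
    if not arrivals_ms:
        return BatchStats(0, 0.0)
    waits = []
    requests = 0
    batch_send_at = None
    for t in sorted(arrivals_ms):
        if batch_send_at is None or t > batch_send_at:
            # The previous batch (if any) has already gone out, so open a new one.
            batch_send_at = t + linger_ms
            requests += 1
        waits.append(batch_send_at - t)
    return BatchStats(requests, sum(waits) / len(waits))


if __name__ == "__main__":
    arrivals = [0, 1, 2, 3, 10, 11]  # hypothetical message arrival times in ms
    for linger in (0, 5, 100):
        print(linger, simulate(arrivals, linger))
```

With a zero window every message becomes its own request and waits no time at all. Stretching the window collapses the same six messages into fewer requests at the cost of a higher average wait, which is the latency-versus-throughput trade-off in miniature.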