PolyBase: When You Have No Local Tables

Today's PolyBase post is all about what happens when you want to join data from multiple data sources together, but none of your tables are local SQL Server tables. No Local Tables were Harmed in the Making of this Post: Let's suppose we have two sets of data in two different sources. My first is…

PolyBase + Dockerized Hadoop

Quite some time ago, I posted about PolyBase and the Hortonworks Data Platform 2.5 (and later) sandbox. The summary of the problem is that data nodes in HDP 2.5 and later are on a Docker private network. For most cases, this works fine, but PolyBase expects publicly accessible data nodes by default---one of its performance…

PolyBase and External Column Names

The SQL Server 2019 CTP 3.2 release notes include a couple of lines of text which are easy to miss: External table column names are now used for querying SQL Server, Oracle, Teradata, MongoDB, and ODBC data sources. In previous CTP releases, the columns were bound only based on ordinal on the destination and column names…

PolyBase and Azul Zulu OpenJDK

One of the more interesting parts of SQL Server 2019 CTP 3.2's release notes is the relationship between Microsoft and Azul Systems. Travis Wright covers it in some detail, as well as what it means for customers. Prior to SQL Server 2019 CTP 3.2, installing PolyBase required an installation of Oracle's Java Runtime Environment 7…

PolyBase Revealed: PolyBase to Spark

Today's PolyBase Revealed post covers another thing I've been waiting for in PolyBase for a long time: integration with Apache Spark. MapReduce is Slow: A big reason I'm interested in integrating PolyBase with Apache Spark is that the current techniques for integrating with Hadoop---either streaming all of the data over from HDFS into SQL Server…

PolyBase Revealed: the DW Databases

Today is a fairly short post covering a trio of databases you might not even know you have: DWConfiguration, DWDiagnostics, and DWQueue. The PolyBase installer drops all three of these on your instance. Let's go in ascending order of the number of useful tables. DWQueue: The DWQueue database has two tables, neither of which I've…

PolyBase Revealed: MRAppMaster Errors

Let me tell you about one of my least favorite things to see in PolyBase: "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". This error is not limited to PolyBase but is instead an issue when trying to run MapReduce jobs in Hadoop. There are several potential causes, so let's cover each…

PolyBase Revealed: Hive Shim Errors

I recently worked through an error in which predicate pushdown would work for flat files but would fail with a weird error on ORC files. tl;dr: If you're hitting Hive 3, make sure you're using SQL Server 2019 CTP 2.3 (or later). The Equipment: HDP 3.0.1.0-187 running standalone. This includes HDFS 3.1.1, Hive 3.1.0,…