The inevitable first post in any technical series is the installation post. We’re going to set up a new instance of SQL Server 2016 and install Polybase. Installing Polybase is straightforward if you follow the installation guide on MSDN.
Grab That Install Disc!
First up, we want to install a new SQL Server instance.
On the Feature Selection page, the only Polybase-related option is the PolyBase Query Service for External Data checkbox. If you select that, you’ll note that you need Java installed. Ugh.
If you try to go ahead without installing Java, setup fails its Oracle JRE check (it wants a 64-bit JRE at version 7 Update 51 or higher) and you’ll get an error message. Yeah, we really have to install Java. Ugh.
Java gets updated due to security vulnerabilities approximately once every three days, so I won’t link to any particular version. You only need to get the Java Runtime Environment (JRE), not the Java Development Kit (JDK). Anyhow, once you have that installed, you can safely install SQL Server.
In the Polybase configuration section, you have the option of making this a standalone Polybase instance or enlisting it as part of a scale-out group. In my case, I want to leave this as a standalone Polybase machine because it is not on a Windows domain, and you need domain accounts for Polybase scale-out to work correctly. Later in the series, we’ll give multi-node Polybase a shot.
After choosing between standalone and scale-out, it’s time to set up accounts. If you selected the standalone instance, then the Polybase service accounts will be NT AUTHORITY\NETWORK SERVICE by default. If you noted that you want this instance to be part of a scale-out group, you’ll need to enter your domain account credentials here.
At this point, it’s just another SQL Server installation. It may take a little while to install everything, but once it’s done, we’re good to go for now.
The next step involves installing Hadoop and configuring Polybase. I covered that in detail back in June, and that article is still up to date. Note that when you go to download the Hortonworks sandbox, you want to grab HDP 2.4 by hitting the “Hortonworks Sandbox Archive” drop-down and selecting HDP 2.4 for VirtualBox or VMware.
With HDP 2.5, the Hortonworks folks changed the way they set up their sandbox; they now use Docker to build a Hadoop image. That’s pretty cool and it works well in most cases, but Polybase does not get along at all with Docker. The reason is that the data nodes have Dockerized IP addresses (e.g., 172.17.0.X), and Polybase nodes try to connect directly to the data nodes. Because those addresses are only routable inside the Docker host’s internal network, the SQL Server machine can’t reach the data node and thus the process times out.
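If you want to see that failure for yourself before blaming Polybase, a quick TCP check from the SQL Server machine will do. This is a minimal sketch, not anything Polybase itself runs: the function name `can_reach` is made up for illustration, and the host and port in the comment are assumptions (50010 is the default data node transfer port in Hadoop 2.x, but your cluster may be configured differently).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, DNS failures, and timeouts.
        return False


# Hypothetical usage: substitute the data node address that your name node reports.
# can_reach("sandbox.hortonworks.com", 50010)
```

If the name node answers but the data node address it hands back does not, you are most likely looking at the Docker addressing problem described above rather than a Polybase misconfiguration.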