PolyBase and Postgres

Last month, I presented at year 4 of SQL Trail, a conference normally held in Richmond, Virginia but entirely online this year due to obvious reasons. Before the conference, one of the attendees asked for a session on PolyBase, specifically around accessing Postgres data from SQL Server. I didn’t have an example on that, so I figured I could come up with one and (eventually) blog about it.

Preparing Postgres

I didn’t have a Postgres server lying around anywhere, so I decided to set up my own using a Docker container:

docker pull postgres
docker run --name pg -e POSTGRES_PASSWORD=pgtestpwd -p 5432:5432 -d postgres

These commands first pull down the latest Postgres image from Docker Hub. Then, we create a container named pg based on the postgres image, forward requests on port 5432 to the Postgres container, and set the administrative password to pgtestpwd, clearly a secure password.
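To confirm the container came up, a quick sanity check (assuming Docker is on your PATH) is to list running containers and open a psql session inside the container:

```shell
# Confirm the pg container is running.
docker ps --filter "name=pg"

# Open an interactive psql session inside the container as the postgres user.
docker exec -it pg psql -U postgres
```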

Next up, we need to retrieve the Postgres ODBC driver. You can find drivers on the Postgres website. I’m using Windows for this demo, so I grabbed the latest MSI from the MSI directory. The ODBC driver goes on the machine running SQL Server. In this example, I have SQL Server installed directly on my PC, not in a Docker container or VM, so I’ll install the ODBC driver directly on my PC. In a production scenario, you’d install it on the production machine running SQL Server, not the machine running PostgreSQL.

After that, we need some data for our database. I’m going to create a simple table called event which stores event data. The data itself doesn’t really matter, so roll with me here. You can connect to your new Postgres server from Azure Data Studio, pgadmin4, or whatever other tool you’d like.

If you do connect to Postgres via Azure Data Studio, like I did, you want to make sure that you’ve set the connection type correctly:

Connect to a Postgres instance

If you don’t see this connection type, make sure you have the PostgreSQL extension for Azure Data Studio.


Once connected, we can run the following script. I decided to use the default postgres database, but you can certainly create your own database as well if you’d like to make this more realistic.

CREATE TABLE event(
    id serial,
    machine_id int,
    event_id varchar(40),
    event_type varchar(70),
    entity_type varchar(70),
    entity_id varchar(40),
    event_data json
);

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Run this several times to generate data.
INSERT INTO event(
    machine_id,
    event_id,
    event_type,
    entity_type,
    entity_id,
    event_data
)
VALUES
(
    21644,
    uuid_generate_v4()::varchar,
    'Telemetry Ingest',
    'MachineTelemetry',
    uuid_generate_v4()::varchar,
    '{ "Test": "Yes" }'
);

SELECT * FROM event;

If everything worked as expected, the final SELECT statement should return some results for you.
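Since event_data is a json column, you can also sanity-check the generated rows from the Postgres side; the ->> operator extracts a JSON field as text. The test_flag alias here is just for illustration:

```sql
-- ->> extracts a JSON field as text; here we pull out the "Test" key.
SELECT
    id,
    event_type,
    event_data->>'Test' AS test_flag
FROM event
ORDER BY id;
```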

Connecting to Postgres

Now that we have some data, let’s go back to SQL Server. I assume you’ve already installed and configured PolyBase; if not, check out my presentation on PolyBase. Note that this requires SQL Server 2019 or later, as that’s the first version which supports PolyBase to ODBC. The script below assumes a database named Scratch whose database master key password is <<SomeSecureKey>>.
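If the Scratch database doesn’t have a master key yet, you can create one first. This is a minimal sketch, with <<SomeSecureKey>> standing in for a password of your own choosing:

```sql
-- Create a database master key if one doesn't exist yet.
-- <<SomeSecureKey>> is a placeholder; substitute your own strong password.
USE [Scratch]
GO
IF NOT EXISTS
(
    SELECT 1
    FROM sys.symmetric_keys
    WHERE name = N'##MS_DatabaseMasterKey##'
)
BEGIN
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<<SomeSecureKey>>';
END
GO
```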

First up, we want to create a database scoped credential so that we can authenticate to Postgres. This credential gets encrypted using the database master key.

USE [Scratch]
GO
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<<SomeSecureKey>>';
GO

IF NOT EXISTS
(
    SELECT 1
    FROM sys.database_scoped_credentials
    WHERE
        name = N'PostgresCredential'
)
BEGIN
    CREATE DATABASE SCOPED CREDENTIAL PostgresCredential
    WITH IDENTITY = 'postgres',
    SECRET = 'pgtestpwd';
END
GO

After creating the database scoped credential, we will create a data source pointing to Postgres, connecting to localhost on port 5432, just as we set up with Docker. We’ll specify the PostgreSQL Unicode driver and point to the postgres database. Note that PostgreSQL ships both ANSI and Unicode ODBC drivers, and the driver you pick determines whether your strings come across as ANSI or Unicode. Given my preferences in SQL Server, it should be obvious which way I chose.

IF NOT EXISTS
(
    SELECT 1
    FROM sys.external_data_sources ds
    WHERE
        ds.name = N'PostgresEvents'
)
BEGIN
    CREATE EXTERNAL DATA SOURCE PostgresEvents WITH
    (
        LOCATION = 'odbc://localhost:5432',
        CONNECTION_OPTIONS = 'Driver={PostgreSQL Unicode}; Database=postgres',
        CREDENTIAL = PostgresCredential,
        PUSHDOWN = ON
    );
END
GO

Next, we’ll wrap up the external object creation by building out an external table. This external table will use NVARCHAR to represent both Postgres strings and the JSON data type.

IF NOT EXISTS
(
    SELECT 1
    FROM sys.external_tables t
    WHERE
        t.name = N'PostgresEvent'
)
BEGIN
    CREATE EXTERNAL TABLE dbo.PostgresEvent
    (
        id INT,
        machine_id INT,
        event_id NVARCHAR(40),
        event_type NVARCHAR(70),
        entity_type NVARCHAR(70),
        entity_id NVARCHAR(40),
        event_data NVARCHAR(255)
    )
    WITH
    (
        LOCATION = 'event',
        DATA_SOURCE = PostgresEvents
    );
END
GO

The final test comes when we query the data. Run the following and you should see the same set of rows that you saw when connecting directly to Postgres:

SELECT * FROM dbo.PostgresEvent;
GO
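To get a feel for what PolyBase is actually doing, try a filtered query: with PUSHDOWN = ON, PolyBase can send the predicate to Postgres rather than pulling every row back and filtering locally. The DMV query below is a sketch for poking at recent external work; exact column availability may vary by SQL Server build.

```sql
-- With PUSHDOWN = ON, PolyBase can send this predicate to Postgres
-- instead of filtering after the fact in SQL Server.
SELECT e.id, e.event_type, e.event_data
FROM dbo.PostgresEvent e
WHERE e.machine_id = 21644;

-- Inspect recent external work PolyBase has performed.
SELECT ew.execution_id, ew.input_name, ew.read_location
FROM sys.dm_exec_external_work ew;
```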

Postgres-Specific Notes

I am not particularly familiar with PostgreSQL, certainly not to the point where I could give you the ins and outs of compatibility issues. My simple tests were successful and I didn’t find many people complaining about compatibility issues with PolyBase, but Postgres has enough exotic data types that there may well be compatibility problems. That said, the JSON data type, which I expected to be a problem, converted just fine. I did see an issue around the NUMERIC data type but didn’t dive in.

Shameless Advertising

In case you want to learn a lot more about PolyBase, check out PolyBase Revealed, the only book-length treatment on the topic. Not only that, but it also makes for a great monitor stand—it’s a thick enough book to make a difference, but thin enough that you can get more precise monitor adjustments with it. Buy several copies and relieve neck strain today!
