Wait, What?

Today’s post comes inspired by an Ed Pearson tweet that Chris Wood clued me into.

Is there someone? Maybe!

It took me a couple of weeks to get to this thanks to my crazy schedule but I have good news for you.

PolyBase 12, Flat Files 8

I really like PolyBase’s ability to read from Azure Blob Storage. But what about reading from files on disk? I knew that we could get Excel files working but how about regular ol’ text files? It turns out that you can using the Microsoft Access Text Driver.

You are going to need to install the Microsoft Access Database Engine 2016 Redistributable on your machine hosting SQL Server. This installs a few drivers, including Access, Excel, and text files.

Deep lore: SQL Server is secretly Microsoft Access.

From there, it’s close to the same process with one minor adjustment.

Let’s create an external data source.

USE [Scratch]
GO
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<<SomeSecureKey>>';
GO

IF NOT EXISTS
(
    SELECT 1
    FROM sys.external_data_sources ds
    WHERE
        ds.name = N'AccessTest'
)
BEGIN
    CREATE EXTERNAL DATA SOURCE AccessTest WITH
    (
        LOCATION = 'odbc://noplace',
        CONNECTION_OPTIONS = 'Driver={Microsoft Access Text Driver (*.txt, *.csv)}; DBQ=C:\Temp\AccessTest'
    );
END
GO

We decrypt our master key first, same as always. Then, when creating the external data source, I set the location to some gibberish string. The connection options are more important here. Note that I need to specify the driver name exactly as you see it in the ODBC drivers list, including the bits in parentheses. Then, in the DBQ variable, I need to set the folder in which the file resides. With Excel, we pointed to the file but the Access Text Driver requires that you use the directory.

Here’s the definition for our external table:

IF NOT EXISTS
(
    SELECT 1
    FROM sys.external_tables t
    WHERE
        t.name = N'AirportCodes'
)
BEGIN
    CREATE EXTERNAL TABLE dbo.AirportCodes
    (
        AIRPORT_ID INT,
        AIRPORT NVARCHAR(255),
        DISPLAY_AIRPORT_NAME NVARCHAR(255),
        LATITUDE FLOAT,
        LONGITUDE FLOAT
    )
    WITH
    (
        LOCATION = '[AirportCodeLocationLookupClean.csv]',
        DATA_SOURCE = AccessTest
    );
END
GO

The external table definition is fairly standard with one exception: the location needs to be the full file name inside that folder with brackets around it. If you don’t use brackets around a file name, whether or not it is valid, you’ll get this error:

Msg 2706, Level 16, State 1, Line 1

Table ‘`AirportCodeLocationLookupClean`.`csv`’ does not exist.

Meanwhile, if you use brackets around a filename which does not exist, you get this fun error:

Msg 105121, Level 16, State 1, Line 1
105121;The specified LOCATION string ‘[Codes.csv]’ could not be parsed. A 1-part identifier was found. Expected 2.

But if you get it right, you get this result, included sort of like a “proof of life” thing:

Not pictured: today’s newspaper. Because seriously, who reads newspapers anymore?

Notes and Gotchas

Forthwith are a couple quick notes:

  • The file must contain headers, as PolyBase will use the first row as headers.
  • PolyBase will decide what it wants your field lengths and data types to be. For example, I may know that the airport is CHAR(3) but it has decided NVARCHAR(255) and who am I to judge?
  • Your file name may have spaces and other normal characters in it. I didn’t try crazy things like emoji, though, as I’m not that creative.
  • You can only read one file in an external table. This is not like how it works with HDFS or Azure Blob Storage.
  • By default, you must have one of four extensions for your text files: .csv, .txt, .asc, and .tab.
Gratuitously pictured: the directory selection menu. Because I, too, thought the ’90s were a great time.

If you do try to create a DSN with a special extension, note that it wants three characters, so “.flutter” wouldn’t work but “.fln” could. That said, I wasn’t able to get a custom DSN working with the Access driver; I kept getting null pointer exceptions (“object reference not set to an instance of an object”) when trying to create an external table.

Conclusion

If you need to read from text files, you can do so with PolyBase in SQL Server 2019 by using the Microsoft Access Text Driver. If there are other drivers which allow you to read in text files as tables via ODBC, you could use those as well but at least I know this one works.

One thought on “PolyBase versus Flat Files

Leave a comment