PolyBase Now Supports the Microsoft Access Excel Driver

SQL Server 2019 Cumulative Update 2 has come out and it is chock-full of PolyBase-related stuff.

One interesting tidbit I read was the following:

FIX: Include support for Excel ODBC driver and Text ODBC driver from Microsoft Access Database Engine Redistributable package in PolyBase for SQL Server 2019

If you tried to use Microsoft’s Excel driver prior to 2019 CU2, you’d get the following error:

Msg 105082, Level 16, State 1, Line LineNumber
105082;Generic ODBC error: [Microsoft][ODBC Excel Driver]Optional feature not implemented

To this point, I recommended in PolyBase Revealed that you use a different driver, like CData’s, which did work. CData’s driver still works (I assume…PolyBase ODBC support is a fluid situation, it seems), but now I can officially say that PolyBase supports the Microsoft Access Database Engine Redistributable driver for Microsoft Excel. Let’s go to the tape.

Prep Work

First thing’s first, we need to get a driver. There are three versions of this driver, one for 2016, one for 2013, and one for 2010. I installed the 2016 version on a virtual machine with SQL Server 2019 CU2 and PolyBase installed and configured, but no version of Microsoft Office installed. I installed the 64-bit driver. If you did this correctly, you should see the following in your ODBC Data Source Administrator:

Finally, support for dBASE.

After that, we create our external data source. If you want to play along at home, grab the North Carolina Population spreadsheet from my totally not sketchy at all Blob Storage container. It should be 479 KB downloaded.

Once you have the file, let’s open up our database master key and create an external data source:

USE [Scratch]
GO
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<<SomeSecureKey>>';
GO

IF NOT EXISTS
(
    SELECT 1
    FROM sys.external_data_sources ds
    WHERE
        ds.name = N'NorthCarolinaPopulationExcelUntyped'
)
BEGIN
    CREATE EXTERNAL DATA SOURCE NorthCarolinaPopulationExcelUntyped WITH
    (
        LOCATION = 'odbc://noplace',
        CONNECTION_OPTIONS = 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; DBQ=C:\SourceCode\Polybase\NorthCarolinaPopulation.xlsx'
    );
END
GO

If you have already purchased enough copies of PolyBase Revealed to re-roof your house, you may notice that this code looks familiar. It’s pretty close to what we used for reading an Excel file using the CData driver, though instead of specifying a DSN that I’ve created, I’m specifying the file’s location here. Also of importance is that the driver must be listed as Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb) exactly, with all of those file types specified. If you don’t get that right, you’ll get the following error:

Msg 105070, Level 16, State 1, Line 28
105070; The ODBC data source name was not specified or the driver is missing.

Assuming you copy-paste or type the driver’s name correctly, it’s time to create an external table. The CData driver I used in the book required adding a RowID column and took a good look at the column types rather than the column values. That’s why the Excel table creation scripts look the way they do. With the Microsoft Access Database Engine Redistributable driver for Microsoft Excel, there are a few changes. Here’s a valid create script:

IF NOT EXISTS
(
    SELECT 1
    FROM sys.external_tables t
    WHERE
        t.name = N'NorthCarolinaPopulationExcelUntyped'
)
BEGIN
    CREATE EXTERNAL TABLE dbo.NorthCarolinaPopulationExcelUntyped
    (
        SUMLEV FLOAT(53),
        COUNTY FLOAT(53),
        PLACE FLOAT(53),
        PRIMGEO_FLAG FLOAT(53),
        NAME NVARCHAR(255),
        POPTYPE NVARCHAR(255),
        YEAR FLOAT(53),
        POPULATION FLOAT(53)
    )
    WITH
    (
        LOCATION = '[NorthCarolinaPopulation$]',
        DATA_SOURCE = NorthCarolinaPopulationExcelUntyped
    );
END
GO

How is this different? Let me count the ways.

  1. The script does not include a RowID column. If you do try to include this, the create statement will fail.
  2. The data types are different. This Excel driver looks at the first several rows (I think it’s 8 by default, but I’m not an expert with this driver) and infers data types which fit. Therefore, all of the numeric features were turned into FLOAT(53) even though I have the columns typed as generic text in the Excel file itself.
  3. The LOCATION parameter is a bit different: I need to add a dollar sign after the worksheet name. The brackets around the name are optional, but if you fail to include the dollar sign, you will get the following error:

Msg 2706, Level 16, State 1, Line 30
Table ‘`NorthCarolinaPopulation`’ does not exist.

Once you take care of creating the external table the right way, you get results:

Finally, we can unleash the power of Microsoft Excel.

Conclusion

Microsoft continues to invest in PolyBase. After the release of SQL Server 2019, the SQL Server team had 16 separate fixes and enhancements which reference PolyBase in the title. This is a nice one because it means you can read Excel files without having to pay for a third-party driver.

One thought on “PolyBase Now Supports the Microsoft Access Excel Driver

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s