Now that SQL Server 2022 has hit RC0, I’ve decided to give it a try and see what has changed with respect to PolyBase. In today’s post, we’ll look at connecting to Cosmos DB using the MongoDB PolyBase connector.

The Problem

Back in the 2019 days, I noted a problem when CU2 of SQL Server 2019 came out. This is because the Cosmos DB collection I was using reported a wire version of 2 rather than the minimum version of 3. The official fix at that time was to create a new collection using the then-latest version of 3.6 but that didn’t work for me. My workaround was to use the old MongoDB drivers that shipped with SQL Server 2019 RTM.

Well, as of 2022, that solution won’t work anymore. The original MongoDB drivers don’t ship with SQL Server 2022, so we can’t use that workaround. I had a Cosmos DB account that was originally built on version 3.6. Even after upgrading to server version 4.2, it still reported wire version 2 when I connected to the endpoint that was relevant 3 years ago. Therein lies the solution to the problem.

The Solution

The first solution I came up with was to create a new Cosmos DB with MongoDB API using server version 4.0 or later (specifically, I used 4.0). This did work, though I had to re-create any collections and reload all of the data. Since then, I realized that there may be a better solution. Let’s walk through setting up a Cosmos DB external table to see that solution.

First up, we need a database scoped credential. Those are pretty easy to create and require that you get the Username and Primary (or Secondary) Password from the Connection String menu in Cosmos DB under the Settings header.

CREATE DATABASE SCOPED CREDENTIAL CosmosCredential
WITH IDENTITY = '<Your User>', Secret = '<Your PWD>';

From there, we will need to create an external data source. Here’s the NO LONGER VALID external data source definition I used with SQL Server 2019:

CREATE EXTERNAL DATA SOURCE CosmosDB WITH
(
	LOCATION = 'mongodb://cspolybase2.documents.azure.com:10255',
	CONNECTION_OPTIONS = 'ssl=true',
	CREDENTIAL = CosmosCredential,
	PUSHDOWN = ON
);

And here’s the NOW VALID external data source definition I switched to after upgrading my Cosmos DB version to 4.0 or higher:

CREATE EXTERNAL DATA SOURCE CosmosDB WITH
(
    LOCATION = 'mongodb://cspolybase2.mongo.cosmos.azure.com:10255',
    CONNECTION_OPTIONS = 'ssl=true',
    CREDENTIAL = CosmosCredential,
    PUSHDOWN = ON
);

The difference here is in the URL: documents.azure.com for the first, mongo.cosmos.azure.com for the second. The selection of domain also changes the reported wire version.

Conclusion

In this post, we unraveled one of the mysteries of the universe. Or at least hopefully allowed somebody who does want to use PolyBase in SQL Server 2022 to connect to Cosmos DB (with MongoDB API) to do so without creating a new Cosmos DB account and collection.

2 thoughts on “PolyBase 2022 and Cosmos DB (Mongo API)

Leave a comment