In today’s post, we will look at the newest Azure Synapse Analytics pool type available: the Data Explorer pool. As of the time of this post, these pools are in a public preview, so things may change between now and when they go GA.

Creating a Pool

It’s pretty straightforward to create a Data Explorer pool. In the Azure Synapse Analytics settings for your workspace, navigate to Data Explorer pools in the Analytics pools menu and you have the option to create a new pool.

Create a new Data Explorer pool.

From there, we need to provide a name and then choose a workload and size. The name must be 4-22 characters long and available. As far as workloads go, you get two options: compute optimized and storage optimized. Compute optimized workloads can run 2, 4, 8, or 16 cores. Storage optimized workloads run 8 or 16 cores.

Choosing cores.

We can also choose between manual scale and optimized auto-scale. Note that the latter requires between 2 and 8 instances. If you choose manual scale, you get a fixed number of instances and need to scale out yourself.

You need at least 2 instances in your Data Explorer pool.

Streaming ingestion lets you load data from Event Hubs or IoT Hubs, as well as custom streaming scenarios. Purging tracks which records to delete and permanently removes them, something you may need for scenarios like GDPR or CCPA.
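
If you enable these options, you can also work with them later through management commands. Here's a minimal, hedged sketch in KQL; the table name MyTelemetry, the database name MyDatabase, and the CustomerId column are placeholders I'm inventing for illustration, not anything from my workspace:

// Enable the streaming ingestion policy on a hypothetical table
.alter table MyTelemetry policy streamingingestion enable

// A hedged example of a purge command for compliance-driven deletes.
// This permanently removes matching records, so treat it carefully.
.purge table MyTelemetry records in database MyDatabase <| where CustomerId == "12345"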

How Much Does This Cost?

Now that we’ve gone ahead and created a new pool, how much are we paying? Well, that’s where we jump to the pricing details. We pay $0.219 per vCore-hour and $23.04 per TB of data per month. With 2 vCores and 2 instances, I’m paying $0.876 per hour. That said, when you aren’t using the pool, you can stop it.
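
To sanity-check that hourly figure, the math is just vCores per instance times instances times the per-vCore rate. Here it is as a quick KQL print statement, using the preview prices quoted above (which may well change by GA):

// 2 vCores per instance x 2 instances x $0.219 per vCore-hour
print hourly_compute_cost = 2 * 2 * 0.219   // = $0.876 per hour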

Ingesting Data into the Data Explorer Pool

We have a pool; now let’s create a database.

Establishing a base for our data.

We have a few settings here. First up, we need to name the database. Then, we set a retention period, which is how long we'll hold data for querying. Note that this ties to the time of ingestion, not an internal log date in the data. The third option is a cache period, which is the number of days that data stays on local SSD. The more data you keep in cache, the more expensive it will be, but the faster your queries will run.

Options!
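
Even if you accept the defaults now, you can adjust retention and cache later with policy commands. A hedged sketch, assuming the csdedb database name I use later in this post; the day values are arbitrary examples, not recommendations:

// Keep data queryable for 365 days from ingestion time
.alter-merge database csdedb policy retention softdelete = 365d

// Keep the most recent 31 days of data on local SSD for faster queries
.alter database csdedb policy caching hot = 31d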

I’ll leave everything aside from the database name at its default values and move on. Note that creating the database does not automatically refresh the Data Explorer pool settings, so you’ll need to select the Refresh button before you can perform data ingestion.

Consume log product! Then purchase new log product!

You can also create data connections from three sources: Event Hub, IoT Hub, and Blob storage. For now, I’ll choose Ingest new data to move on.

And bits and bits and bits.

First up, I need to define my new table name. I recommend choosing a better name than this.

Next up, we need to define a source type. We have a few options here:

  • From blob allows you to provide a SAS URL to a given file or folder and ingest data from there (see the sketch after this list).
  • From file allows you to upload files from your machine.
  • From blob container and From ADLS Gen2 container allow you to connect to a given container and either continuously load data as you get new files in or perform a one-time upload.
  • Event Hub lets you read from an Event Hub namespace.
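
For reference, the From blob route ultimately boils down to a one-time ingestion command you could also issue yourself. A hedged sketch, with a made-up table name and a placeholder storage URL and SAS token rather than real values:

// One-time ingestion from a single blob, assuming the table already exists.
// The account, path, and <SAS token> below are placeholders -- substitute your own.
.ingest into table ArabilityScore (
    'https://mystorageaccount.blob.core.windows.net/arabilityscore/file1.csv?<SAS token>'
) with (format='csv', ignoreFirstRecord=true)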

I’ll keep it simple and pull in arability score data from an existing data lake.

Reading some data. Be sure NOT to use “*” in your folder path if all of your files are at this level!

Editor’s note: In the image above, I entered the folder path arabilityscore/*. Because all of my files are actually in arabilityscore/, I don’t need the asterisk, and keeping it as-is will cause the upload process to think that there are no files to upload. Strike off the asterisk if you get further along and the upload process tells you there aren’t any files.

I have more than 5,000 files in that container, so I’m going to need to use LightIngest. Before we get to that point, I’m going to specify the data format and schema.

Setting a schema.

All of my files are semi-colon delimited values (SCSV in this parlance). Once I define the data format, I can update individual columns to provide them with appropriate names. Note that sometimes, you won’t ingest structured data. Some data might simply be lines of text. In that case, use the TXT format. There are more than a dozen separate formats available to you so something’s liable to work.
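
Behind the scenes, the wizard’s schema editing amounts to creating a table with typed columns. If you’d rather define the schema yourself ahead of time, here’s a hedged sketch; the table and column names are illustrative, not the ones from my screenshot:

// Define a table with explicitly typed, readable column names
.create table ArabilityScore (
    CountryName: string,
    YearRecorded: int,
    Score: real
)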

Finally, because I’m loading so many files, I’ll need to download and install LightIngest. I’ll then be able to run the command that it generated for me.

Ingest away!

Note that it wants you to download a .nupkg file, something you’d use in .NET code. You can, however, decompress .nupkg files with a tool like 7zip. When you execute the command, you’ll see a prompt telling you what will happen:

Proceed!

You might end up getting a 403 error:

“@message”: “Principal ‘msauser=<username>’ is not authorized to perform operation ‘Access the Kusto service’ on ‘https://{db}.cstrain.kusto.azuresynapse.net:443/’.”,

In that case, add yourself as an admin on the database by running a command like the following:

.add database csdedb admins ('msauser=<Your login>') '<Your Name>'

I had to add this for myself even though I’m the only user in my subscription, so be prepared to do something similar.
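
To confirm the permission took effect, you can list the database principals; a minimal check, again assuming the csdedb database:

// List all principals (admins, users, viewers) on the database
.show database csdedb principals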

Also, you might get a message that there are no blobs listed. If that’s the case, check the -prefix: parameter in your command and make sure there is not a trailing asterisk. In other words, although my image above has -prefix:arabilityscore/* in the command, you might actually need -prefix:arabilityscore/ instead. Fortunately, if you run into this, it’s an easy thing to spot and fix.

Anyhow, once that’s done you should hopefully get a nice, happy output.

Ingestion in progress.

This may take a while, especially if you chose the lower end of compute. Once it’s done, though, you should have some data available.
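
Once LightIngest reports that it’s finished, a couple of quick checks confirm the data actually landed. Assuming the ArabilityScore table name from the sketches above (run the query and the management command as separate statements):

// How many rows made it in?
ArabilityScore
| count

// If the count looks low, check for ingestion failures
.show ingestion failures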

Conclusion

In this post, we looked at how to load data into an Azure Synapse Analytics Data Explorer pool. In the next post, we’ll begin querying that data with the Kusto Query Language, KQL.
