In the last exciting episode, we loaded some Mars farming data into our Data Explorer pool. Today, we’re going to query that data.
Navigating to https://dataexplorer.azure.com, we can see the Synapse workspace, Data Explorer pool, and table in the Query tab.

In order to query data, we need to use the Kusto Query Language, KQL. If you’re familiar with Splunk’s Search Processing Language, KQL is pretty similar. It’s just enough like SQL that it feels like you should understand it, but not SQL-like enough that you’ll actually have an intuitive understanding of the language.
One quick note is that all KQL statements are case-sensitive. I personally consider this a mistake in a query language, but they didn’t ask me, I suppose. With that said, let’s get digging.
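For example, the count operator we’re about to use must be lowercase, and the same goes for table and column names. String comparisons follow suit: == is case-sensitive, while =~ compares case-insensitively. Here’s a minimal sketch (the Status column and its values are hypothetical, purely to show the two operators):
tibble
| where Status == "Active"   // case-sensitive: matches "Active" only
| where Status =~ "active"   // case-insensitive: matches "Active", "ACTIVE", ...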
First up, I want to see how many records I have in the tibble table. I can do that with the following command:
tibble
| count
Note that we use the pipe operator to separate the steps of a query, forming a pipeline: each operator takes the previous step’s output as its input.

So yeah, I loaded 2.2 billion rows in there, no big deal. I generated one sample every 10 minutes over the course of 5 years for 9000 plots. In fairness, this was for a training on big data analytics, and I couldn’t very well do that with a couple dozen rows.
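The pipe can chain as many steps as we like, each operator consuming the previous one’s output. As a quick sketch, we could filter before counting (PlotID is a column we’ll use throughout the rest of this post):
tibble
| where PlotID == 1406
| count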
Anyhow, now that we know that we do have data, let’s take a quick look at it. We’ll get the top 10 rows.
tibble
| take 10

We can see the structure of this data. I should also note that we haven’t done any sorting, so expect the rows to come back in an arbitrary order.
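If we wanted a predictable sample instead, we could impose an order first; the top operator combines the sort and the row limit in one step (a quick sketch, with any column serving as the sort key):
tibble
| top 10 by SampleDateTime desc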
Now let’s do something a little more complicated and see how long it takes us to get all scores for plot 1406 on the day of 2119-06-04.
tibble
| where PlotID == 1406
| where SampleDateTime between (datetime(2119-06-04)..datetime(2119-06-05))

Ah, but wait! The bottom row shows 2119-06-05 at midnight, which is one more row than we want. As in SQL, between in KQL is inclusive on both ends. The solution is pretty similar as well:
tibble
| where PlotID == 1406
| where SampleDateTime >= datetime(2119-06-04)
| where SampleDateTime < datetime(2119-06-05)
With two separate conditions, we get an inclusive lower bound and an exclusive upper bound on the date range.
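Equivalently, both conditions can live in a single where clause joined with and; splitting them across two where operators is purely a stylistic choice:
tibble
| where PlotID == 1406
| where SampleDateTime >= datetime(2119-06-04) and SampleDateTime < datetime(2119-06-05)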
We can see that the arability score doesn’t change drastically over the course of the day. What if we averaged arability scores by day, bucketing over a timeframe? Let’s also introduce a couple of variables to see how we can use them, and do a bit of time math while we’re at it.
To make things a little less complicated, we’ll take it one step at a time.
let _startDate = datetime(2119-03-01);
let _endDate = datetime(2119-04-01);
tibble
| where PlotID == 1406
| extend _bin = bin_at(SampleDateTime, 1d, _startDate)
| extend _range = range(_bin, _endDate, 1d)
| where SampleDateTime >= _startDate
| where SampleDateTime < _endDate
We can see that the let keyword defines variables, and I create two here: _startDate and _endDate. I use those in the where clauses at the bottom of the query to constrain our range. I also use the extend operator to create new columns, one called _bin and the other _range.

The _bin column uses bin_at() to group each data point by timeframe, in this case one-day intervals (1d). The _range column gives us a JSON array covering the entire range of our dataset. We don’t actually need _range here, but it does give us an idea of one useful function, which we can try out in isolation below. After that, let’s get rid of _range and get some summary statistics.
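As a minimal sketch of that function on its own, the print operator evaluates an expression without touching a table (the Days column name is just for illustration):
print Days = range(datetime(2119-03-01), datetime(2119-03-05), 1d)
This returns a dynamic array with one datetime per day across the range.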
let _startDate = datetime(2119-03-01);
let _endDate = datetime(2119-04-01);
tibble
| where PlotID == 1406
| extend _bin = bin_at(SampleDateTime, 1d, _startDate)
| where SampleDateTime >= _startDate
| where SampleDateTime < _endDate
| summarize min(ArabilityScore), max(ArabilityScore), avg(ArabilityScore) by _bin

Okay, so it turns out that arability score doesn’t actually change all that often. But that’s okay: we can see that the process works.
One quick note is that the summarize operator needs to come after the where clauses because summarize makes columns like SampleDateTime unavailable to us. In other words, here we want to filter and then group, not the other way around.
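To make that concrete, here’s a sketch: after summarize runs, only the by columns and the aggregates survive, so any later filtering has to reference those. (The AvgScore alias and its threshold are made up for illustration.)
let _startDate = datetime(2119-03-01);
let _endDate = datetime(2119-04-01);
tibble
| where PlotID == 1406
| where SampleDateTime >= _startDate
| where SampleDateTime < _endDate
| extend _bin = bin_at(SampleDateTime, 1d, _startDate)
| summarize AvgScore = avg(ArabilityScore) by _bin
| where AvgScore > 50                      // fine: AvgScore exists after summarize
// | where SampleDateTime >= _startDate    // would fail: SampleDateTime is gone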
Conclusion
In today’s post, we got to see a few bits of the Kusto Query Language in action, including table selection, basic filtering, basic aggregations, and a little bit of binning. In the next post, we will dive a bit deeper into the language and see what else we can unearth (un-mars?).