Upcoming Events: DataPlatformGeeks Webinar

Key Details

What: Data Platform Geeks webinar
Where: On the Internet, next to all the other webinars.
When: Thursday, April 18th at 11:30 AM Eastern time, which happens to be 9 PM India Standard Time.
Admission is free. Sign up at the DataPlatformGeeks website.

What I’m Presenting

11:30 AM — 12:30 PM — Applying Forensic Accounting Techniques Using SQL and R

For a special bonus, I have completely revamped the talk to make it easier to follow and easier to apply in your own environment.

Course Review: Getting Started with Spark 2

This is a review of Janani Ravi’s Pluralsight course entitled Getting Started with Spark 2.

I think this course is a good length and includes a lot of quality content. Janani presents everything in Python, but I was able to follow along and do similar examples in Scala. The entire course runs just over 2 hours, which I think is fine for a “getting started” type of course.

This course contains a great overview of Spark 2 and how it differs from Spark 1, including the expansion of DataFrames, the combination of Spark contexts into a single SparkSession, and the Tungsten and Catalyst engines. Those two, in particular, were fascinating topics. I wish Janani had been able to dig into them further, though I suppose that probably would not have fit in the scope of a single course.

The use of Jupyter Notebooks for demos was a smart idea. Spark doesn’t have a “natural” UI and Jupyter lets you interact smoothly with the Python kernel and Spark cluster. I’m more used to Apache Zeppelin for Spark notebooks, so it was nice to see Jupyter in use here.

The one thing I disliked about the course is the overuse of SnagIt-style boxes and circles. Highlighting something with a rectangle makes sense when there’s a lot on the screen and you need to direct the viewer to a particular element, but we would regularly see Janani type something in and then immediately see a rectangle or circle around it. That was overkill and ended up more distracting than illuminating.

Despite that nitpick, if you are in the market for some introductory content on Spark and don’t mind working in Python, this is a great course to review.

Upcoming Events: Southern Fried F#

Key Details

What: Southern Fried F#
Where: Red Hat, 100 East Davie Street, Raleigh, NC 27601
When: Saturday, April 13th, all day
Admission is free. Sign up on Eventbrite.

The tastiest conference logo

What I’m Presenting

Nothing at all!

I’m one of the organizers for the inaugural edition of Southern Fried F#. Our goal is to expand functional programming in the Triangle area and we have a great lineup of speakers and topics.

Azure Data Studio Notebooks: GitHub Issues

There are a couple of Azure Data Studio GitHub issues near and dear to my heart that I’d like to point out in the hopes that it’ll raise their profile and give the Azure Data Studio product team more reason to make my life easier.

Command Mode

The first issue I’d love to see addressed is to have a command mode, which Jupyter Notebooks already includes. Having a command mode makes life easier because it lets you perform cell-level operations (things like adding, deleting, moving, and changing cells) from the keyboard without reaching for the mouse.

Jupyter Notebook Shortcuts

The other issue I’d love to see addressed is to add Jupyter notebook shortcuts. This includes edit mode shortcuts as well as command mode shortcuts.

Forensic Accounting: Generating Sketchy (But Not Too Sketchy) Data

This is part two in a series on applying forensic accounting techniques using SQL and R.

Start with a Data Model

My plan here is to create a simplified version of the Wake County transportation fraud case. To do so, I’ve put together the data model we see below:

The forensic accounting database diagram courtesy of SQLdbm.

As a quick note, I include a Calendar table to make it easier for me to generate data. I’ll also use it for some queries later on in the series.

Here’s the gist: we have employees at our organization. We own and operate buses. Buses come into service on a particular date and they eventually get retired. During their operational window, we need to perform maintenance on the buses and buy new parts. We purchase these parts from vendors and the parts fit into expense categories. Each vendor sells products or services from at least one category and we can only buy parts where we have established a link between vendor and category.

All of these are simple tables and way easier than reality—for example, we have categories of products but no specific products, no variants, etc. If you came to this series looking for a complete data model covering vehicle transportation and maintenance, you’ll be sorely disappointed.
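
For readers who want to follow along without the diagram, here is a minimal sketch of those tables. The table and column names come from the scripts below; the data types, name columns, and constraints are my assumptions rather than the exact definitions.

-- Sketch of the data model. Column names follow the load scripts in this post;
-- types, name columns, and keys are guesses. LineItemID is assumed to be an
-- identity column because the later INSERT never supplies it.
CREATE TABLE dbo.Calendar
(
	Date DATE NOT NULL PRIMARY KEY,
	CalendarDayOfYear INT NOT NULL,
	IsWeekend BIT NOT NULL,
	HolidayName NVARCHAR(100) NULL
	-- A real calendar table would have many more columns than this.
);
CREATE TABLE dbo.Employee
(
	EmployeeID INT NOT NULL PRIMARY KEY,
	EmployeeName NVARCHAR(100) NOT NULL  -- Assumption: a single name column.
);
CREATE TABLE dbo.Vendor
(
	VendorID INT NOT NULL PRIMARY KEY,
	VendorName NVARCHAR(100) NOT NULL  -- Assumption.
);
CREATE TABLE dbo.ExpenseCategory
(
	ExpenseCategoryID INT NOT NULL PRIMARY KEY,
	ExpenseCategoryName NVARCHAR(100) NOT NULL  -- Assumption.
);
CREATE TABLE dbo.VendorExpenseCategory
(
	VendorID INT NOT NULL REFERENCES dbo.Vendor (VendorID),
	ExpenseCategoryID INT NOT NULL REFERENCES dbo.ExpenseCategory (ExpenseCategoryID),
	PRIMARY KEY (VendorID, ExpenseCategoryID)
);
CREATE TABLE dbo.Bus
(
	BusID INT NOT NULL PRIMARY KEY,
	DateFirstInService DATE NOT NULL,
	DateRetired DATE NULL
);
CREATE TABLE dbo.LineItem
(
	LineItemID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
	BusID INT NOT NULL REFERENCES dbo.Bus (BusID),
	VendorID INT NOT NULL REFERENCES dbo.Vendor (VendorID),
	ExpenseCategoryID INT NOT NULL REFERENCES dbo.ExpenseCategory (ExpenseCategoryID),
	EmployeeID INT NOT NULL REFERENCES dbo.Employee (EmployeeID),
	LineItemDate DATE NOT NULL,
	Amount DECIMAL(10,2) NOT NULL
);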

Add Data and Stir

The next thing we need to do is fill in this data. I’m not going to include every static row here, but I will mention how I load each table.

Static Tables

  • dbo.Employee: I picked 12 names and added them to the table.
  • dbo.ExpenseCategory: I have 28 categories. 3 of them I made up myself and the other 25 I got from a website which sells bus parts and supplies. #Research
  • dbo.Vendor: I created 15 phony vendors and added them to the table.
  • dbo.VendorExpenseCategory: For each of the 15 vendors, I picked expense categories which appeared to fit. For example, a vendor named Bus Repair Shack (a classy organization, I’m sure) performs vehicle repair and vehicle maintenance but doesn’t sell any parts directly.

That leaves just two tables, but they’re doozies.

Bus

We’ll start with the bus table. Our agency started at the beginning of 2011 with an allotment of 300 decrepit buses:

INSERT INTO dbo.Bus
(
	BusID,
	DateFirstInService,
	DateRetired
)
SELECT TOP (300)
	ROW_NUMBER() OVER (ORDER BY NEWID()) AS BusID,
	'1990-01-01' AS DateFirstInService,
	NULL AS DateRetired
FROM dbo.Calendar;

We add 50 shiny new buses each year:

INSERT INTO dbo.Bus
(
	BusID,
	DateFirstInService,
	DateRetired
)
SELECT
	300 + ROW_NUMBER() OVER (ORDER BY c.Date, ao.object_id),
	c.Date,
	NULL AS DateRetired
FROM dbo.Calendar c
	CROSS JOIN ( SELECT TOP(50) object_id FROM sys.all_objects ) ao
WHERE
	c.CalendarDayOfYear = 1
	AND c.Date < '2019-01-01';

Then we retire approximately 36 beat-up hunks of junk on the first day of each year:

WITH candidates AS
(
	SELECT
		ROW_NUMBER() OVER (PARTITION BY c.Date ORDER BY NEWID()) AS rownum,
		c.Date,
		b.BusID,
		b.DateFirstInService,
		b.DateRetired
	FROM dbo.Calendar c
		INNER JOIN dbo.Bus b
			ON c.Date > b.DateFirstInService
	WHERE
		c.CalendarDayOfYear = 1
		AND c.Date < '2019-01-01'
)
UPDATE b
SET
	DateRetired = c.Date
FROM candidates c
	INNER JOIN dbo.Bus b
		ON c.BusID = b.BusID
WHERE
	c.rownum <= 36;

This means that, on net, we add somewhere around 14 buses each year: 50 come in and roughly 36 go out.

Line Items — Legit Data

This is where stuff gets crazy. First, I created a table named #ValuePerCategory, which has the mean price and the price standard deviation for each expense category. To get this information, I trawled through the catalog and picked reasonable-enough values for each of the categories. This is my level of commitment to getting things right(ish). The standard deviations, though, I just made up. I didn’t look at huge numbers of products and calculate these values myself. That’s the limit of my commitment to excellence and why I don’t have a giant banner on my stadium.
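
I don’t reproduce the full load script for that temp table here, but as a rough sketch it looks something like the following. The shape matches the way the big INSERT below uses it; the types and the sample numbers are illustrative guesses, not the values I actually used.

-- Sketch of the pricing temp table; values below are illustrative only.
CREATE TABLE #ValuePerCategory
(
	ExpenseCategoryID INT NOT NULL PRIMARY KEY,
	MeanPrice DECIMAL(10,2) NOT NULL,
	StdDevPrice DECIMAL(10,2) NOT NULL
);

INSERT INTO #ValuePerCategory (ExpenseCategoryID, MeanPrice, StdDevPrice)
VALUES
	(1, 150.00, 40.00),
	(2, 85.00, 25.00),
	(3, 1200.00, 300.00);
-- The real script has one row per expense category, 28 in total.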

With this temp table in place, I created a new temporary table for line items and loaded it with a heap of data:

DECLARE
	@InclusionThreshold DECIMAL(7,6) = 0.00067,
	@Precision INT = 2;

INSERT INTO #LineItem
(
	BusID,
	VendorID,
	ExpenseCategoryID,
	EmployeeID,
	LineItemDate,
	Amount
)
SELECT
	b.BusID,
	vec.VendorID,
	vec.ExpenseCategoryID,
	r.EmployeeID,
	c.Date,
	ABS(s.Amount) AS Amount
FROM dbo.Calendar c
	CROSS JOIN dbo.Bus b
	CROSS JOIN dbo.VendorExpenseCategory vec
	INNER JOIN #ValuePerCategory vpc
		ON vec.ExpenseCategoryID = vpc.ExpenseCategoryID
	CROSS APPLY (
		SELECT
			RAND(CHECKSUM(NEWID())) AS rand1,
			RAND(CHECKSUM(NEWID())) AS rand2,
			FLOOR(RAND(CHECKSUM(NEWID())) * (12 - 1 + 1)) + 1 AS EmployeeID,
			RAND(CHECKSUM(NEWID())) AS InclusionThreshold
	) r
	CROSS APPLY (
		SELECT
			ROUND((SQRT(-2.0 * LOG(r.rand1)) * COS(2 * PI() * r.rand2)) * vpc.StdDevPrice, @Precision) + vpc.MeanPrice AS Amount
	) s
WHERE
	c.Date >= '2011-01-01'
	AND c.Date < '2019-01-01'
	AND c.IsWeekend = 0
	AND c.HolidayName IS NULL
	AND b.DateFirstInService <= c.Date
	AND ISNULL(b.DateRetired, '9999-12-31') > c.Date
	AND r.InclusionThreshold < @InclusionThreshold;

Basically, we generate one row per combination of calendar day, bus, and vendor expense category. We then assume that prices follow a normal distribution and choose an amount randomly from a normal distribution defined by the mean price and price standard deviation for that expense category. We only make purchases on non-holiday weekdays (these are government employees, after all) for buses which are in service. Finally, for each row we draw a uniform random number between 0 and 1 using RAND(); if that draw falls below our inclusion threshold, we keep the row. I played around with numbers until I landed on 0.067% of rows. Even with that low a threshold, we still ended up with tens of thousands of entries, which is plenty for what we need.

The nice thing about following normal distributions and having a large number of rows is that there will be oddities even without introducing fraud. I want to make it clear that oddities are not in themselves proof of fraud. Assuming you pull randomly from a normal distribution, you can expect approximately 3 in 1000 rows to land more than three standard deviations from the mean. Real-life distributions aren’t always normal, so even that rule of thumb has its limits, but to the extent that distributions aren’t normal, they tend to increase the number of seeming outliers, not decrease it.
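
If you want to see that for yourself, a quick check along these lines (my addition, not part of the original script) counts how many of the legitimate line items in #LineItem already sit more than three standard deviations from their category’s mean:

-- Count line items more than three standard deviations from their category mean.
-- Even with zero fraud, expect a handful of outliers per category.
WITH CategoryStats AS
(
	SELECT
		li.ExpenseCategoryID,
		AVG(li.Amount) AS MeanAmount,
		STDEV(li.Amount) AS StdDevAmount
	FROM #LineItem li
	GROUP BY li.ExpenseCategoryID
)
SELECT
	cs.ExpenseCategoryID,
	COUNT(*) AS NumberOfLineItems,
	SUM(CASE WHEN ABS(li.Amount - cs.MeanAmount) > 3 * cs.StdDevAmount THEN 1 ELSE 0 END) AS NumberOfOutliers
FROM #LineItem li
	INNER JOIN CategoryStats cs
		ON li.ExpenseCategoryID = cs.ExpenseCategoryID
GROUP BY cs.ExpenseCategoryID
ORDER BY NumberOfOutliers DESC;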

Line Items — Fraudulent Data

Now we want to introduce some fraud. Some people decided to collude with one of our vendors. They started their fraud on February 9th, 2018 and ended it on October 14th, 2018. They made sure to limit purchases to no more than $999.99 because our cutoff for requiring two signatures is $1000. There is a very small chance of a fraudulent invoice per combination of calendar day, bus, and valid expense category: 0.0012%.

SET @InclusionThreshold = 0.000012;
INSERT INTO #LineItem
(
	BusID,
	VendorID,
	ExpenseCategoryID,
	EmployeeID,
	LineItemDate,
	Amount
)
SELECT
	b.BusID,
	vec.VendorID,
	vec.ExpenseCategoryID,
	CASE r.EmployeeID
		WHEN 1 THEN 4
		WHEN 2 THEN 8
		WHEN 3 THEN 10
		WHEN 4 THEN 12
		ELSE 4
	END AS EmployeeID,
	c.Date,
	CASE
		WHEN ABS(s.Amount) >= 1000 THEN 999.99
		ELSE ABS(s.Amount)
	END AS Amount
FROM dbo.Calendar c
	CROSS JOIN dbo.Bus b
	CROSS JOIN dbo.VendorExpenseCategory vec
	-- This is to simulate multiple entries
	CROSS JOIN dbo.Bus b_simmult
	INNER JOIN #ValuePerCategory vpc
		ON vec.ExpenseCategoryID = vpc.ExpenseCategoryID
	CROSS APPLY (
		SELECT
			RAND(CHECKSUM(NEWID())) AS rand1,
			RAND(CHECKSUM(NEWID())) AS rand2,
			FLOOR(RAND(CHECKSUM(NEWID())) * (4 - 1 + 1)) + 1 AS EmployeeID,
			RAND(CHECKSUM(NEWID())) AS InclusionThreshold
	) r
	CROSS APPLY (
		SELECT
			ROUND((SQRT(-2.0 * LOG(r.rand1)) * COS(2 * PI() * r.rand2)) * (vpc.StdDevPrice * 2), @Precision) + (vpc.MeanPrice * 5) AS Amount
	) s
WHERE
	c.Date >= '2018-02-09'
	AND c.Date < '2018-10-14'
	AND c.IsWeekend = 0
	AND c.HolidayName IS NULL
	AND b.DateFirstInService <= c.Date
	AND ISNULL(b.DateRetired, '9999-12-31') > c.Date
	AND vec.VendorID = 5
	AND r.InclusionThreshold < @InclusionThreshold;

INSERT INTO dbo.LineItem 
(
	BusID,
	VendorID,
	ExpenseCategoryID,
	EmployeeID,
	LineItemDate,
	Amount
)
SELECT
	li.BusID,
	li.VendorID,
	li.ExpenseCategoryID,
	li.EmployeeID,
	li.LineItemDate,
	li.Amount
FROM #LineItem li
ORDER BY
	li.LineItemDate,
	li.BusID,
	li.VendorID,
	li.ExpenseCategoryID,
	li.Amount DESC;

Still, we do add in a pretty large number of invoices, enough to make some noise.
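
If you’re curious just how much noise, a quick count like this one (again my addition rather than part of the original script) compares vendor 5’s line items inside and outside the fraud window:

-- How many line items does vendor 5 have inside versus outside the fraud window?
SELECT
	CASE
		WHEN li.LineItemDate >= '2018-02-09' AND li.LineItemDate < '2018-10-14' THEN 'Fraud window'
		ELSE 'Outside window'
	END AS Period,
	COUNT(*) AS NumberOfLineItems,
	SUM(li.Amount) AS TotalAmount
FROM dbo.LineItem li
WHERE li.VendorID = 5
GROUP BY
	CASE
		WHEN li.LineItemDate >= '2018-02-09' AND li.LineItemDate < '2018-10-14' THEN 'Fraud window'
		ELSE 'Outside window'
	END;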

Line Item — Missing Invoices

Our late CFO (who got hit by a bus while listening to Alanis Morissette) kinda-sorta accidentally(?) deleted some line items from our database. Nobody knows what was in there, but we can say the late CFO was wearing some very expensive suits for someone making county government money.

DELETE
FROM dbo.LineItem
WHERE LineItemID IN (6, 7, 14, 20, 199, 200, 201, 202, 339, 340, 341);
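
As a small preview of where the series goes, one classic way to spot those deletions later is to look for gaps in the LineItemID sequence. Here’s my sketch, assuming LineItemID is an ever-increasing identity column with no legitimate gaps:

-- Find gaps in LineItemID, which hint at deleted invoices.
WITH OrderedLineItems AS
(
	SELECT
		li.LineItemID,
		LEAD(li.LineItemID) OVER (ORDER BY li.LineItemID) AS NextLineItemID
	FROM dbo.LineItem li
)
SELECT
	oli.LineItemID + 1 AS GapStart,
	oli.NextLineItemID - 1 AS GapEnd
FROM OrderedLineItems oli
WHERE oli.NextLineItemID > oli.LineItemID + 1;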

Conclusion

Before digging into the sundry techniques around forensic accounting, I wanted to make sure we had a good understanding of our data set, as I’ll use it consistently through the series. I did kind of give away the game with this post, but as we walk through techniques, we’ll be able to see if we find the problem or if our culprits end up hiding in the noise.

Forensic Accounting: Introduction and Case Study

This is part one in a series on applying forensic accounting techniques using SQL and R.

Necessary Disclaimers

Before I get started, I have some disclaimers I need to get out of the way.

I am not a CPA. My wife is a CPA, but I’m the one writing this, so you get my interpretation, not hers. I have taken accounting courses and worked on accounting and budgeting projects, but I would never have landed Al Capone in jail and would probably miss important tax deductions, so don’t take accounting advice from me. Also, no CPEs for you accountants who do end up reading this…

The other disclaimer is that all of the data in this series is totally made up. I am working off of a case study which did happen, but we are going to use some artificial data rather than the real data. This is for a couple of reasons. First, I wasn’t able to get ahold of the original data set. But even if I had, I think artificial data is better for this talk because I can pare things down to the minimum needed to understand the problem. The more realistic things get, the muddier they are and the more likely I am to lose people in the weeds. Therefore, I intend to keep things simple and a good bit easier than reality.

Who are You People and Why are You in my Living Room?

My intended audience for this series is developers and database administrators, not accountants. If you are an accountant looking to hone some data platform skills, you’ll probably pick up some stuff. If you are a developer or database administrator, my goal is to add some really cool tools to your belt.

This is NOT Just About Fraud

Before I dig into my case study, I want to make it absolutely clear that these techniques will help you do a lot more than uncover fraud in your environment. My hope is that there is no fraud going on in your environment and you never need to use these tools for that purpose.

Even with no fraud, there is an excellent reason to learn and use these tools: they help you better understand your data. A common refrain from data platform presenters is “Know your data.” I say it myself. Then we do some hand-waving, give a few examples of what that entails, and go on to the main point of whatever talk we’re giving. Well, this series is dedicated to knowing your data and giving you the right tools to do exactly that.

A Case Study with a Happy Ending

About 15 years ago, WRAL began reporting on a fraud investigation involving the Wake County Public School System in Wake County, North Carolina (which is primarily the city of Raleigh). They did some great investigative reporting and I highly recommend reading some of the articles to learn more about what happened. What follows is my summary.

Several employees in the Wake County Public School System’s transportation department, including the then-director, conspired with Barnes Motor & Parts in a scheme involving false invoices for non-existent parts and services. The parties then collected the money from these fraudulent invoices, split it up, and used it to purchase everything from gift cards to trucks and boats. This happened over the course of a few years before investigators caught it. To this day we still don’t know exactly how much money was embezzled from Wake County taxpayers, but the government was able to claw back $5 million. Barnes Motor & Parts paid $3 million, including a $2.5 million criminal fine and $500,000 civil fine for their part in the scheme. As for county employees, several of them went to prison.

It’s good to know that the people responsible for this crime paid, but for our purposes, I think there are two questions which are interesting:

  1. How were they able to get away with this for so long?
  2. How did investigators eventually uncover this fraud?

I’ll briefly cover each of those questions with their own headers because that makes this jumble of words appear a bit more coherent from a distance.

How They Got Away With It: No Meddling Kids or Talking Dogs

One of the worst parts of this story is the complete lack of oversight or accountability in the transportation department, which allowed corruption to fester for years.

The fraud participants were able to take advantage of a rule which existed at the time: any invoice of under $2500 needed only one employee signature, whereas invoices over $2500 required two signatures. Therefore, the culprits at Barnes Motor & Parts submitted invoices under the $2500 limit so that the corrupt county employees could sign off without further oversight.

During this time, it appears that there were no internal or external audits taking place. When one finally did occur, the auditors found discrepancies and eventually unraveled this fraud. So let’s talk a bit about how the investigators did it.

What Investigators Did

One of the key tools investigators used here was actually pretty simple: linear regression. Michael East had a breakout session at a CFO Symposium where he covered this and a few other topics. Unfortunately, the PDF with materials is no longer online (as far as I can tell) but I was able to snag the following graph from his talk:

One of these dots is not like the others.

As the number of buses increases, maintenance costs should scale roughly linearly (or sub-linearly if you can take advantage of economies of scale, as I think the chart shows). But in the year with the greatest amount of fraud, maintenance costs were millions of dollars over what you would have expected. That is a red flag.
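
To make the idea concrete with the artificial data set this series generates, here is one way to sketch a simple least-squares fit of annual spend against bus counts in T-SQL. This is my sketch, not the investigators’ actual method, and it uses buses with at least one purchase that year as a rough proxy for the in-service fleet:

-- Aggregate annual spend and bus counts, fit y = a + b*x by ordinary least
-- squares, and flag years whose spend sits far above the fitted line.
WITH YearlySpend AS
(
	SELECT
		YEAR(li.LineItemDate) AS CalendarYear,
		CAST(COUNT(DISTINCT li.BusID) AS FLOAT) AS NumberOfBuses,
		CAST(SUM(li.Amount) AS FLOAT) AS TotalSpend
	FROM dbo.LineItem li
	GROUP BY YEAR(li.LineItemDate)
),
Fit AS
(
	SELECT
		-- Slope = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
		(COUNT(*) * SUM(ys.NumberOfBuses * ys.TotalSpend) - SUM(ys.NumberOfBuses) * SUM(ys.TotalSpend))
			/ (COUNT(*) * SUM(ys.NumberOfBuses * ys.NumberOfBuses) - SUM(ys.NumberOfBuses) * SUM(ys.NumberOfBuses)) AS Slope,
		AVG(ys.TotalSpend) AS MeanSpend,
		AVG(ys.NumberOfBuses) AS MeanBuses
	FROM YearlySpend ys
)
SELECT
	ys.CalendarYear,
	ys.NumberOfBuses,
	ys.TotalSpend,
	f.MeanSpend + f.Slope * (ys.NumberOfBuses - f.MeanBuses) AS PredictedSpend,
	ys.TotalSpend - (f.MeanSpend + f.Slope * (ys.NumberOfBuses - f.MeanBuses)) AS Residual
FROM YearlySpend ys
	CROSS JOIN Fit f
ORDER BY Residual DESC;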

Another red flag is that there were 24 separate days in which Barnes Motor & Parts submitted more than 50 invoices, all of which happened to be under the $2500 limit. Vendors will often try to fit as much as possible onto one invoice because creating invoices is annoying and sending out multiple invoices increases the risk of some of them being lost in the process. I can see a company creating 2 or 3 invoices for separate business units or something, but it’s really hard to think of a reasonable explanation for 50+ invoices all coming in on the same day from the same vendor.
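
With data shaped like this series’ artificial dbo.LineItem table, that sort of check is easy to express. This is my sketch, using $1000 as the threshold because that is the two-signature cutoff in our artificial scenario (the real case used $2500), and treating each line item as an invoice:

-- Days on which a single vendor submitted a suspiciously large number of
-- invoices, all under the approval threshold. Adjust the cutoffs to taste.
SELECT
	li.VendorID,
	li.LineItemDate,
	COUNT(*) AS NumberOfInvoices
FROM dbo.LineItem li
WHERE li.Amount < 1000.00
GROUP BY
	li.VendorID,
	li.LineItemDate
HAVING COUNT(*) >= 50
ORDER BY NumberOfInvoices DESC;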

The Flow of This Series

From here on out, we’re going to re-create something similar to the Wake County problem but using artificial data. We’ll then walk through a series of techniques to learn more about our data. Yes, the exercise will be fraud detection but the skills you learn are ultimately about gaining a better understanding of your data. Even if you have zero fraud, you will still be better at your job because you understand what it means to know your data and can design solutions which better fit your specific circumstances.

In the next post, I will show how we can generate some sketchy (but not outlandish) data. Then, the next several posts in the series will cover analysis techniques. Finally, we’ll wrap up with additional techniques I don’t cover in detail but which are still potentially interesting. Stay tuned!

Upcoming Events: SQL Saturday Madison

Key Details

What: SQL Saturday Madison
Where: 1000 Edgewood College Drive, Madison
When: Saturday, April 6th, all day
Admission is free. Sign up at the SQL Saturday website.

What I’m Presenting

11:00 AM — 12:00 PM — APPLY Yourself
02:45 PM — 03:45 PM — Eyes on the Prize: Simple and Effective Dashboard Visualization Techniques

SQL Saturday Madison is in a new venue this year, so don’t book your hotel thinking it’s where it was last year and end up a few miles away from the new spot. And especially don’t do it as a non-refundable booking. Reading is fundamental.

PolyBase Revealed: the DW Databases

Today is a fairly short post covering a trio of databases you might not even know you have: DWConfiguration, DWDiagnostics, and DWQueue. The PolyBase installer creates all three of these on your instance. Let’s go in ascending order of the number of useful tables.

DWQueue

The DWQueue database has two tables, neither of which I’ve ever seen populated in an on-prem SQL Server instance. It is probably useful in some internal system context and I have seen a procedure named dbo.MessageQueueReceived get called, along with an extended stored procedure whose name I was unable to find. Don’t look for enlightenment here.

DWDiagnostics

The DWDiagnostics database has a dozen tables and if you look at row counts, most of them have data. But not all of that data is directly useful to us.

The most useful table here is dbo.pdw_errors, which contains error information and messages for each error which has occurred on our system. If you’re used to the standard PolyBase error messages (which are a mess), you won’t get too much more info than what you see in them, but if you’re monitoring a busy system or one which swallows errors, at least you can see what’s going wrong.

One of the fun things about these tables is that it looks like PolyBase pre-allocates rows. For example, I have 10,000 rows in dbo.pdw_errors but only 175 of them have my machine name; the rest have a machine name of N and fake data for everything, as well as a DateTimePublished value which was probably when I originally installed PolyBase.
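
If you want to poke at it yourself, something simple like the following works. DateTimePublished is the only column I’ll vouch for here; I avoid naming others because the schema is undocumented:

-- Most recent PolyBase errors; the pre-allocated placeholder rows should sort
-- toward the bottom since their DateTimePublished predates any real errors.
SELECT TOP (50) *
FROM DWDiagnostics.dbo.pdw_errors
ORDER BY DateTimePublished DESC;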

There are also 10,000 rows in dbo.pdw_component_alerts_data and dbo.pdw_os_event_logs but those two appear to do nothing for on-prem SQL Server running the PolyBase engine. The other tables aren’t worth mentioning.

DWConfiguration

The DWConfiguration database has two potentially interesting tables, which is twice as many as its nearest competitor.

First up is dbo.configuration_properties, which looks to be the values PolyBase uses for costing Hadoop operations, setting pdw_user as the default username, and so on. I might mess with some of this on a VM one of these days just to see what happens.

The other table is dbo.distribution, which lays out distributions. There are 8 rows in this table, which corresponds to the number of files created in most Hadoop write operations. A wild guess says there are 60 rows on Azure SQL Data Warehouse, but that’s mere speculation.
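
If you want to take a look at either table without guessing at column names, a plain SELECT * does the job:

-- Peek at the PolyBase configuration knobs and the distribution layout.
SELECT * FROM DWConfiguration.dbo.configuration_properties;
SELECT * FROM DWConfiguration.dbo.distribution;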

Ignore dbo.pdw_sp_configure as that, like Europa, is not meant for you and you should attempt no landing.

Conclusion

These databases seem like they’re primarily for Microsoft’s Analytics Platform System or Azure SQL Data Warehouse. There are a couple of tables which are interesting to us for on-prem PolyBase usage, but not many. The important thing, though, is that even where these tables aren’t interesting to us, messing with them can have deleterious consequences for your SQL Server instance, so just leave them be.

WhoIsActive In SSMS

Several years ago, I put together a post on helpful wrapper procedures for calling sp_WhoIsActive. Since then, I’ve taken to using three sp_whoisactive calls as SSMS keyboard shortcuts, assigned under Tools > Options > Environment > Keyboard > Query Shortcuts.

Ctrl + F1: What’s Going On?

Here is my bread and butter query in a format you can easily copy and paste:

EXEC sp_whoisactive @get_full_inner_text = 1, @get_outer_command = 1, @find_block_leaders = 1, @get_task_info = 2, @get_transaction_info = 1, @get_plans = 0, @output_column_list = '[session_id][login_name][dd%][CPU][reads][writes][wait_info][blocking_session_id][blocked_session_count][sql_text][sql_command][status][tempdb_%][%]', @sort_order = '[session_id]';

And here it is in a little bit nicer of a format so we can cover it:

EXEC sp_whoisactive
	@get_full_inner_text = 1,
	@get_outer_command = 1,
	@find_block_leaders = 1,
	@get_task_info = 2,
	@get_transaction_info = 1,
	@get_plans = 0,
	@output_column_list = '[session_id][login_name][dd%][CPU][reads][writes][wait_info][blocking_session_id][blocked_session_count][sql_text][sql_command][status][tempdb_%][%]',
	@sort_order = '[session_id]';

This call is typically very fast. There are cases where sp_whoisactive can be slow, such as when pulling back execution plans. In scenarios where I just want to see what’s going on with a minimum of fuss, I run this command. I’ve ordered the column list so that the most important columns for me appear up front: how long queries have been running, their total resource utilization, reasons for waits, and blocking details. Getting this info, as well as the command text, without scrolling is nice when troubleshooting.

Ctrl + 3 For Queries From Me

This second command focuses on my queries. This is useful when looking at busy servers with a lot of activity going on, where it might be difficult to figure out which session is actually yours. Here is the copy-pastable block:

EXEC sp_whoisactive @filter_type = 'login', @filter = '{{YOURNAMEHERE}}', @get_full_inner_text = 1, @get_outer_command = 1, @find_block_leaders = 1, @get_task_info = 2, @get_transaction_info = 1, @get_plans = 0, @output_column_list = '[session_id][login_name][dd%][CPU][reads][writes][wait_info][blocking_session_id][blocked_session_count][sql_text][sql_command][status][tempdb_%][%]', @sort_order = '[session_id]';

And the block which humans can read:

EXEC sp_whoisactive
	@filter_type = 'login',
	@filter = '{{YOURNAMEHERE}}',
	@get_full_inner_text = 1,
	@get_outer_command = 1,
	@find_block_leaders = 1,
	@get_task_info = 2,
	@get_transaction_info = 1,
	@get_plans = 0,
	@output_column_list = '[session_id][login_name][dd%][CPU][reads][writes][wait_info][blocking_session_id][blocked_session_count][sql_text][sql_command][status][tempdb_%][%]',
	@sort_order = '[session_id]';

The column set is the same as before but now I’m filtering to my login. As a quick note, you’ll obviously want to replace {{YOURNAMEHERE}} with your login.

Ctrl + 4 For Even More

When I need to get execution plans, that’s when I break out Ctrl+4. Here is the one-liner:

EXEC sp_whoisactive @get_full_inner_text = 1, @get_outer_command = 1, @find_block_leaders = 1, @get_task_info = 2, @get_transaction_info = 1, @get_plans = 1, @output_column_list = '[session_id][login_name][dd%][CPU][reads][writes][wait_info][blocking_session_id][blocked_session_count][sql_text][sql_command][status][tempdb_%][%]', @sort_order = '[session_id]';

And the full query:

EXEC sp_whoisactive
	@get_full_inner_text = 1,
	@get_outer_command = 1,
	@find_block_leaders = 1,
	@get_task_info = 2,
	@get_transaction_info = 1,
	@get_plans = 1,
	@output_column_list = '[session_id][login_name][dd%][CPU][reads][writes][wait_info][blocking_session_id][blocked_session_count][sql_text][sql_command][status][tempdb_%][%]',
	@sort_order = '[session_id]';

The big difference here is the addition of @get_plans = 1, which can make the query slower on busy systems. As a result, this is one I’m less likely to run than the others. Still, when you need an execution plan, this is a valuable query to run.

Conclusion: I Ran out of Mnemonics

Those are three of the ways that I like using sp_whoisactive in day-to-day database development and query tuning work. This is one of the most flexible stored procedures out there, but that flexibility can seem overwhelming at first. Hopefully this post gives you a starting point to build out your own helper queries.