Forensic Accounting: Numeral Analysis

This is part six in a series on applying forensic accounting techniques to SQL and R.

What the Numbers Tell Us

In this last section of analysis, we’re going to look at three tells for finding potentially fraudulent data: round number analysis, last-digit analysis, and first-digit analysis with Benford’s Law.

Round Number Analysis

Round number analysis focuses on the final digits of a number, specifically looking at how many 0s there are at the end after rounding off the decimal. We’re interested in the percentage of records in each rank (or Type). For example, $10,820 is a Type 1 because there is a single 0 at the end. $8,300 is a Type 2 because there are two 0s at the end. $11,000 is a Type 3. And $9009 is a Type 0 because there are no 0s at the end.

With that in mind, how do we do this kind of calculation in SQL Server? With one of my favorite operators: APPLY. Here is a query to get the breakdown of ranks by vendor in our generated data set:

%%sql
WITH records AS
(
	SELECT
		v.VendorName,
		a.RoundedAmount
	FROM dbo.LineItem li
		INNER JOIN dbo.Vendor v
			ON li.VendorID = v.VendorID
		CROSS APPLY
		(
			SELECT
				ROUND(li.Amount, 0) AS RoundedAmount
		) a
	WHERE
		a.RoundedAmount > 0
)
SELECT
	r.VendorName,
	SUM(t4.IsType4) AS Type4,
	SUM(t3.IsType3) AS Type3,
	SUM(t2.IsType2) AS Type2,
	SUM(t1.IsType1) AS Type1,
	SUM(t0.IsType0) AS Type0,
	COUNT(1) AS NumberOfInvoices,
	CAST(100.0 * SUM(t0.IsType0) / COUNT(1) AS DECIMAL(5,2)) AS PercentType0
FROM records r
	CROSS APPLY(SELECT CASE WHEN r.RoundedAmount % 10000 = 0 THEN 1 ELSE 0 END AS IsType4) t4
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND r.RoundedAmount % 1000 = 0 THEN 1 ELSE 0 END AS IsType3) t3
	CROSS APPLY(SELECT CASE WHEN t3.IsType3 = 0 AND r.RoundedAmount % 100 = 0 THEN 1 ELSE 0 END AS IsType2) t2
	CROSS APPLY(SELECT CASE WHEN t2.IsType2 = 0 AND r.RoundedAmount % 10 = 0 THEN 1 ELSE 0 END AS IsType1) t1
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND t2.IsType2 = 0 AND t1.IsType1 = 0 THEN 1 ELSE 0 END AS IsType0) t0
GROUP BY
	r.VendorName
ORDER BY
	PercentType0 DESC;

Let’s talk through this code. First, we have a common table expression, records. We get the vendor name and rounded amount of each invoice, using CROSS APPLY to “hide” the rounding calculation. In this case, the CROSS APPLY is probably overkill but for more complex scenarios, it can make life easier by letting us see what the author intended in the SELECT clause rather than piecing through calculations.

SELECT
	v.VendorName,
	a.RoundedAmount
FROM dbo.LineItem li
	INNER JOIN dbo.Vendor v
		ON li.VendorID = v.VendorID
	CROSS APPLY
	(
		SELECT
			ROUND(li.Amount, 0) AS RoundedAmount
	) a
WHERE
	a.RoundedAmount > 0

From there, we take our records CTE and check to see if there are any Type 4 (or higher) results:

FROM records r
	CROSS APPLY(SELECT CASE WHEN r.RoundedAmount % 10000 = 0 THEN 1 ELSE 0 END AS IsType4) t4

In other words, if RoundedAmount divides cleanly by 10,000 then we have a Type 4 or higher. We’ll call that boolean flag IsType4 and alias the function as t4.

Next, we want to chain another function to our result, this time seeing if something is both not a Type 4 but still divisible cleanly by 1000. If so, this is a Type 3.

FROM records r
	CROSS APPLY(SELECT CASE WHEN r.RoundedAmount % 10000 = 0 THEN 1 ELSE 0 END AS IsType4) t4
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND r.RoundedAmount % 1000 = 0 THEN 1 ELSE 0 END AS IsType3) t3

We continue to do the same type of operation for t2 and t1, and any results which are not Type 1-4+ must therefore be Type 0.

The Value of Round Number Analysis

Okay, so why do we care about the number of round numbers? Because we might be able to find a result like the following:

VendorNameType4Type3Type2Type1Type0NumberOfInvoicesPercentType0
Bus Repair Shack01107489998391.45
The Longevity Crew0044145850391.05
Clean Sweep Cleaning Supplies00102422548280091.00
Fuel Suppliers, Ltd0004645049690.73
Doodads and Thingies, Inc.01222933009332490.52
VehiCo Parts and Accessories00232963044336390.51
Tony’s Fixer Upper Buses00253353388374890.39
Electronics and Repairs00181681738192490.33
Safety First00235365219577890.33
Comfort Rider, GmbH00202542538281290.26
Mobility Accessories R Us0091731685186790.25
Engine Mates00283193038338589.75
Fuel Associates, Unlimited0025144649989.38
Tony’s Bus Fixer Uppers00171451333149589.16
Glass and Sons Glass and Accessories0411157673612439482.20

One of these customers looks suspicious. It’s the one with 411 Type 3 values. In a normal data set (where “normal” generally means “not a budget”), about 90% of your results will be Type 0, about 9% will be Type 1, about 0.9% will be Type 2, and so on.

In other words, in most cases:

Last Digit Analysis

The rule of thumb for last digit analysis is that the last digit of each number in a set of numbers will likely follow the uniform distribution. In other words, there is no discernable pattern.

For example, here is a column chart of the distribution of the last digit of the population of towns and cities in North Carolina as of 2013:

Ignore the line. It was not meant for you. Yet.

You can see a pretty flat spread along the whole of the North Carolina population data set. So how about our fraudulent data?

As flat as Buzz Aldrin’s hair.

This distribution is quite uniform as well. Incidentally, if you recall back to the second part of this series, we generated all of our “real” and “fraudulent” data using normal distributions. With normally distributed data, the last digit follows a uniform distribution.

When Does This Not Hold?

There are a couple of situations where you can expect the last digit not to be uniform:

  1. Budgets are heavily skewed toward 0. This makes sense because budgets are estimates.
  2. Approximations in measurements tend to be biased toward 0 and 5.
  3. People fudging data tend to bias away from 0 and 5 because they’re not “random enough.”

Now, there has also been some researching arguing that uniformity is less frequent than we might think, but I haven’t read that article.

Benford’s Law

I have a longer post on Benford’s Law already, so I’d recommend reading that.

If you’re curious about our generated data, here’s what it looks like, including with the fraudulent data:

Benford strikes again.

And here’s what it looks like for just 2018, the year of fraudulent data:

All 9, all the time.

There’s a bit of a difference to be sure, and that might spur on additional analysis.

Conclusion

In today’s post, I wanted to look at a few analytical techniques around distributions of numbers. They are useful in fraud detection mostly because people are terrible random number generators, and so these techniques can be useful in seeing the potential of bias. They won’t typically prove bias, but can be a good first step.

Advertisements

One thought on “Forensic Accounting: Numeral Analysis

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s