This is part one in a series on applying forensic accounting techniques to SQL and R.
Before I get started, I have some disclaimers I need to get out of the way.
I am not a CPA. My wife is a CPA but I’m the one writing this so you get my interpretation, not hers. I have taken accounting courses and worked on accounting and budgeting projects, but I would have landed Al Capone in jail and would probably miss important tax deductions, so don’t take accounting advice from me. Also, no CPEs for you accountants who do end up reading this…
The other disclaimer is that all of the data in this series is totally made-up. I am working off of a case study which did happen, but we are going to use artificial data rather than the real data. This is for a couple of reasons. First, I wasn’t able to get hold of the original data set. But even if I had, I think artificial data is better for this talk because I can pare things down to the minimum needed to understand the problem. The more realistic things get, the muddier they are and the more likely I am to lose people in the weeds. Therefore, I intend to keep things simple and a good bit easier than reality.
Who are You People and Why are You in my Living Room?
My intended audience for this series is developers and database administrators, not accountants. If you are an accountant looking to hone some data platform skills, you’ll probably pick up some stuff. If you are a developer or database administrator, my goal is to add some really cool tools to your belt.
This is NOT Just About Fraud
Before I dig into my case study, I want to make it absolutely clear that these techniques will help you do a lot more than uncover fraud in your environment. My hope is that there is no fraud going on in your environment and you never need to use these tools for that purpose.
Even with no fraud, there is an excellent reason to learn and use these tools: they help you better understand your data. A common refrain from data platform presenters is “Know your data.” I say it myself. Then we do some hand-waving stuff, give a few examples of what that entails, and go on to the main point of whatever talks we’re giving. Well, this series is dedicated to knowing your data and giving you the right tools to learn and know your data.
A Case Study with a Happy Ending
About 15 years ago, WRAL began reporting on a fraud investigation involving the Wake County Public School System in Wake County, North Carolina (which is primarily the city of Raleigh). They did some great investigative reporting and I highly recommend reading some of the articles to learn more about what happened. What follows is my summary.
Several employees of the Wake County Public School System’s transportation department, including the then-director, conspired with Barnes Motor & Parts in a scheme involving false invoices for non-existent parts and services. The parties then collected the money from these fraudulent invoices, split it up, and used it to purchase everything from gift cards to trucks and boats. This happened over the course of a few years before investigators caught it. To this day, we still don’t know exactly how much money was embezzled from Wake County taxpayers, but the government was able to claw back $5 million. Barnes Motor & Parts paid $3 million, including a $2.5 million criminal fine and a $500,000 civil fine for their part in the scheme. As for county employees, several of them went to prison.
It’s good to know that the people responsible for this crime paid, but for our purposes, I think there are two questions which are interesting:
- How were they able to get away with this for so long?
- How did investigators eventually uncover this fraud?
I’ll briefly cover each of those questions with their own headers because that makes this jumble of words appear a bit more coherent from a distance.
How They Got Away With It: No Meddling Kids or Talking Dogs
One of the worst parts of this story is the complete lack of oversight or accountability in the transportation department, which allowed corruption to fester for years.
The fraud participants were able to take advantage of a rule which existed at the time: any invoice under $2500 needed only one employee signature, whereas invoices over $2500 required two signatures. Therefore, the culprits at Barnes Motor & Parts submitted invoices under the $2500 limit so that the corrupt county employees could sign off without further oversight.
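To make the signature rule concrete, here is a minimal sketch of the kind of check it suggests: for each vendor, what share of invoices land just below the approval limit? The vendor names, the amounts, and the 10% band are all made up for illustration, and this is not the investigators’ actual method.

```python
APPROVAL_LIMIT = 2500.00  # from the rule above: under this, one signature suffices


def share_just_under_limit(amounts, limit=APPROVAL_LIMIT, band=0.10):
    """Fraction of invoice amounts that fall within `band` (as a share of
    the limit) below the limit. The 10% band is an arbitrary assumption."""
    if not amounts:
        return 0.0
    lower = limit * (1 - band)
    near = [a for a in amounts if lower <= a < limit]
    return len(near) / len(amounts)


# Made-up invoice amounts for two hypothetical vendors.
invoices = {
    "Honest Parts Co": [120.00, 845.50, 3100.00, 410.25, 2750.00, 95.10],
    "Sketchy Parts Co": [2499.00, 2450.00, 2480.75, 2310.00, 2495.00, 640.00],
}

shares = {vendor: share_just_under_limit(amts) for vendor, amts in invoices.items()}
for vendor, share in shares.items():
    print(f"{vendor}: {share:.0%} of invoices fall just under the limit")
```

A high share is not proof of anything by itself, but a vendor whose invoices cluster right below a control threshold is worth a closer look.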
During this time, it appears that there were no internal or external audits taking place. When an audit finally did occur, the auditors found discrepancies and eventually unraveled this fraud. So let’s talk a bit about how the investigators did it.
What Investigators Did
One of the key tools investigators used here was actually pretty simple: linear regression. Michael East had a breakout session at a CFO Symposium where he covered this and a few other topics. Unfortunately, the PDF with materials is no longer online (as far as I can tell) but I was able to snag the following graph from his talk:
As the number of buses increases, maintenance costs should grow linearly (or sub-linearly if you can take advantage of economies of scale, as I think the chart really shows). But in the year with the greatest amount of fraud, maintenance costs were millions of dollars over what you would have expected. That is a red flag.
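Here is a rough sketch of that idea using ordinary least squares: fit a line to the years you trust, then see how far the suspect year sits above the line. Every number below is invented for illustration (these are not the Wake County figures), and `fit_line` is a hypothetical helper written out by hand so the example needs nothing beyond the standard library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: (buses in fleet, annual maintenance cost in dollars).
baseline = [(700, 7_100_000), (740, 7_450_000), (780, 7_900_000), (820, 8_250_000)]
suspect_year = (860, 12_400_000)  # also made up

slope, intercept = fit_line([b for b, _ in baseline], [c for _, c in baseline])

# What the trend predicts for the suspect year, and how far off reality was.
expected = slope * suspect_year[0] + intercept
excess = suspect_year[1] - expected

print(f"Expected cost: ${expected:,.0f}")
print(f"Actual cost:   ${suspect_year[1]:,.0f}")
print(f"Excess:        ${excess:,.0f}")
```

In practice you would also want to look at how much the trusted years scatter around the line before deciding how large a gap counts as a red flag.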
Another red flag is that there were 24 separate days in which Barnes Motor & Parts submitted more than 50 invoices, all of which happened to be under the $2500 limit. Vendors will often try to fit as much as possible onto one invoice because creating invoices is annoying and sending out multiple invoices increases the risk of some of them being lost in the process. I can see a company creating 2 or 3 invoices for separate business units or something, but it’s really hard to think of a reasonable explanation for 50+ invoices all coming in on the same day from the same vendor.
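A check like this is easy to sketch: group invoices by vendor and day, then flag any group with more than 50 invoices where every amount is under the limit. The vendor names, dates, and amounts below are made up, and `flag_busy_days` is a hypothetical helper rather than anything the investigators published.

```python
from collections import defaultdict
from datetime import date

APPROVAL_LIMIT = 2500.00     # the one-signature limit from the case study
DAILY_COUNT_THRESHOLD = 50   # "more than 50 invoices" in a single day


def flag_busy_days(invoices, count_threshold=DAILY_COUNT_THRESHOLD,
                   limit=APPROVAL_LIMIT):
    """Return sorted (vendor, date) pairs with more than `count_threshold`
    invoices, all of which are under `limit`."""
    grouped = defaultdict(list)
    for vendor, invoice_date, amount in invoices:
        grouped[(vendor, invoice_date)].append(amount)
    return sorted(
        key for key, amounts in grouped.items()
        if len(amounts) > count_threshold and all(a < limit for a in amounts)
    )


# Made-up data: one vendor sends 55 sub-limit invoices on a single day.
invoices = (
    [("Sketchy Parts Co", date(2020, 3, 2), 2400.00 + i) for i in range(55)]
    + [("Sketchy Parts Co", date(2020, 3, 3), 1800.00)]
    + [("Honest Parts Co", date(2020, 3, 2), 310.00),
       ("Honest Parts Co", date(2020, 3, 2), 4200.00)]
)

flagged = flag_busy_days(invoices)
for vendor, day in flagged:
    print(f"{vendor} on {day.isoformat()}")
```

Both thresholds are parameters so you can tune them to your own data; the right cutoff for a given vendor depends on what a normal day looks like for them.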
The Flow of This Series
From here on out, we’re going to re-create something similar to the Wake County problem but using artificial data. We’ll then walk through a series of techniques to learn more about our data. Yes, the exercise will be fraud detection but the skills you learn are ultimately about gaining a better understanding of your data. Even if you have zero fraud, you will still be better at your job because you understand what it means to know your data and can design solutions which better fit your specific circumstances.
In the next post, I will show how we can generate some sketchy (but not outlandish) data. Then, the next several posts in the series will cover analysis techniques. Finally, we’ll wrap up with additional techniques I don’t cover in detail but which are still potentially interesting. Stay tuned!