This is part three of a series on dashboard visualization.
Knowing your data can matter as much as knowing your audience when it comes to creating good dashboards. There are several critical questions to ask when it comes to the data:
- What are the most important measures people need to see?
- How do my data elements relate to those measures?
- Am I missing any data, or are there incomplete elements?
- Is my data source any good? What is the likelihood that my data is wrong?
Let’s dig into each a little bit.
Finding The Right Measures
Your dashboard’s users are going to want answers to particular questions. Knowing those questions and what they’re expecting is critical to building a good dashboard. And that doesn’t mean coming to them with a data dictionary and expecting them to put all the pieces together.
So where do we get these measures? The first avenue is discussion. Figure out what they’re always looking at. What are the data sources they bring in? What kinds of decisions do they make? What causes them to make those decisions?
From there, look at the kinds of reports people use. Reports and dashboards serve different purposes and have separate sets of rules around how you can best visualize data, but those long-form reports can commonly include interesting metrics.
Another place to look is alerting systems. If there’s an alert, you have a measure with some threshold that is important enough that people are willing to get messages about it. That measure might be interesting to visualize, whether as a historical trend or a point-in-time calculation.
Going From Data To Measures
You might not have all of the information in your data to answer their questions. This is where it starts to get interesting, and where a tool like Power BI can be great: if your users are getting data from a source like an API, webpage, spreadsheet, CSV file, or anything with a fairly consistent structure, you can use Power BI to import that data and merge it in with what you do have. This isn’t a great strategy if there’s a big change in granularity or complex rules around how things tie together, but the functionality is there for those times when it is useful.
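To make the idea concrete, here is a minimal sketch of that kind of merge using pandas rather than Power BI’s Merge Queries dialog; the data sets (API call logs and a spreadsheet of service owners) and column names are hypothetical, but the left outer join mirrors what you would configure in the tool.

```python
import pandas as pd

# Hypothetical source 1: API call logs pulled from a monitoring system.
api_calls = pd.DataFrame({
    "service": ["billing", "search", "billing"],
    "response_time_ms": [120, 95, 140],
})

# Hypothetical source 2: a spreadsheet mapping services to owning teams.
owners = pd.DataFrame({
    "service": ["billing", "search"],
    "team": ["Finance IT", "Platform"],
})

# A left outer join keeps every call and attaches the matching team,
# the same shape of operation as Power BI's "Merge Queries".
merged = api_calls.merge(owners, on="service", how="left")
```

The same caveat from above applies: this works cleanly only when both sources share a key at the same level of granularity.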
A lot of this comes down to finding and combining data from disparate sources. Maybe you’ll get lucky and someone built a great data warehouse which holds all of these measures. But then you wake up from that dream and get back to working on gnarly ETL processes.
A common issue with data is that we’re missing pieces. This could mean missing values in certain fields, corroborating data that doesn’t exist in a second data source, or missing rows. In some of these cases, we can ignore the missing values or slap a default on them. For example, if a small percentage of API calls return with no response time, I could set those response times to the median response time of all calls in the sample. That way, I don’t bias the data set and don’t have to throw away the incomplete rows. But if half of the sample is missing response times, this technique no longer works and I’m better off ditching the incomplete data.
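The median-imputation idea, including the cutoff for when to give up and drop the incomplete rows instead, can be sketched in a few lines of pandas. The column name and the 50% cutoff are illustrative assumptions, not a prescription.

```python
import pandas as pd

def impute_median(df: pd.DataFrame, column: str,
                  max_missing_frac: float = 0.5) -> pd.DataFrame:
    """Fill missing values with the column median, unless too much is missing."""
    missing_frac = df[column].isna().mean()
    if missing_frac >= max_missing_frac:
        # Imputation would fabricate a large share of the column;
        # drop the incomplete rows instead.
        return df.dropna(subset=[column])
    return df.fillna({column: df[column].median()})

# A small sample of API call timings with one missing response time.
calls = pd.DataFrame({"response_time_ms": [120.0, 95.0, None, 110.0, 130.0]})
clean = impute_median(calls, "response_time_ms")
```

Here one of five values is missing, so the gap is filled with the median of the observed values; if three of the five were missing, the function would drop those rows instead.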
How Good Is My Data Source?
The last question I want to cover is how good the data source itself is. Within a company, there are data sources of better and worse quality. Some of the lower-quality sources have problems with missing data, but another avenue of concern is how the data was collected. Data collected from a survey does not have the same level of quality as data collected by sampling a process, and we should treat the two separately.
If you are using external data, how good is it? If your users start picking out errors in the data, you run the risk of them no longer trusting your system. That means you’ll want to perform some validation of the quality of external data.
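What that validation looks like depends on the data, but a minimal sketch might check completeness and plausibility before the data ever reaches a dashboard. The column name, the 5% missing-value threshold, and the non-negativity rule below are all assumptions for illustration.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Run basic sanity checks; return a list of human-readable problems."""
    problems = []
    # Completeness: flag columns with a high share of missing values.
    for col in df.columns:
        frac = df[col].isna().mean()
        if frac > 0.05:
            problems.append(f"{col}: {frac:.0%} missing")
    # Plausibility: response times should never be negative.
    if "response_time_ms" in df.columns and (df["response_time_ms"].dropna() < 0).any():
        problems.append("response_time_ms: negative values present")
    return problems

# An external feed with one missing and one impossible value.
sample = pd.DataFrame({"response_time_ms": [120.0, -5.0, None]})
issues = validate(sample)
```

Running checks like these on each refresh, and surfacing the failures to whoever owns the feed, catches the errors before your users do.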