The Importance of 0 in Regressions

Mala Mahadevan (b | t) brought to my attention an article from a few years back regarding a regression between elevation and introversion/extroversion by state. Before I get into this, I want to note that I haven’t read the linked journal article and am not casting aspersions on the blog post author or the journal article authors, but this was a good learning opportunity for an important concept.

Here is the original image:

I see a line. Definitely, definitely a line.

So boom, extroverts want to live on flat land and introverts in the mountains. Except that there are a few problems with this interpretation. Let’s go through them. I’ll focus entirely on interpreting this regression and try to avoid going down any tangential rabbit holes…though knowing me, I’ll probably end up in one or two.

The Line is NOT the Data

One of the worst things we can do as data analysts is to interpret a regression line as the most important thing on a visual. The important thing here is the per-state set of data points, but our eyes are drawn to the line. The line mentally replaces the data, but in doing so, we lose the noise. And boy, is there a lot of noise.

Boy, is There a Lot of Noise

I don’t have the raw values but I think I can fake it well enough here to explain my point. If you look at a given elevation difference, there are some huge swings in values. For example, check out the four boxes I drew:

Mastery of boxes? Check.

In the left-most box, approximately the same elevation difference corresponds to NEO E zscores ranging from roughly -0.6 to 1.8. Considering that our actual data ranges from -2 to approximately 3, that’s a huge slice. The second box spans the two extremes, and the third and fourth boxes also take up well over half the available space.

This takes us to a problem with the thin line:

The Thin Black Line

When we draw a regression line, we typically draw a thin line to avoid overwhelming the visual. The downside to this is that it implies a level of precision which the data won’t support. We don’t see states clustered around this thin line; they’re all around it. Incorporating the variance in NEO E zscore for a given elevation difference, we have something which looks more like this:

That’s a thick line.

Mind you, I don’t have the actual numbers so I’m not drawing a proper confidence interval. I think it’d be pretty close to this, though, just from eyeballing the data and recognizing the paucity of data points.
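For reference, a proper confidence band around the regression line at a given elevation difference $x_0$ comes from the standard simple-regression formula (textbook material, nothing specific to this paper):

$$\hat{y}(x_0) \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

where $s$ is the residual standard error and $n = 51$. A prediction interval for an individual state adds a 1 under the square root and is wider still. With this few points and this much vertical spread, $s$ is large, so either band stays fat across the whole plot.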

So what’s the problem here? The lines are all pointing in the same direction, so there’s definitely a relationship…right?

Zeroing in on the Problem

Looking at the vertical axis, we have a score which runs from -2 to 3(ish), where negative numbers mean introversion and positive numbers extroversion. That makes 0 the midpoint where people are neither introverted nor extroverted. This is important because we want to show not only that this relationship is negative, but that it is meaningful. A quick and dirty way we can check this is to see how much of our confidence interval is outside the zero line. After all, we’re trying to interpret this as “people who live in higher-elevation areas tend to be more introverted.”

The answer? Not much.

With our fat confidence interval guess, the confidence interval for all 50 states (plus one swamp) includes the 0 line. Even though we can draw a line pointing downward, we can’t conclusively say that there is any sort of relationship between introversion/extroversion and differences in elevation, because both answers remain within the realm of possibility across the entire range of the visual.

But hey, maybe I’m way off on my confidence interval guess. Let’s tighten it up quite a bit and shrink it by roughly half. That gives us an image which looks like this:

Conclusive evidence that Alaskans are introverts.

If I cut that confidence interval roughly in half, enough states fall outside the band that those CI bars are probably too narrow. Still, conclusions we can draw include:

  • Any state with an elevation difference over ~16,000 feet is likely to have a NEO E zscore below 0.
  • Alaska is the only state with an elevation difference over 16,000 feet.

For all of the other states, well, we still can’t tell.

Conclusion

Looking solely at this image, we can’t tell much about NEO E zscore versus elevation difference, except that there appears to be a negative correlation which is meaningful only for a state with more than 16,000 feet of difference in elevation. Based on the raw picture, however, your eyes want to believe that there’s a meaningful negative correlation across the board. The data just doesn’t support that.

Bonus Round: Rabbit Holes I Semi-Successfully Avoided

I won’t get into any of these because they’re tangents and the further I get away from looking at the one picture, the more likely it is that I end up talking about something which the paper authors covered. Let me reiterate that I’m not trashing the underlying paper, as I haven’t read it. But here are a few things I’d want to think about:

  • This data is at the state level and shows elevation difference. When sampling, it seems like you’d want to sample at something closer to the county level in order to get actual elevation. After all, the conjecture is that there is a separating equilibrium between extroverts and introverts based on elevation.
  • Elevation difference is also a bit weird of a proxy to use by state. Not so weird that it’s hinky, but weird enough to make me think about it.
  • Looking at Alaska in particular, the state had about 710K people as of the 2010 census, but here are the top cities and their elevations:
City         Population   Elevation (feet)
Anchorage    291,826      102
Fairbanks    31,535       446
Juneau       31,275       56
Sitka        8,881        26
Ketchikan    8,050        0
Wasilla      7,831        341
Kenai        7,100        72
Kodiak       6,130        49
Bethel       6,080        3
I gave up after 9. Frankly, that’s 9 more than I expected to do.

This tells us that, at a minimum, ~56% of Alaska residents (roughly 399,000 of the 710K, summing the nine cities above) lived at or near sea level, despite Alaska being one of the most mountainous states. If introverts want to live in high-elevation areas, it’s a little weird that they’re flocking to the coastline, which is supposed to be a high-extroversion area based on the journal article’s summary. But again, I didn’t read the article (or even look for a non-gated copy), so take that with plenty of grains of salt.

Learning Goals for 2020

Unlike the last couple of years (e.g., 2019), I’m lopping off the “Presentation” portion and focusing more on what I want to learn. Presentations will follow from some of this but there are few guarantees. I’m going to break this up into sections because if I just gave the full list I’d demoralize myself.

The Wide World of Spark

It’s a pretty good time to be working with Apache Spark, and I’m interested in deepening my knowledge considerably this year. Here’s what I’m focusing on in this area:

  • Azure Databricks. I’m pretty familiar with Azure Databricks already and have spent quite a bit of time working with the Community Edition, but I want to spend more time diving into the product and gain expertise.
  • Spark.NET, particularly F#. Getting better with F# is a big part of my 2020 learning goals, and so this fits two goals at once.
  • SparkR and sparklyr have been on my radar for quite a while, but I’ve yet to get comfortable with them. That changes in 2020.
  • Microsoft is putting a lot of money into Big Data Clusters and Azure Synapse Analytics, and I want to be at a point in 2020 where I’m comfortable working with both.
  • Finally, Kafka Streams and Spark Streaming round out my list. Kafka Streams isn’t Spark-related, but I want to be proficient with both of these.

Azure Everywhere

I’m eyeing a few Azure-related certifications for 2020. Aside from the elements above (Databricks, Big Data Clusters, and Synapse Analytics), I’ve got a few things I want to get better at:

  • Azure Data Factory. I skipped Gen1 because it didn’t seem worth it. Gen2 is a different story and it’s about time I got into ADF for real.
  • Azure Functions, especially F#. F# seems like a perfect language for serverless operations.
  • Azure Data Lake Gen2. Same deal with ADF, where Gen1 was blah, but Gen2 looks a lot better. I’ve got the hang of data lakes but really want to dive into theory and practices.

Getting Functional

I released my first F# projects in 2019. These ranged from a couple F#-only projects to combination C#-F# solutions. I learned a huge amount along the way, and 2020 is a good year for me to get even more comfortable with the language.

  • Serverless F#. This relates to Azure Functions up above.
  • Fable and the SAFE Stack. This is a part of programming where I’ve been pretty weak (that’s the downside of specializing in the data platform side of things), so it’d be nice to build up some rudimentary skills here.
  • Become comfortable with .NET Core. I’ve been a .NET Framework kind of guy for a while, and I’m just not quite used to Core. That changes this year.
  • Become comfortable with computation expressions and async in F#. I can use them, but I want to use them without having to think about it first.
  • Finish Domain Modeling Made Functional and Category Theory for Programmers. I’ve been working through these two books and plan to complete them.
  • Get more active in the community. I’ve created one simple pull request for FSharp.Data.SqlClient. I’d really like to add a few more in areas where it makes sense.

Data Science + Pipelines

I have two sets of goals in this category. The first is to become more comfortable with neural networks and boosted decision trees, and get back to where I was in grad school with regression.

The other set of goals is all about data science pipelines. I think you’ll be hearing a lot more about this over the next couple of years, but the gist is using data science-oriented version control systems (like DVC), Docker containers governed by Kubernetes, and pipeline software like MLflow to build repeatable solutions for data science problems. Basically, data science is going through the maturation phase that software development in general went through over the past decade.

Video Editing

This last set of goals pertains to video editing rather than a data platform or programming topic. I want to make 2020 the year of Linux desktop video production, and that means sitting down and learning the software side of things. This includes YouTube tutorials and videos, as well as improving my use of OBS Studio for TriPASS’s Twitch channel. If I get good enough at it, I might do a bit more with Twitch, but we’ll see.

Conclusion

Looking back at the list, it’s a pretty ambitious set of goals. Still, these are the areas where I think it’s worth spending those crucial off-work hours, and I’d expect to see three or four new presentations come out of all of this.

New Website Design

I’ve been working on this for a little while now, but I finally decided to flip over the Catallaxy Services website to its new design today.

The old design was something I came up with about seven years ago and it definitely showed its age. In particular, it had a crummy mobile experience: inconsistent font sizes, the need to pinch and zoom to see anything, and other dodgy bits.

I like the new website a lot better. Granted, if I didn’t like it as much, I wouldn’t have made the switch, but there are a few things I want to point out.

Resources

I do a lot of stuff, and it’s nice to be able to sort out most of those things in one location. That’s the Resources section.

Some of the things I do.

I also broke out resources by section, so if you want an overview of my community resources, you can click the button to show just those resources. Or if you want to give me money in exchange for goods and/or services, I have a “Paid Resources” section too.

Mobile-First Experience

I wanted to make sure not only that the site would work on a phone, but that it looked good on a phone. When I present, this will often be the first way people check out my site, so I want it to be a good experience.

Images

I’m still working on this, but one of the things about the old site which made it fast but boring is that it was entirely textual. There were no images. Now, there are images on pretty much every page. I want to see what I can do to improve things a bit more, but it’s a start.

Presentations Page

Like the Resources section, you can see a list of presentations and also filter by topic.

These machines, they never learn.

For each presentation, I have the same contents that I had before: abstract, slides, demo code, additional media, and links/resources. I’ve shuffled things around a bit so that it looks reasonably good on a wide-screen monitor but also works on a phone or tablet. Here’s a presentation where everything’s nice and filled out. For other talks, where not everything is filled out, I at least have sensible defaults.

Services

Something I didn’t do a great job of calling out before was availability for consulting and paid services. With the new site, I emphasize this in a few ways, whether that’s consulting, paid training (though actual paid training is forthcoming), or books + third-party offerings like PolyBase Revealed.

What’s Next

I have a few more things planned around the site, namely around custom training, which I intend to work on during the winter. I also have a couple of smaller ideas which I plan to implement over the next several days, but won’t call them out specifically. For now, though, I think this is a pretty good start.

Category Theory: the Video Lectures

I just wrapped up watching Bartosz Milewski’s YouTube series on category theory. If you are at all interested in functional programming, this is an outstanding set of lectures.

The course covers category theory, a rather abstract branch of mathematics. Throughout the twenty videos, Milewski takes us through the landscape of category theory, grounding it as much as possible in the language and concepts of programming. Some of Milewski’s examples are in Haskell but you don’t need to know that language to understand what’s going on. Similarly, other examples are in C++, but it’s clear from the context what he means even if you’ve never seen a line of the language.

I came into this course with a fair knowledge of functional programming and a less-than-fair knowledge of category theory. Watching these videos has given me a much better understanding of the topic and has really cleared up some of the trickier concepts in functional programming like monads.

For further reading, Milewski has a blog full of posts on the topic and a group has (with permission) turned his posts into a book. There are also two more courses on category theory that Milewski has put together, helping us dig even further into the topic.

If you are interested in the series, don’t get too distracted by the intentionally opaque examples. I’ve seen this in undergraduate logic courses, where the professor wants to ensure that you don’t get stuck thinking about a specific example when explaining a general concept, as though the specific example were the only case. Milewski does a good job of combining the highly general drawings of categories (and of how elements in one category can map to another via functors) with examples we’re more familiar with, particularly sets. There’s a balancing act involved in these examples, and I think Milewski handles it pretty well.

It may take you a month or three to get through all of these videos, but I definitely recommend them. From here, I’m going to work through the book and fill in more gaps.

Today’s Events: PASS DBA Virtual Group

Key Details

What: PASS Database Administration Virtual Chapter
Where: From my house, but you can see me online instead of having to visit my house.
When: Today! August 28, 2019.
Admission is free. Register on the PASS website.

What I’m Presenting

12:00 PM — 1:00 PM Eastern — Approaching Zero

This isn’t a pure DBA talk—in fact, it’s more useful for developers than DBAs. But if you’re in a database development or hybrid DBA role, and you’re sick of needing to be awake for 3 AM database deployments, this talk is for you.

Machine Learning with .NET: Model Serialization

This is part five in a series on Machine Learning with .NET.

So far in this series, we’ve looked at training a model at runtime. Even in the prior post, where we looked at predictions, we first had to train a model. With a Naive Bayes model against a small data set, that’s not an onerous task—it’s maybe a second or two the first time we instantiate our objects and we can leave the prediction engine around for the lifetime of our application, so we are able to amortize the cost fairly effectively.

Suppose, however, that we have a mighty oak of a neural network which we have trained over the past two months. It finally completed training and we now have a set of weights. Obviously, we don’t want to retrain this if we can avoid it, so we need to find a way to serialize our model and save it to disk somewhere so that we can re-use it later.

Saving a Model to Disk

Saving a model to disk is quite easy. Here’s the method I created to do just that:

public void SaveModel(MLContext mlContext, ITransformer model, string modelPath)
{
	using (var stream = File.Create(modelPath))
	{
		// Write the trained model to the stream; null means we skip saving an input schema.
		mlContext.Model.Save(model, null, stream);
	}
}

We create a file stream and use the built-in Save() method to write our data to that stream. Here is an example of me calling that method:

string modelPath = "C:\\Temp\\BillsModel.mdl";
bmt.SaveModel(mlContext, model, modelPath);

This generates a binary .mdl file which contains enough information to reconstitute our model and generate predictions off of it.
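As a side note, that null in the Save() call is where a DataViewSchema can go. If you pass in the input schema, the Load() call later hands it back to you, which can be handy for sanity-checking inputs. Here’s a quick sketch of that variant; the data parameter is assumed to be the IDataView we trained against:

public void SaveModelWithSchema(MLContext mlContext, ITransformer model, IDataView data, string modelPath)
{
	using (var stream = File.Create(modelPath))
	{
		// Persisting the input schema alongside the model lets Load() return it later.
		mlContext.Model.Save(model, data.Schema, stream);
	}
}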

Loading a Model from Disk

Here’s where things get annoying again. If you have any kind of custom mapping, you need to do a bit of extra work, as I mentioned in my rant.

Next, when it comes time to deserialize the model, we need to register assemblies. I have two custom mappings but they’re both in the same assembly. Therefore, I only need to register the assembly using one of them. Should I desire to migrate these mappings into separate assemblies later, I would need to update the code to include each assembly at least once. I’m not sure yet which is a better practice: include all custom assemblies regardless of whether you need them, or go back and modify calling code later. What I do know is that it’s a bit annoying when all I really wanted was a simple string or integer translation. Anyhow, here’s the code:

mlContext.ComponentCatalog.RegisterAssembly(typeof(QBCustomMappings).Assembly);
mlContext.ComponentCatalog.RegisterAssembly(typeof(PointsCustomMappings).Assembly);

Once I’ve registered those assemblies, I can reconstitute the model using a method call:

var newModel = bmt.LoadModel(mlContext, modelPath);

That method call wraps the following code:

public ITransformer LoadModel(MLContext mlContext, string modelPath)
{
	ITransformer loadedModel;
	using (var stream = File.OpenRead(modelPath))
	{
		// Load returns the model and also hands back the input schema it was saved with.
		DataViewSchema dvs;
		loadedModel = mlContext.Model.Load(stream, out dvs);
	}

	return loadedModel;
}

What we’re doing is loading a model (of interface type ITransformer) as well as a DataViewSchema, which I don’t need here. I get back my completed model and can use it as though I had just finished training. In fact, I have a test case in my GitHub repo which compares a freshly-trained model against a saved and reloaded model to ensure function calls work and the outputs are the same:

private string GenerateOutcome(PredictionEngineBase<RawInput, Prediction> pe)
{
	return pe.Predict(new RawInput
	{
		Game = 0,
		Quarterback = "Josh Allen",
		Location = "Home",
		NumberOfPointsScored = 17,
		TopReceiver = "Robert Foster",
		TopRunner = "Josh Allen",
		NumberOfSacks = 0,
		NumberOfDefensiveTurnovers = 0,
		MinutesPossession = 0,
		Outcome = "WHO KNOWS?"
	}).Outcome;
}

[Test()]
public void SaveAndLoadModel()
{
	string modelPath = "C:\\Temp\\BillsModel.mdl";
	bmt.SaveModel(mlContext, model, modelPath);

	// Register the assembly that contains 'QBCustomMappings' with the ComponentCatalog
	// so it can be found when loading the model.
	mlContext.ComponentCatalog.RegisterAssembly(typeof(QBCustomMappings).Assembly);
	mlContext.ComponentCatalog.RegisterAssembly(typeof(PointsCustomMappings).Assembly);

	var newModel = bmt.LoadModel(mlContext, modelPath);

	var newPredictor = mlContext.Model.CreatePredictionEngine<RawInput, Prediction>(newModel);
	var po = GenerateOutcome(predictor);
	var npo = GenerateOutcome(newPredictor);

	Assert.IsNotNull(newModel);
	Assert.AreEqual(po, npo);
}

Results, naturally, align.
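One more note on loading: if you don’t need to manage the stream yourself, ML.NET also has a Load() overload which takes a file path directly (if I’m remembering the overloads correctly), so the body of LoadModel() can collapse into something like this one-liner:

var loadedModel = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);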

Conclusion

In today’s post, we looked at the ability to serialize and deserialize models. We looked at saving the model to disk, although we could save to some other location, as the Save() method on mlContext.Model requires merely a Stream and not a FileStream. Saving and loading models is straightforward as long as you’ve done all of the pre-requisite work around custom mappings (or avoided them altogether).
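For example, here’s a rough sketch of round-tripping a model through a MemoryStream instead of a file, which is handy if you want to stash the bytes in a database or blob storage; the mlContext and model variables are the same ones from earlier:

byte[] modelBytes;
using (var stream = new MemoryStream())
{
	// Any writable Stream works here, not just a FileStream.
	mlContext.Model.Save(model, null, stream);
	modelBytes = stream.ToArray();
}

ITransformer reloadedModel;
using (var stream = new MemoryStream(modelBytes))
{
	reloadedModel = mlContext.Model.Load(stream, out DataViewSchema schema);
}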

Machine Learning with .NET: Predictions

This is part four in a series on Machine Learning with .NET.

Thanks to some travels, part 4 in the series is a bit later than I originally anticipated. In the prior two posts, we looked at different ways to create, train, and test models. In this post, we’ll use a trained model and generate predictions from it. We’ll go back to the model from part two.

Predictions in a Jar

Our first step will be to create a test project. This way, we can try out our code without the commitment of a real project. Inside that test project, I’m going to include NUnit bits and prep a setup method. I think I also decided to figure out how many functional programming patterns I could violate in a single file, but that was an ancillary goal. Here is the code file; we’ll work through it block by block.

Setting Things Up

First, we have our setup block.

public class Tests
{
	MLContext mlContext;
	BillsModelTrainer bmt;
	IEstimator<ITransformer> trainer;
	ITransformer model;
	PredictionEngineBase<RawInput, Prediction> predictor;
		

	[SetUp]
	public void Setup()
	{
		mlContext = new MLContext(seed: 9997);
		bmt = new BillsModelTrainer();

		var data = bmt.GetRawData(mlContext, "Resources\\2018Bills.csv");
		var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.25);

		trainer = mlContext.MulticlassClassification.Trainers.NaiveBayes(labelColumnName: "Label", featureColumnName: "Features");
		model = bmt.TrainModel(mlContext, split.TrainSet, trainer);

		predictor = mlContext.Model.CreatePredictionEngine<RawInput, Prediction>(model);
	}

In here, I have mutable objects for the context, the Bills model trainer class, the output trainer, the output model, and a prediction engine which takes data of type RawInput and generates predictions of type Prediction. This means that any data we feed into the model must have the same shape as our raw data.

The Setup() method will run before each batch of tests. In it, we build a new ML context with a pre-generated seed of 9997. That way, every time I run this I’ll get the same outcomes. We also new up our Bills model trainer and grab raw data from the Resources folder. I’m going to use the same 2018 game data that we looked at in part 2 of the series, as that’s all of the data I have. We’ll pull out 25% of the games for testing and retain the other 75% for training. For the purposes of this demo, I’m using the Naive Bayes classifier for my trainer, and so TrainModel() takes but a short time, returning me a model.

To use this model for prediction generation, I call mlContext.Model.CreatePredictionEngine with the model as my parameter and explain that I’ll give the model an object of type RawInput and expect back an object of type Prediction. The Prediction class is trivial:

public class Prediction
{
	[ColumnName("PredictedOutcome")]
	public string Outcome { get; set; }
}

I would have preferred the option to get back a simple string but it has to be a custom class.
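As a reminder, I haven’t shown the RawInput class in this post. Based on the properties we use below, it looks something like the following sketch; the exact types and any column attributes live in the GitHub repo, so treat this as an approximation:

public class RawInput
{
	public float Game { get; set; }
	public string Quarterback { get; set; }
	public string Location { get; set; }
	public float NumberOfPointsScored { get; set; }
	public string TopReceiver { get; set; }
	public string TopRunner { get; set; }
	public float NumberOfSacks { get; set; }
	public float NumberOfDefensiveTurnovers { get; set; }
	public float MinutesPossession { get; set; }
	public string Outcome { get; set; }
}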

Knocking Things Down

Generating predictions is easy once you have the data you need. Here’s the test function, along with five separate test cases:

[TestCase(new object[] { "Josh Allen", "Home", 17, "Robert Foster", "LeSean McCoy", "Win" })]
[TestCase(new object[] { "Josh Allen", "Away", 17, "Robert Foster", "LeSean McCoy", "Win" })]
[TestCase(new object[] { "Josh Allen", "Home", 17, "Kelvin Benjamin", "LeSean McCoy", "Win" })]
[TestCase(new object[] { "Nathan Peterman", "Home", 17, "Kelvin Benjamin", "LeSean McCoy", "Loss" })]
[TestCase(new object[] { "Josh Allen", "Away", 7, "Charles Clay", "LeSean McCoy", "Loss" })]
public void TestModel(string quarterback, string location, float numberOfPointsScored,
	string topReceiver, string topRunner, string expectedOutcome)
{
	var outcome = predictor.Predict(new RawInput
	{
		Game = 0,
		Quarterback = quarterback,
		Location = location,
		NumberOfPointsScored = numberOfPointsScored,
		TopReceiver = topReceiver,
		TopRunner = topRunner,
		NumberOfSacks = 0,
		NumberOfDefensiveTurnovers = 0,
		MinutesPossession = 0,
		Outcome = "WHO KNOWS?"
	});

	Assert.AreEqual(expectedOutcome, outcome.Outcome);
}

What I’m doing here is taking in a set of parameters that I’d expect my users could input: quarterback name, home/away game, number of points scored, leading receiver, and leading rusher. I also need to pass in my expected test result—that’s not something I’d expect a user to give me, but is important for test cases to make sure that my model is still scoring things appropriately.

The Predict() method on my predictor takes in a RawInput object, so I can generate that from my input data. Recall that we ignore some of those raw inputs, including game, number of sacks, number of defensive turnovers, and minutes of possession. Instead of bothering my users with that data, I just pass in defaults because I know we don’t use them. In a production-like scenario, I’m more and more convinced that I’d probably do all of this cleanup before model training, especially when dealing with wide data sets with dozens or maybe hundreds of inputs.

Note as well that I need to pass in the outcome. That outcome doesn’t have to be our best guess or anything reasonable—it just needs to have the same data type as our original input data set.

What we get back is a series of hopefully-positive test results:

Even in this model, Kelvin Benjamin drops the ball.

Predictions are that easy: a simple method call and you get back an outcome of type Prediction which has a single string property called Outcome.

Trying Out Different Algorithms

We can also use this test project to try out different algorithms and see what performs better given our data. To do that, I’ll build an evaluation tester. The entire method looks like this:

[TestCase(new object[] { "Naive Bayes" })]
[TestCase(new object[] { "L-BFGS" })]
[TestCase(new object[] { "SDCA Non-Calibrated" })]
public void BasicEvaluationTest(string trainerToUse)
{
	mlContext = new MLContext(seed: 9997);
	bmt = new BillsModelTrainer();

	var data = bmt.GetRawData(mlContext, "Resources\\2018Bills.csv");
	var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.4);

	// If we wish to review the split data, we can run these.
	var trainSet = mlContext.Data.CreateEnumerable<RawInput>(split.TrainSet, reuseRowObject: false);
	var testSet = mlContext.Data.CreateEnumerable<RawInput>(split.TestSet, reuseRowObject: false);

	IEstimator<ITransformer> newTrainer;
	switch (trainerToUse)
	{
		case "Naive Bayes":
			newTrainer = mlContext.MulticlassClassification.Trainers.NaiveBayes(labelColumnName: "Label", featureColumnName: "Features");
			break;
		case "L-BFGS":
			newTrainer = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "Label", featureColumnName: "Features");
			break;
		case "SDCA Non-Calibrated":
			newTrainer = mlContext.MulticlassClassification.Trainers.SdcaNonCalibrated(labelColumnName: "Label", featureColumnName: "Features");
			break;
		default:
			newTrainer = mlContext.MulticlassClassification.Trainers.NaiveBayes(labelColumnName: "Label", featureColumnName: "Features");
			break;
	}

	var newModel = bmt.TrainModel(mlContext, split.TrainSet, newTrainer);
	var metrics = mlContext.MulticlassClassification.Evaluate(newModel.Transform(split.TestSet));

	Console.WriteLine($"Macro Accuracy = {metrics.MacroAccuracy}; Micro Accuracy = {metrics.MicroAccuracy}");
	Console.WriteLine($"Confusion Matrix with {metrics.ConfusionMatrix.NumberOfClasses} classes.");
	Console.WriteLine($"{metrics.ConfusionMatrix.GetFormattedConfusionTable()}");

	Assert.AreNotEqual(0, metrics.MacroAccuracy);
}

Basically, we’re building up the same set of operations that we did in the Setup() method, but because that setup method has mutable objects, I don’t want to run the risk of overwriting the “real” model with one of these tests, so we do everything locally.

The easiest way I could think of to handle multiple trainers in my test case was to put in a switch statement over my set of valid algorithms. That makes the method brittle with respect to future changes (where every new algorithm I’d like to test involves modifying the test method itself) but it works for our purposes here. After choosing the trainer, we walk through the same modeling process and then spit out a few measures: macro and micro accuracy, as well as a confusion matrix.
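If that switch statement bothers you, one alternative is a dictionary of trainer factories keyed by name; this is just a sketch of the idea, not what’s in the repo:

private static readonly Dictionary<string, Func<MLContext, IEstimator<ITransformer>>> Trainers =
	new Dictionary<string, Func<MLContext, IEstimator<ITransformer>>>
	{
		{ "Naive Bayes", ctx => ctx.MulticlassClassification.Trainers.NaiveBayes(labelColumnName: "Label", featureColumnName: "Features") },
		{ "L-BFGS", ctx => ctx.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "Label", featureColumnName: "Features") },
		{ "SDCA Non-Calibrated", ctx => ctx.MulticlassClassification.Trainers.SdcaNonCalibrated(labelColumnName: "Label", featureColumnName: "Features") }
	};

// Inside the test, fall back to Naive Bayes when the name isn't registered.
IEstimator<ITransformer> newTrainer = Trainers.TryGetValue(trainerToUse, out var factory)
	? factory(mlContext)
	: Trainers["Naive Bayes"](mlContext);

Adding a new algorithm then means adding one dictionary entry and one TestCase attribute rather than editing the method body.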

To understand macro versus micro accuracy (with help from ML.NET and Sebastian Raschka):

  • Micro accuracy sums up all of your successes and failures, whether the outcome is win, lose, draw, suspension due to blizzard, or cancellation on account of alien abduction. We add up all of those cases and if we’re right 95 out of 100 times, our micro accuracy is 95%.
  • Macro accuracy averages the accuracy of each individual class. If our accuracies for the different outcomes are 85% (win), 80% (loss), 50% (draw), 99% (suspension due to blizzard), and 33% (cancellation on account of alien abduction), we sum them up and divide by the number of outcomes: (0.85 + 0.8 + 0.5 + 0.99 + 0.33) / 5 = 69.4%.

Both of these measures are useful but macro accuracy has a tendency to get pulled down when a small category is wrong. Let’s say that there are 3 alien abduction scenarios out of 100,000 games and we got 1 of those 3 correct. Unless we were trying to predict for alien abductions, the outcome is uncommon enough that it’s relatively unimportant that we get it right. Our macro accuracy is 69.4% but if we drop that class, we’re up to 78.5%. For this reason, micro accuracy is typically better for heavily imbalanced classes.
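To make the difference concrete, here’s a tiny sketch that computes both measures from per-class counts (the numbers are made up purely for illustration; this uses LINQ’s Sum and Average):

// Per-class (correct, total) counts -- hypothetical values, not from the Bills data.
var classResults = new Dictionary<string, (int Correct, int Total)>
{
	{ "Win", (68, 80) },
	{ "Loss", (12, 15) },
	{ "Draw", (1, 2) },
	{ "Blizzard", (2, 2) },
	{ "Alien abduction", (1, 3) }
};

// Micro accuracy: total correct over total predictions, so large classes dominate.
double microAccuracy = (double)classResults.Values.Sum(c => c.Correct)
	/ classResults.Values.Sum(c => c.Total);

// Macro accuracy: average of each class's accuracy, so every class counts equally.
double macroAccuracy = classResults.Values.Average(c => (double)c.Correct / c.Total);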

Finally, we print out a confusion matrix, which shows us what each model did in terms of predicted category versus actual outcome.

Conclusion

Generating predictions with ML.NET is pretty easy, definitely one of the smoothest experiences when working with this product. Although we looked at a test in the blog post, applying this to real code is simple as well, as you can see in my MVC project demo code (particularly the controller). In this case, I made the conscious decision to train anew at application start because training is fast. In a real project, we’d want to save a model to disk and load it, which is what we’ll cover next time.