The Framework Laptop

About a month ago, I pre-ordered the Framework Laptop, specifically the DIY edition. This laptop isn’t exactly what I want in a laptop, but it does have a lot going for it, so I figured I would write up a summary of why I went with this one, especially as it’s expected to arrive tomorrow.

The Bottom Line: Right to Repair

Right to repair isn’t a topic I’ve discussed much, well, ever, but it’s an important issue for reasons I covered in the most recent episode of Shop Talk. I also plan to have a lengthier write-up sometime soon which covers my thoughts on the topic, but the really short version is that I want to maximize options for choice when it comes to property I own, and right to repair extends the sphere of possible choices.

The Framework team has put a lot of effort into making their laptop repairable and upgradable. They have a series of support guides which walk you, step by step, through the process of component replacement. For example, if you need to replace your mainboard, there’s a guide for that.

They’re also making schematics available to third-party repair shops, a rarity in the computing world. The sad part is that, several decades ago, schematics tended to be included in the giant product manuals for hardware; now, we consider it laudable that a company doesn’t consider this top-secret information.

The Top Line: Good Specs at an OK Price

When making my decision, I knew that I could find a similarly-spec’d laptop at a lower price, but that bottom line is worth a fair amount to me. In case you’re curious about the specs I chose, here goes:

  • Intel i7-1165G7. They offer an option with the i7-1185G7 but I don’t think the tiny performance difference is worth the big price difference.
  • 2 TB Western Digital SN750 NVMe
  • 64 GB DDR4-3200 RAM. To date, my laptops have all had 16 GB of RAM, which works fine on its own, but once I feel the urge to spin up a Kubernetes pod or a few Docker containers, that RAM disappears fast.
  • HDMI expansion card, 2 USB-C expansion cards, 2 USB-A expansion cards. One of the most clever choices the Framework team made was the modular design for their expansion cards. The laptop has four expansion card bays and because the expansion cards are really just USB-C ports, they’re hot-swappable. There’s also the possibility in the future of additional expansion card types fitting this common form factor, extending the lifespan of this laptop.

The Middle Line: It’s Not Perfect

When purchasing any laptop, you’re going to make a series of trade-offs. You can get desktop computer power and a full-sized keyboard with number pad if you’re willing to haul around a 17″ monstrosity which weighs more than a newborn and acts like a space heater. If you want something extremely light (like I prefer), you’re typically going to settle for a mediocre keyboard, 12-13″ of screen space, and a limited amount of RAM.

All of this is to say that the Framework Laptop is a compromise option, especially considering that this is a new startup hardware vendor, so they’re only going to have a couple options available. Yeah, it’d be nice to have an AMD chipset, a touchscreen, a monitor which supports higher resolutions, and a keyboard with the four keys (Home, End, Page Up, Page Down) as separate keys. But going back to the bottom line: unlike with other laptops, there’s actually a chance that I can resolve each of these over time. The Framework Marketplace has launched and although it only has the DIY options today, there’s the possibility of new monitor varieties, swappable keyboards, and more in the future. Sure, some of this may be a “2 years later” scenario but that’s still considerably better than I could ever hope for with any other laptop.

The Penultimate Line: Linux on the Laptop

One other factor I haven’t mentioned is that I’ve been itching to put Linux on my primary laptop for a while. I’ve avoided this mostly because of its atrocious support for touchscreens and my love of the same. But without a touchscreen option, I decided to take another dive. In case you’re curious, I’m planning on using elementary OS, a distro built off of Ubuntu that I’ve used in the past and enjoyed. Yeah, I could use Ubuntu 21.10 (scheduled to release the same day I get my laptop, so what could possibly go wrong?), but one of the co-founders of elementary OS is a Framework Laptop user and has a great post covering setup and I enjoyed elementary the last time I had it running on a laptop.

The Conclusion

No computer is going to be perfect, and laptops are particularly trade-off heavy. That said, I was happy enough with the options available with the Framework Laptop—and impressed enough with their stance in favor of right to repair—that I decided to take the plunge. I’d like to see more companies move toward making schematics available and making repair options easy, so if all other things are close enough to equal, I’ll go with the repair-friendly company over the repair-resistant company.

What’s On My Work PC?

I just had to rebuild my work machine, so I figured I’d get a cheap blog post out of it and discuss the tools that I use regularly. I’ve broken these down by category. As a quick note, this is on my main work machine, so there are some things which I use on other PCs but don’t here. Part of that is the nature of the job. This also isn’t everything on my machine, but does cover most of the day-to-day tools I use.

Got something you really like but I don’t have? Let me know in the comments.

Connecting To Databases

  • SQL Server Management Studio 17. I don’t have any SQL Server 2019 instances yet, so no need to move to SSMS 18 just yet. Plugins I rely on include:
    • Devart SQL Complete. Their snippet management, IntelliSense improvements, and (most importantly) document formatting work great for me. This plugin also works with Visual Studio.
    • Tabs Studio. I tend to have dozens of SSMS tabs open at once. Tabs Studio lets me use the vertical space on my monitor instead of having 4 visible tabs and a drop-down with a couple dozen others. This plugin also works with Visual Studio.
    • SentryOne Plan Explorer. The best way to view execution plans.
  • Azure Data Studio. I don’t use this quite as much as SSMS, but I’m moving in that direction. A couple of plugins and I’d be there. Key plugins:
    • SQL Server 2019 support. This adds some nice functionality including notebooks in Azure Data Studio. That’s a feature differentiator between ADS and SSMS.
    • SSMS keymap. I have too much muscle memory tied up in Ctrl-R and the like to give it up.
    • High Color Queries. I like the color sets. Because I am a monster, I prefer the light theme.
  • Aginity Redshift Workbench. I don’t use Redshift often, but when I do, this is my go-to app.
  • pgAdmin 4. Same as Aginity: I don’t get into Postgres very often, but when I need to go there, this is how I do it.
  • Power BI Desktop and Power BI Report Server. We use Power BI Report Server internally rather than deploying to Azure. I also use Power BI Desktop for my own personal dashboards that don’t make it to the outside world.

Writing Code

  • Visual Studio 2017. Pretty much all of my .NET development happens in Visual Studio. Visual Studio 2017 also installs Anaconda and Jupyter if you install the Data Science tools, so I include them here rather than as separate line items. Key plugin:
    • BimlExpress. If you’re doing SQL Server Integration Services development, you really need to know Biml. There’s a lot more you can do with Biml as well.
  • Visual Studio Code. I haven’t gotten quite into VS Code but I’m starting to use it for Python code dev and other non-.NET languages.
  • RStudio. It’s still the standard for R development.

Checking In Code

  • TortoiseGit. This is a tool that I have mostly because a lot of other developers at the company have it, so sometimes it’s easier just to have it installed when working through issues.
  • SourceTree. This is my primary Git client.

Wrangling Text

  • OneNote. I really don’t like the new version and highly prefer the Office 2016 version. The only thing I like about the Windows app version is its superior touchscreen support, but on my non-touchscreen devices it’s a pain. It’s also not close to feature-complete. So it’s the Office 2016 version for me.
  • Notepad++. I’m using it a bit less and VS Code a bit more, but it’s the first thing I look for when I right-click a text file.
  • Liquid Studio. This is a tool that I used to have. I used it specifically for its large file support, being able to read multi-gigabyte text files without choking. Honestly, I just want good versions of head, tail, and the like on Windows. And I’m still not really sold on Cygwin though I might install it yet again.
  • KDiff3. Yeah, the last update was 4 1/2 years ago, but it’s still my go-to diff and merge tool.

Dealing With Files

  • MultiCommander. The dual-pane layout is great for my purposes, as it’s sort of like having two Windows Explorer windows open, except a lot more functional.
  • S3 Browser. This is one of the better Amazon S3 file browsers that I’ve seen, particularly given the price.
  • Azure Storage Explorer. Free and deals with Azure storage. That works for me.
  • WinDirStat. Ever wonder where all of your disk space went? WinDirStat will tell you exactly where it is and give you a treemap to visualize whether that’s one giant file or a bunch of smaller ones.
  • 7-Zip. The 7z format is rather efficient, though I tend not to share those files. I do use it for compressing files local to my machine, and I like its interface for extracting files. I’d like the .tar.gz process to be a one-step process instead of two, though.
  • Sysinternals Utilities. I include this here because I didn’t want to create a “miscellany” section just for it. I use ZoomIt frequently, especially during code reviews. Presentations aren’t always in front of large groups.

GitPitch: Revamping My Slide Decks

Over the past few months, I’ve started transitioning my slide decks to GitPitch.  I’ve used reveal.js for years and enjoyed being able to put together presentations in HTML format.  I can run these slides in any browser (even on a phone) and don’t have to worry about PowerPoint messing up when I go to presentation mode.  I also get the ability to transition up and down as well as left and right, which lets me “drill into” topics.  For example, from my Genetic Algorithms talk:

If I want to go into a topic, I can slide down; if I need to skip it for time, I can bypass the section.

GitPitch also uses Reveal, so many of the skills that I’ve built up still work for me.  But there are a few compelling reasons for me to migrate.

Reasons For Moving To GitPitch

So why did I move to GitPitch?  It started with an offer of a free year of GitPitch Pro for Microsoft MVPs.  For transparency purposes, I am taking advantage of that free year.  After the year is up, I’m going to start paying for the service, but for now I am getting it for free.

So why did I make the jump?  There are a few reasons.

Background Images

One of the best features of GitPitch (and a reason I went with Pro over the Free edition) is the ability to use a background image and set the transparency level.  This lets me add flavor to my slide decks without overwhelming the audience:

There are a number of good sources for getting free images you can use for these purposes, including Unsplash, Creative Commons Search, and SkyPixel for drone photos.  I’m also working to introduce my own photography when it makes sense.


Reveal.js uses HTML to build slides, but GitPitch uses Markdown.  Markdown is a fairly easy syntax to pick up and is a critical part of Jupyter Notebooks.

To build the slide above, the markdown looks like this:


### Outlook


No HTML and it’s pretty easy to make changes.  Because it’s a text-based format, major changes are also pretty easy to find-and-replace, something hard to do with PowerPoint.


Recently, I did a presentation on Naive Bayes classifiers.  To display math, we can use MathJax.  Here’s an example slide:


### Applying Bayes' Theorem

Supposing multiple inputs, we can combine them together like so:

`$P(B|A) = \dfrac{P(x_1|B) * P(x_2|B) * ... * P(x_n|B) * P(B)}{P(A)}$`

This is because we assume that the inputs are **independent** from one another.

Given `$B_1, B_2, ..., B_N$` as possible classes, we want to find the `$B_i$` with the highest probability.

And here’s how it looks:

I’m definitely not very familiar with MathJax, but there are online editors to help you out.

Code Samples

Another cool feature is the ability to include code samples:

What’s really cool is the ability to walk through step by step:


if(!require(naivebayes)) {
  install.packages("naivebayes")
  library(naivebayes)
}
if(!require(caret)) {
  install.packages("caret")
  library(caret)
}


irisr <- iris[sample(nrow(iris)),]
irisr <- irisr[sample(nrow(irisr)),]

iris.train <- irisr[1:120,]
iris.test <- irisr[121:150,]

nb <- naivebayes::naive_bayes(Species ~ ., data = iris.train)
#plot(nb, ask=TRUE)

iris.output <- cbind(iris.test, 
prediction = predict(nb, iris.test))
table(iris.output$Species, iris.output$prediction)
confusionMatrix(iris.output$prediction, iris.output$Species)

@[1-4](Install the naivebayes package to generate a Naive Bayes model.)
@[5-8](Install the caret package to generate a confusion matrix.)
@[12-14](Pseudo-randomize the data set. This is small so we can do it by hand.)
@[16-17](Generate training and test data sets.)
@[19](Building a Naive Bayes model is as simple as a single function call.)
@[22-25](Generate predictions and analyze the resulting predictions for accuracy.)

I haven’t used this as much as I want to, as my talks historically have not included much code—I save that for the demos.  But with the ability to walk through code a section at a time, it makes it easier to present code directly.

Up On GitHub

Something I wasn’t a huge fan of but grew to be okay with was that the markdown files and images are stored in your GitHub repo.  For example, my talk on the data science process is just integrated into my GitHub repo for the talk.  This has upsides and downsides.  The biggest upside is that I don’t have to store the slides anywhere else, but I’ll describe the downside in the next section.

Tricky Portions

There are a few things that I’ve had to work out in the process.

Handling Updates

Testing things out can be a bit annoying because you have to push changes to GitHub first.  Granted, this isn’t a big deal—commit and push, test, commit and push, test, rinse and repeat.  But previously I could save the HTML file, open it locally, and test the results.  That was a smoother process, though as I’ve built up a couple of decks, I have patterns I can follow so that reduces the pain quite a bit.

Online Only (Mostly)

There is an offline slideshow mode in GitPitch Pro and a desktop version, but I haven’t really gotten into those.  It’s easier for me just to work online and push to my repos.

When presenting, I do need to be online to grab the presentation, but that’s something I can queue up in my hotel room the night before if I want—just keep the browser window open and I can move back and forth through the deck.

Down Transitions And Background Images

One thing I’ve had to deal with when migrating slide decks is that although I can still use the same up-down-left-right mechanics in Reveal.js, when using transparency on images, I’ve noticed that the images tend to bleed together when moving downward.  I’ve dealt with the problem by going back to a “classic” left-right only slide transition scheme.

Wrapping Up

I’ve become enough of a fan of GitPitch to decide that I’m going to use that as my default.  I think the decks are more visually compelling:  for example, there’s my original data science process slide deck and my GitPitch version.  As far as content goes, both decks tell the same story, but I think the GitPitch version retains interest a bit better.  As I give these talks in the new year, I’ll continue transitioning to GitPitch, and new talks will go there by default.

TIL: Docker

Last night, I went to a local .NET User Group meetup and got my first taste of Docker.

In my case, I ended up running on Elementary OS rather than Windows, but the experience was a good one, going through the tutorials. In the end, I installed Solr and was able to load a document for indexing.  Installing nginx was also easy.

If you are interested in installing Docker in a Windows environment, start with Mano Marks’s article on the topic. You can also look at the Docker Tools for Visual Studio.

TIL: Kafka Queueing Strategy

I had the privilege to sit down with a couple Hortonworks solution engineers and discuss a potential Hadoop solution in our environment.  During that time, I learned an interesting strategy for handling data in Kafka.

Our environment uses MSMQ for queueing.  What we do is add items to a queue, and then consumers pop items off of the queue and consume them.  The advantage to this is that you can easily see how many items are currently on the queue and multiple consumer threads can interact, playing nice with the queue.  Queue items last for a certain amount of time—in our case, we typically expire them after one hour.

With Kafka, however, queues are handled a bit differently.  The queue manager does not know or care about which consumers read what data when (making it impossible for a consumer to tell how many items are left on the queue at a certain point in time), and the consumers have no ability to pop items off of the queue.  Instead, queue items fall off after they expire.
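To make that difference concrete, here is a minimal sketch of the idea in Python.  It is a toy in-memory model, not the Kafka API: the class name RetentionLog and its methods are made up for illustration.  The point is that reading never removes anything, each consumer keeps track of its own position, and records disappear only when they age out.

```python
class RetentionLog:
    """Toy append-only log: records expire by age, not by being read."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.records = []      # list of (offset, timestamp, value)
        self.next_offset = 0

    def append(self, value, now):
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1

    def expire(self, now):
        # Drop anything older than the TTL, whether or not anyone has read it.
        self.records = [r for r in self.records if now - r[1] < self.ttl_seconds]

    def read_from(self, offset):
        # Reading never removes records; the caller tracks its own position.
        return [(o, v) for (o, _, v) in self.records if o >= offset]


log = RetentionLog(ttl_seconds=3600)
log.append("click-1", now=0)
log.append("click-2", now=10)

# Two independent consumers, each holding its own offset.
consumer_a_offset = 0
consumer_b_offset = 0

batch_a = log.read_from(consumer_a_offset)
consumer_a_offset = batch_a[-1][0] + 1      # consumer A remembers where it stopped

batch_b = log.read_from(consumer_b_offset)  # consumer B still sees both records

log.expire(now=3605)                        # click-1 is now more than an hour old
remaining = log.read_from(0)
```

Because the log has no idea where either consumer is, it cannot answer “how many items are left?” the way MSMQ can.  That question only makes sense per consumer.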

Our particular scenario has some web servers which need to handle incoming clicks.  Ideally, we want to handle that click and dump it immediately onto a queue, freeing up the web server thread to handle the next request.  Once data gets into the queue, we want it to live until our internal systems have a chance to process that data—if we have a catastrophe and items fall off of this queue, we lose revenue.

The strategy in this case is to take advantage of multiple queues and multiple stages.  I had thought of “a” queue, into which the web server puts click data and out of which the next step pulls clicks for processing.  Instead of that, a better strategy (given what we do and our requirements) is to immediately put the data into a queue and then have consumers pull from the queue, perform some internal “enrichment” processes, and finally put the enriched data back onto a new queue.  That new queue will collect data and an occasional batch job pulls it off to write to HDFS.  This way, you don’t take the hit of streaming rows into HDFS.  As far as maintaining data goes, we’d need to set our TTL to last long enough that we can deal with an internal processing engine catastrophe but not so long that we run out of disk space holding messages.  That’s a fine line trade-off we’ll need to figure out as we go along.
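The shape of that pipeline can be sketched in a few lines of Python.  This is only an illustration under loud assumptions: plain in-memory deques stand in for the two queues (and, unlike Kafka, popleft really does remove items), the function names are mine, and the “enrichment” step is a made-up example.

```python
from collections import deque

raw_clicks = deque()       # stage 1: web servers write here and return immediately
enriched_clicks = deque()  # stage 2: enrichment output, drained in batches


def handle_click(url):
    """Web tier: do no work beyond enqueueing the raw click."""
    raw_clicks.append({"url": url})


def enrich_pending():
    """Enrichment consumer: read raw clicks, add fields, write to the next queue."""
    while raw_clicks:
        click = raw_clicks.popleft()
        # Hypothetical enrichment: tag the click with a campaign taken from the URL.
        click["campaign"] = click["url"].rsplit("/", 1)[-1]
        enriched_clicks.append(click)


def drain_batch(max_rows):
    """Batch job: pull up to max_rows enriched clicks for one bulk write."""
    batch = []
    while enriched_clicks and len(batch) < max_rows:
        batch.append(enriched_clicks.popleft())
    return batch  # a real job would write this batch out to HDFS in one go


for url in ["/ad/spring", "/ad/summer", "/ad/fall"]:
    handle_click(url)

enrich_pending()
first_batch = drain_batch(max_rows=2)
second_batch = drain_batch(max_rows=2)
```

The web tier only ever touches the first queue, and the batch writer only ever touches the second, so a slow enrichment step or a slow HDFS write never blocks a web server thread.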

Summing things up, Kafka is quite a different product than MSMQ, and a workable architecture is going to look different depending upon which queue product you use.

TIL: Bots

Last night, Jamie Dixon (whose book you should buy) talked about his experience at Build this year.  His main takeaway is that Microsoft is pushing their Bot Framework pretty hard.  Jamie showed how to create a stock lookup bot and deploy it to Slack using F# code and an Azure account.

About a month ago, I went to F8 and my main takeaway from that conference is that Facebook is pushing bots on their Messenger platform pretty hard.

Based on my (extremely) limited knowledge of both, it seems that the Facebook bot platform is a bit easier to deal with, as there’s a user interface for writing messages and tokenizing, but right now, Microsoft’s platform is a bit more customizable and developer-friendly.  What’s particularly interesting about both of these is that the analytics and intelligence engines are both closed-source—neither company will let you see the wizard behind the curtain.

TIL: Auditing, Monitoring, Alerting

I’m giving a presentation on monitoring this Monday.  As part of that, I want to firm up some thoughts on the differences between auditing, monitoring, and alerting.  All three of these are vital for an organization, but they serve entirely different functions and have different requirements.  I’ll hit a bunch of bullet points for each.


Auditing is all about understanding a process and what went on.  Ideally, you would audit every business-relevant action, in order, and be able to “replay” that business action.  Let’s say we have a process which grabs a flat file from an FTP server somewhere, dumps data into a staging table, and then performs ETL and puts rows into transactional tables.  Our auditing process should be able to show what happened, when.  We want to log each activity (grab flat file, insert rows into staging table, process ETL) down to its most granular level.  If we make an external API call for each row as part of the ETL process, we should log the exact call.  If we throw away a row, we should note that.  If we modify attributes, we should note that.

Of course, this is a huge amount of data and depending upon processing requirements and available storage space, you probably have to live with something much less thorough.  So here are some thoughts:

  • Keep as much information around errors as you can, including stack traces, full parameter listings, and calling processes.
  • Build in (whenever possible) the full logging mentioned, but leave it as a debug/trace flag in your app.  You could get creative and have custom tracing—maybe turn on debugging just for one customer.  You might also think about automatically switching that debug mode back off after a certain amount of time.
  • Add logical “process run” keys.  If there are three or four systems which process the same data in a pipeline, it makes sense to track those chunks of data separate from the individual pipeline steps.  At an extreme case, you might want to see how an individual row in a table somewhere got there, with a lineage ID that traces back to specific flat files or specific API calls or specific processes and tells you everything that happened to get to that point.  Again, this is probably more of an ideal than a practical scenario, but dream big…
  • Build an app to read your audit data.  Reading text files is okay, but once you get processes interacting with one another, audit files can get really confusing.
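As a rough illustration of the “process run” key idea, here is a small Python sketch.  Everything in it is an assumption for the sake of the example: the in-memory list stands in for an audit table, and the step names, file name, and field names are invented.

```python
import json
import uuid
from datetime import datetime, timezone

audit_log = []  # stand-in for an audit table or append-only file


def new_run_id():
    """One key per logical process run, shared by every step in the pipeline."""
    return str(uuid.uuid4())


def audit(run_id, step, action, **details):
    """Record one business-relevant action, in order, with its details."""
    entry = {
        "run_id": run_id,
        "sequence": len(audit_log),
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "action": action,
        "details": details,
    }
    audit_log.append(entry)
    return entry


def history(run_id):
    """Everything that happened during one run, in the order it happened."""
    return [e for e in audit_log if e["run_id"] == run_id]


run_id = new_run_id()
audit(run_id, "fetch", "downloaded flat file", file_name="orders.csv", rows=3)
audit(run_id, "stage", "inserted rows into staging", rows=3)
audit(run_id, "etl", "discarded row", row_number=2, reason="missing customer id")
audit(run_id, "etl", "loaded rows", rows=2)

print(json.dumps(history(run_id), indent=2))
```

Because every entry carries the same run ID, the reader app can pull one run out of the noise, even when three or four systems are writing audit entries at the same time.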


Monitoring is all about seeing what’s going on in your system “right now.”  You want nice visualizations which give you relevant information about currently-running processes, and I put “right now” in quotation marks because you can be monitoring a process which only updates once every X minutes.

There are a couple of important things to consider with monitoring:

  • Track what’s important.  Don’t track everything “just in case,” but focus on metrics you know are important.  As you investigate problems and find new metrics which can help, add them in, but don’t be afraid to start small.
  • Monitoring should focus on aggregations, streams, and trends.  It’s your 50,000-foot view of your world.  Ideally, your monitoring system will let you drill down to more detail, but at the very least, it should let you see if there’s a problem.
  • Monitors are not directly actionable.  In other words, the purpose of a monitor is to display information so a human can observe, orient, decide, and act.  If you have an automated solution to a problem, you don’t need a monitor; you need an automated process to fix the issue!  You can monitor the automated solution to make sure it’s still running and track how frequently it’s fixing things, of course, but the end consumer of a monitor is a human.
  • Ideally, a monitor will display enough information to weed out cyclical noise.  If you have a process which runs every 60 minutes and which always slams your SAN during the first 5 minutes of each hour, maybe graph the last 2 or 3 hours so you can see the cycles.  If you have enough data, you can also build baselines of “normal” behavior and plot those against current behavior to make it easier for people to see if there is a potential issue.
  • Monitors are a “pull” technology.  You, as a consumer, go to the monitor application and look at what’s going on.  The monitor does not jump out and send you messages and popups and try to get your attention.
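The baseline idea from the list above can be sketched like this.  It is a deliberately simple take, assuming an hourly cycle and using a plain average per minute of the hour; the numbers are invented and a real monitor would likely want something more robust.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(samples):
    """Average each minute-of-hour across past hours.

    samples: iterable of (minute_of_hour, value) pairs from earlier hours.
    """
    by_minute = defaultdict(list)
    for minute, value in samples:
        by_minute[minute].append(value)
    return {minute: mean(values) for minute, values in by_minute.items()}


def deviation_from_baseline(baseline, minute, value):
    """How far the current reading sits above (+) or below (-) normal for that minute."""
    return value - baseline[minute]


# Hypothetical SAN latency readings from the last three hours:
# busy at the top of every hour, quiet at the half hour.
past = [(0, 90), (0, 100), (0, 110), (30, 10), (30, 12), (30, 14)]
baseline = build_baseline(past)

spike_at_top_of_hour = deviation_from_baseline(baseline, 0, 105)
spike_at_half_hour = deviation_from_baseline(baseline, 30, 105)
```

The same reading of 105 is only 5 above normal at minute 0 but 93 above normal at minute 30, which is the kind of difference a chart with a baseline makes obvious to the person looking at it.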


Alerting is all about sending messages and getting your attention.  This is because an alert is telling you something that you (as a trained operator) need to act upon.  I think alerting is the hardest thing on the list to get right because there are several important considerations here:

  • Alerts need to be actionable.  If I page the guy on call at 3 AM, it’d better be because I need the guy on call to do something.
  • Alerts need to be “complete.”  The alert should provide enough information that a sleep-deprived technician can know exactly what to do.  The alert can provide links to additional documentation, how-to guides, etc.  It can also show the complete error message and even some secondary diagnostic stuff which is (potentially) related.  In other words, the alert definitely needs to be more than an e-mail alert which reads “Error:  object reference not set to an instance of an object.”
  • Alerts need to be accurate.  If you start throwing false positive alerts (alerting when there is no actual underlying problem), people will turn off the alert.  If you have false negatives (not alerting when there is an underlying problem), your technicians are living under a false sense of security.  In the worst case scenario, technicians will turn off (or ignore) the alerts and occasionally remember to check a monitor which lets them know that there was a problem two hours ago.
  • Alerts need human intervention.  If I get an alert saying that something failed and an automated process has kicked in to fix the problem, I don’t need that alert!  If the automated process fails and I need to perform some action, then I should get an alert.  Otherwise, just log the failure, have the automated process run to fix the problem, and let the technicians sleep.  If management needs figures or wants to know what things looked like overnight, create reports and digests of this information and pass it along to them, but don’t bother your technicians.
  • On a related note, alerts need to be for non-automatable issues.  If you can automate a problem away, do so.  Even if it takes a fair amount of time, there’s a lot less risk in a documented, tested, automated process than in waking up some groggy technician.  People at 3 AM make mistakes, even when they have how-to documents and clear processes.  People at all hours of the day make mistakes; we get distracted and miss steps, mis-type something, click the wrong button, follow the wrong process, think we have everything memorized but (whoopsie) forgot a piece.  Computers are less likely to have these problems.
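Here is one way the “automate first, alert only when automation fails” rule might look in code.  This is a sketch, not a real paging integration: handle_failure and its callback parameters are hypothetical names, and the runbook link is a placeholder.

```python
def handle_failure(problem, remediate, log, page):
    """Try the automated fix first; wake a human only if it does not work."""
    log(f"failure detected: {problem}")
    try:
        fixed = remediate(problem)
    except Exception as exc:
        fixed = False
        log(f"automated fix raised: {exc!r}")

    if fixed:
        # Nobody needs to act, so this only goes into the log for later reports.
        log(f"automated fix succeeded: {problem}")
        return "logged"

    # A human has to do something, so the alert says what and where to look.
    page(
        f"ACTION NEEDED: {problem}\n"
        "Automated fix failed. Runbook: <link to how-to guide>"
    )
    return "paged"


log_lines, pages = [], []
outcome_ok = handle_failure("disk 90% full", lambda p: True, log_lines.append, pages.append)
outcome_bad = handle_failure("service down", lambda p: False, log_lines.append, pages.append)
```

The successful fix leaves only log lines behind, which is what feeds the morning digest for management.  Only the failed fix produces a page.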

Wrapping Up

Auditing, monitoring, and alerting solve three different sets of problems.  They also have three different sets of requirements for what kind of data to use, how frequently to refresh this data, and how people interact with them.  It’s important to keep these clearly delineated for that reason.

During this, I’m also working on some toy monitoring stuff, so I hope that’ll be tomorrow’s TIL.

TIL: Installing Jupyter And R Support

I recently got through some difficulties installing Jupyter and incorporating R support, so I wanted to write up a quick installation post for a Linux installation.

First, install Jupyter through Anaconda.  Notes:

  1. I grabbed the Python 3.5 version.  I don’t intend to write too much Python code here, so it shouldn’t make a huge difference to me.
  2. When you install, do not run sudo.  Just run the bash script.  It will install in your home directory by default, and for a one-off installation on a VM (like my scenario), this is fine.  It also makes future steps easier.

When running Jupyter, I started by following this guide.  Notes:

  1. Starting Jupyter is as easy as running “jupyter notebook” and navigating to http://localhost:8888 in a browser.
  2. Midori does not appear to be a good browser for Jupyter; when I tried to open a new notebook, I got an error message in the console and the browser seemed not to want to open up the notebook.  Firefox worked just fine.  Maybe I’m doing this wrong, though.

As for installing R support, Andrie de Vries has a nice post on the topic.  Notes:

  1. Here’s where not running sudo above pays off.  If you did run sudo, you’ll get an error saying that you can’t install in the home directory and that you should run a command to make a copy of the files…in your home directory.  If you accidentally ran sudo, you can change the owner of all of the files in your anaconda3/ directory using “sudo chown -R [user]:[user] anaconda3/” and correct the issue.
  2. Installation is as simple as running “conda install -c r ipython-notebook r-irkernel”.  Again, note that I’m not running sudo here.
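If you are not sure whether an earlier sudo run left root-owned files behind, a quick check along these lines can tell you before conda complains.  This is a small sketch which assumes the default ~/anaconda3 install location on Linux; the function name is my own.

```python
import os
from pathlib import Path


def paths_not_owned_by(root, uid):
    """Return every path under root (including root itself) whose owner is not uid."""
    root = Path(root)
    candidates = [root, *root.rglob("*")]
    return [p for p in candidates if p.lstat().st_uid != uid]


if __name__ == "__main__":
    install_dir = Path.home() / "anaconda3"  # assumed default install location
    if install_dir.exists():
        stray = paths_not_owned_by(install_dir, os.getuid())
        print(f"{len(stray)} paths under {install_dir} are owned by another user")
```

If the count is anything other than zero, changing the ownership of the anaconda3/ directory back to your own user is the fix.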

Migrating A Virtual Machine To Azure

My Demo Machine

I have a reasonably powerful laptop that I use for demos.  I also use VMware Player because it’s free, because I have experience with VMware, and because I want virtual machines for demos.  That way, I can keep my demonstration environment fairly well controlled.  I don’t upgrade my demo machines that often, and when I do, I’m reasonably careful about it.  This also allows me to repeat my demonstrations without too much bother, and it also means that my futzing about with other work doesn’t affect my ability to demo presentations.

My Mistake

Just about two months ago, I had disaster befall me at SQL Saturday Pittsburgh, when my laptop and the provided projector absolutely would not play nice.  I had a tablet with me, but there’s no way my little tablet would be able to power SQL Server, even if I had it installed.  What that tablet can do, though, is run a VM.  I also have an Azure subscription, so I decided that one of my many safety measures would be to migrate my demo VM up to Azure so that I could spin up a VM in the event of future disaster.


There are two approaches that will work:  the Microsoft Virtual Machine Converter and manually uploading VMs.  I’ll walk through each.

Microsoft Virtual Machine Converter

There are a few good resources on how to get MVMC working.  I started with Carsten Lemm’s blog post on the topic because I wanted to migrate a VMware VM into Azure and I could afford to spend 30 minutes on the task.  Sadly, my experience took well over 30 minutes…

After downloading and installing the MVMC executable, I followed Carsten’s instructions and made sure that my VM obtains DNS and IP addresses automatically rather than hard-coding addresses.  I also turned on Remote Desktop.  At that point, Carsten’s blog post is a bit out of date, as he references an executable which no longer exists.  But that’s okay, because the MVMC 3.0 executable now has a nice wizard.  The route is pretty simple: on the first screen, we want to select the “Virtual machine conversion” radio button.  Then, select the “Migrate to Microsoft Azure” option and hit Next again.  The next tab asks you for a Subscription ID and Certificate Thumbprint.

To get the Subscription ID, you can go to the old Azure portal and click the Settings tab on the left-hand navigation bar.  That will give you a GUID which represents your subscription ID.  Copy and paste that into the Subscription ID section and you will find a bug:  this page has an off-by-one error.  If you paste in your subscription ID, you’ll see that the last character is missing and the app does not allow you to type in that last character.  Even if you delete characters, you’re still stuck.  The only way I was able to get past this is to type in my GUID manually.

As for the Certificate Thumbprint, the same applies:  I needed to type it in manually or else the app would cut off part of the thumbprint.  Don’t type in any of the spaces and you’ll be fine.  If you don’t know how to create a certificate, check out the Additional Resources section.

From here, I’m going to cut out because the next screen ended up being my downfall:  they want me to put in my vCenter, ESX, or ESXi server name.  I don’t have one of those; I’m just using VMware Player and want to convert my VMDK files to VHDs so Azure can use them.  I realized at this point that MVMC was not the trick.  The only reason I’m including this section is to point out the bug above, just in case anybody gets errors like “The certificate with thumbprint [thumbprint] was not found in the personal certificate store.”  If you know that you’re copy-pasting correctly but it’s still giving you that error, type the thumbprint out and see how that goes.

Manual VM Upload

From here, I decided to cut my losses and start over without MVMC.  The first step is that you need to run sysprep on the VM.  If you don’t do this, the image will fail to provision and you might not be able to use the Azure VM you create.  Sysprep is available in the Windows\System32\Sysprep directory, and has a GUI.  In my case, I decided to copy my VM folder to ensure that it didn’t mess up the local copy of my demo box VM.  The last thing I want is for my Azure copy to mess up my local copy.  Anyhow, run sysprep.exe to begin sysprep.

Once sysprep’s GUI appears, select the “Enter System Out-of-Box Experience (OOBE)” drop-down option and check the “Generalize” checkbox, and then select the “Shutdown” drop-down menu item from the Shutdown Options list.  Let sysprep do its thing, and then it’ll shut down your VM.

Once sysprep was done, I needed to find a way to get the VMDK files converted to VHDs.  A blog post turned me on to StarWind Software’s V2V Converter.  It’s a free tool which allows you to convert virtual hard drive files from one format to another.  Installing this tool let me turn my set of VMDKs into one 45 GB VHD.  One note is that, at least on my machine, I needed to run the V2V Converter from a command prompt; executing the app directly from the Start menu would cause the app to appear for a moment and then disappear, as though some error killed the program.  The tool installs by default in “%programfiles(x86)%\StarWind Software\StarWind V2V Image Converter\StarV2V.exe”.  From there, I just needed to get that big image into Azure.

Make sure that you have the Microsoft Azure PowerShell cmdlets.  Then, follow these instructions to connect your Azure subscription to the local machine and upload your VHD using the Add-AzureVhd cmdlet.  Make sure that you have a Storage object and a Blob container, as that’s where you’re going to store the VHD file from which you’ll make a Virtual Machine image.

Once you have that image uploaded, you can create a new Azure VM from an image.  Select the “MY IMAGES” option and you can pick your demo image.  It’ll take a while for the VM to be provisioned.  Also, don’t forget that you’re going to be charged for that VM as long as it’s running, so if you’re not using it, turn that sucker off.


I wanted a nice and easy GUI that let me tell an app where my VMDK files were and let it do all of the preparation, conversion, uploading, image creating, and provisioning.  You aren’t going to get that.  For these types of one-off scenarios, I accept (but am not happy with) the second approach listed.  If I were doing this in an enterprise environment, there’d be a lot more PowerShell.

Additional Resources


Manual VM Upload