First Thoughts on Automating Jupyter Notebooks

Every time I give a talk covering notebooks, someone asks, “How do I automate the execution of notebooks?” In this post, I’ll look at one way to do just that.

Command Line Execution

I’m going to stick to the command line version of Jupyter. There is also a Python API we can use, but I’ll start with the easy answer.

Let’s say that we have a notebook. This notebook provides us with a sanity check on duplicates. Here’s an ugly HTML version of the notebook:

Notebook Test

This is a test notebook intended for testing notebook automation.

In [ ]:
if(!require(tidyverse)) {
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    library(tidyverse)
}

if(!require(odbc)) {
    install.packages("odbc", repos = "http://cran.us.r-project.org")
    library(odbc)
}
In [ ]:
conn <- DBI::dbConnect(odbc::odbc(), 
                      Driver = "SQL Server", 
                      Server = "localhost", 
                      Database = "ForensicAccounting", 
                      Trusted_Connection = "True")
In [ ]:
dupes <- DBI::dbGetQuery(conn, "WITH records AS
(
	SELECT
		li.LineItemDate,
		li.BusID,
		li.VendorID,
		COUNT(*) AS NumberOfInvoices
	FROM dbo.LineItem li
	GROUP BY
		li.LineItemDate,
		li.BusID,
		li.VendorID
)
SELECT
	NumberOfInvoices,
	COUNT(*) AS NumberOfOccurrences
FROM records
GROUP BY
	NumberOfInvoices
ORDER BY
	NumberOfInvoices;")
In [ ]:
nrow(dupes)
In [ ]:
dupes %>% filter(NumberOfInvoices > 1)

Test Case

Here we will throw up a message if we have too many duplicates.

In [ ]:
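# Sum only the occurrence counts; sum() over the whole data frame would also add the NumberOfInvoices values.
if (sum(filter(dupes, NumberOfInvoices > 1)$NumberOfOccurrences) > 100) {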
    print("PROBLEM ALERT")
}

Next, I want to use the nbconvert tool, which is part of Jupyter, to execute the notebook and write the results to an HTML file. I can do that with this one-liner:

jupyter nbconvert --to html --execute "Notebook Test.ipynb" --output notebooktest_20190417.html

That gives me an output file which, by default, is Bootstrap-formatted HTML. Here’s the last portion:

The Problem Alert is a serious thing. Seriously.

From There…

In the command above, I included the date of execution. That way, I can script this to run once a day, storing results in an HTML file in some directory. Then, I can compare results over time and see when issues popped up.
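One way to script the daily run is to let the shell generate the date stamp rather than typing it by hand. The following is a minimal sketch, assuming a POSIX-style shell such as bash with jupyter on the PATH. The function names output_name and run_notebook and the ./results directory are hypothetical, chosen only for illustration.

```shell
# Sketch: build a date-stamped output name and execute the notebook.
# output_name, run_notebook, and ./results are illustrative names,
# not part of Jupyter or of the original post.

# Print the output file name for a given date stamp (YYYYMMDD).
output_name() {
    printf 'notebooktest_%s.html\n' "$1"
}

# Execute the notebook and write the HTML report into the given directory.
run_notebook() {
    local notebook="$1" output_dir="$2"
    local stamp
    stamp="$(date +%Y%m%d)"
    mkdir -p "$output_dir"
    jupyter nbconvert --to html --execute "$notebook" \
        --output "$(output_name "$stamp")" --output-dir "$output_dir"
}

# Example call (uncomment to run; requires Jupyter and the R kernel):
# run_notebook "Notebook Test.ipynb" "./results"
```

A scheduler such as cron or Windows Task Scheduler could then call a script like this once a day, leaving one report per date in the results directory.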

I can also parse the resultant HTML if need be. Note that this won’t be trivial: even though the output looks like a simple [1] "PROBLEM ALERT", the underlying HTML is a much more complicated blob. If you’re going to go down that route, consider the asciidoc format instead:

jupyter nbconvert --to asciidoc --execute "Notebook Test.ipynb" --output notebooktest_20190417.asciidoc

Now the results will be in something more parsable, so you could use this output as part of a PowerShell (or other shell) script to automate sending messages out in the event of failure.
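As a rough illustration of that idea, here is a sketch of the check itself in a POSIX-style shell. It assumes the asciidoc report contains the alert line exactly as R prints it, [1] "PROBLEM ALERT". The function names has_alert and list_alerts are hypothetical. One detail matters: the report also includes the source cell, which always contains print("PROBLEM ALERT"), so the search has to match the output form and not just the words.

```shell
# Sketch: decide whether an executed report contains the alert output.
# has_alert and list_alerts are illustrative names, not Jupyter commands.

# Succeed (exit status 0) when the report contains the R output line
# [1] "PROBLEM ALERT". Matching the "[1] " prefix avoids a false hit on the
# source cell, which contains print("PROBLEM ALERT") in every report.
has_alert() {
    grep -qF '[1] "PROBLEM ALERT"' "$1"
}

# Print the name of every asciidoc report in a directory that contains
# the alert, using the notebooktest_<date>.asciidoc naming from the post.
list_alerts() {
    for report in "$1"/notebooktest_*.asciidoc; do
        [ -e "$report" ] || continue   # the pattern matched no files
        if has_alert "$report"; then
            printf '%s\n' "$report"
        fi
    done
}
```

The list that list_alerts prints could be handed to whatever mail or chat command is available. That messaging step is left out here because it depends entirely on the environment.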

Conclusion

This is a short post intended to give you an idea of running Jupyter notebooks from the command line as part of an automation process, particularly with testing in mind.
