In yesterday’s post, I covered the resources I used for chapter 5 and teased an approach to writing testable code. Today, I want to give you a sneak peek at Finding Ghosts in Your Data and show you an excerpt from chapter 5 of the book. Note that this is text from the first draft, so if you are visiting from the future and comparing your published copy of the work with what I have here, it might be a little different.
The Functional Approach: An Excerpt
The functional approach emphasizes small, deterministic functions. By deterministic, I mean functions which, given a particular set of inputs, always return the same outputs. As an example, translating temperatures from Celsius to Fahrenheit (or vice versa) is a deterministic operation. If it is 9 degrees Celsius, it is 48.2 degrees Fahrenheit, and no matter how many times we call the conversion function, it should always return the same value. As a counter-example, getting the logged-in user’s remaining loan amount will be non-deterministic in nature. The loan amount itself is not strictly dependent on the inputs: if I send in the logged-in user ID 27, I am not guaranteed to get a value back of $14,906.25. I might get that value this time, but as soon as the user makes a loan payment and that number drops, the relationship no longer holds. Therefore, the next time I run the test, the result may or may not match my test’s expectations, leading to spurious test failures. By making functions deterministic, our tests are less likely to break and we will spend less time fixing failing tests.
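As an illustration that is not part of the draft text, here is a minimal sketch of the two cases described above. The `_LOAN_BALANCES` dictionary is a hypothetical stand-in for a database table, and every function name is an assumption made for this example.

```python
from decimal import Decimal


def celsius_to_fahrenheit(celsius: float) -> float:
    """Deterministic: the output depends only on the argument."""
    return celsius * 9 / 5 + 32


# Hypothetical stand-in for a database table of loan balances (illustration only).
_LOAN_BALANCES = {27: Decimal("14906.25")}


def get_remaining_loan_amount(user_id: int) -> Decimal:
    """Non-deterministic from the caller's view: reads state outside its inputs."""
    return _LOAN_BALANCES[user_id]


def make_payment(user_id: int, amount: Decimal) -> None:
    """Changes the stored balance, so later lookups return a different value."""
    _LOAN_BALANCES[user_id] -= amount


def remaining_after_payment(balance: Decimal, payment: Decimal) -> Decimal:
    """Deterministic alternative: the balance is passed in rather than looked up."""
    return balance - payment
```

A test that hard-codes the result of `get_remaining_loan_amount(27)` breaks as soon as `make_payment` runs, while a test of `remaining_after_payment` keeps passing because everything it needs arrives as an argument.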
Another important aspect of functional programming relevant to writing testable Python code is that functions should not have side effects. In other words, functions take inputs and convert them to outputs; they don’t do anything else. This approach is aspirational rather than entirely realistic—after all, saving to the database is a side effect, and most applications would be fairly boring if they offered absolutely no way to modify the data. It just happens to be the case that our outlier detection engine can be close to side effect-free because we do not create files, save to a database, or push results to some third-party service. With most applications, however, we do not tend to be so lucky.
In practice, “no side effects” really means “have as many functions as possible be side effect-free.” The biggest benefit to this is that it is easier to reason over your code: I can see a function which calculates monthly interest payments given a principal, an interest rate, and number of payments. If that is all the code does, a single method call like monthly_payment = get_monthly_payment(principal, irate, num_payments) is all we need to see. If the get_monthly_payment() function also assigns a loan representative to the customer, sends an e-mail to the customer, and fires off a job to update interest rates in the database, we need to carefully read through the contents of the function to understand what is happening, and that makes testing considerably more difficult. By making as many functions as possible side effect-free, we simplify the process of test design and creation.
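To make the `get_monthly_payment()` example concrete, here is one possible side effect-free version. This is a sketch rather than code from the book: it assumes `irate` is an annual rate expressed as a decimal (0.06 for 6%), that payments are monthly, and that the standard fixed-payment amortization formula applies.

```python
def get_monthly_payment(principal: float, irate: float, num_payments: int) -> float:
    """Return the fixed monthly payment for a fully amortizing loan.

    Assumptions for this sketch: irate is an annual rate as a decimal,
    compounded monthly, and num_payments is the number of monthly payments.
    The function only computes a value; it sends no e-mail, assigns no
    loan representative, and writes nothing to a database.
    """
    if num_payments <= 0:
        raise ValueError("num_payments must be positive")
    monthly_rate = irate / 12
    if monthly_rate == 0:
        # With no interest, the principal is split evenly across payments.
        return principal / num_payments
    # Standard amortization formula: P * r / (1 - (1 + r) ** -n)
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -num_payments)


monthly_payment = get_monthly_payment(10_000, 0.06, 12)
```

Because the result depends only on the three arguments, a test can call the function directly and compare the result to a known value, with no database, mail server, or job queue to set up first.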
Finding Ghosts in Your Data, Apress, 2022 (forthcoming as of time of writing).
Summarizing the Points
Testable code in Python is not that different from testable code in other general-purpose programming languages, especially because Python has both object-oriented and functional leanings. My recommendation is to lean heavily on one of those two models, and my biases put me on the functional side. This means:
- Functions should be small. Take an input, do what you need to do, return an output. The benefit here is that it’s easy to test that your function does exactly what it says it does.
- Functions should be composable. Because functions are small, we probably need to combine them together in some fashion to get where we need to go. The benefit here is that you can test at multiple levels: the individual composed functions, to ensure that they are operating as expected; and the composite functions, to ensure that there are no issues in composition.
- Make functions deterministic whenever possible: one set of inputs gives you one and only one set of outputs. That way, you can cover all of the relevant use cases and your tests don’t break sometime down the road because the test customer you created in the database no longer has the same number of orders as it did when you created the test.
- Functions should avoid side effects whenever possible. Functions without side effects are a lot easier to test than functions with side effects. If you have side effects, you’ll need to test to make sure that the side effects do what you expect them to do.
- Functions should have appropriate names and should do what the name says. If you are struggling to name a function, that might be an indication that it’s trying to do too much.
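The points above can be sketched with a toy example. This is not taken from the book and is not meant as its detection method: the function names and the threshold of 3.0 are assumptions chosen only to show small, deterministic, side effect-free functions being composed and tested at more than one level.

```python
from statistics import median


def absolute_deviations(values: list[float], center: float) -> list[float]:
    """Distance of each value from a given center."""
    return [abs(v - center) for v in values]


def median_absolute_deviation(values: list[float]) -> float:
    """Composed from median() and absolute_deviations()."""
    return median(absolute_deviations(values, median(values)))


def flag_outliers(values: list[float], threshold: float = 3.0) -> list[bool]:
    """Flag values more than `threshold` MADs from the median.

    The default threshold is an arbitrary choice for this illustration.
    """
    center = median(values)
    mad = median_absolute_deviation(values)
    if mad == 0:
        # No spread to measure against, so flag nothing.
        return [False] * len(values)
    return [abs(v - center) / mad > threshold for v in values]


# Test the pieces on their own...
assert absolute_deviations([1, 2, 3], 2) == [1, 0, 1]
assert median_absolute_deviation([1, 2, 3, 4, 100]) == 1
# ...and then the composition.
assert flag_outliers([1, 2, 3, 4, 100]) == [False, False, False, False, True]
```

Each function does what its name says, takes only the inputs it needs, and returns a value without touching anything else, so the tests need no setup beyond a list of numbers.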
None of this is mind-bending, but these are good principles to follow, especially when you’ve put together that 200-line function named do_stuff() which takes 15 parameters. I certainly won’t claim that my code is amazing or that it is the gold standard for great functional design. That said, thinking about these principles has bailed me out a few times when I got a little too creative in writing code and ended up with some 150-200 line functions.