This is a review of C.J. Date’s Database Design and Relational Theory: Normal Forms and All That Jazz, Second Edition (Apress, 2019).
I like this book a lot. So much so that it forms the foundation of my talk entitled Perfectly Normal (which I’ll turn into a blog series in the new year). It’s important to set your expectations with the book: this is an academic work. It’s not beach reading, notwithstanding the fact that I read all of it while at the beach; most people aren’t like me. Thankfully.
Despite my saying that it’s an academic work, Date does his best to make it approachable to a non-academic audience, and there are good stretches in which he succeeds. There’s a fair amount of humor sprinkled throughout the book, and he tries to make as many examples as concrete as possible. That said, when your goal is to make the ins and outs of database normalization theory accessible to technically-minded non-academicians, there are going to be some rocky shoals along the way. There are a few esoteric normal forms which are the database equivalent of figuring out how many angels can dance on the head of a pin, most of which are in chapter 15 (Domain-Key Normal Form, Elementary Key Normal Form, Overstrong PJ/NF).
If I were to sum up a few key takeaways from the book, they would be:
- Boyce-Codd Normal Form and 5th Normal Form are the two most important normal forms, as they both solve a large number of important data problems.
- 6th Normal Form can sometimes be important, although in many cases it’s overkill.
- 1st Normal Form and 4th Normal Form are both overrated.
- The rest of the normal forms are occasionally relevant, except for 2nd and 3rd Normal Forms, which are never relevant on their own: Boyce-Codd Normal Form covers them and more, and there’s never a case in which you should aim for 2nd and 3rd Normal Form but not BCNF.
- Terminology matters. Relation values, relvars (relation variables), tuples, predicates, propositions, and the like are not common parlance for database developers, but they do help clarify things, especially the distinction between relvar and relation value, as opposed to the catch-all table/relation. Conceptually separating relation values (or just “relations”) from relation variables gives you a time aspect: a relation variable Employee will hold a particular set of tuples, with specific values for specific attributes, at one point in time, but that set will change over time as we modify data. When thinking about database design theory, time matters a lot.
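
To make the relvar versus relation value distinction a bit more concrete, here is a minimal Python sketch. It is an illustration only, not code from the book: the names `EmployeeTuple` and `Relvar`, the attributes, and the sample data are all hypothetical. The idea is that a relation value is an immutable set of tuples, while a relvar is a named variable whose current value is one such set at any given moment.

```python
from typing import FrozenSet, NamedTuple


class EmployeeTuple(NamedTuple):
    """One tuple of the hypothetical Employee relation."""
    employee_id: int
    name: str
    department: str


# A relation value: an immutable set of tuples. It never changes.
Relation = FrozenSet[EmployeeTuple]


class Relvar:
    """A relation variable: a name that holds one relation value at a time."""

    def __init__(self, name: str, value: Relation) -> None:
        self.name = name
        self.value = value

    def assign(self, new_value: Relation) -> None:
        # "Modifying data" means assigning a different relation value
        # to the variable; the old value itself is left untouched.
        self.value = new_value


employee = Relvar(
    "Employee",
    frozenset({
        EmployeeTuple(1, "Ann", "Sales"),
        EmployeeTuple(2, "Bob", "IT"),
    }),
)

before = employee.value  # the relation value at one point in time

# An "insert" is shorthand for: old value UNION the new tuple, reassigned.
employee.assign(employee.value | {EmployeeTuple(3, "Cy", "IT")})

after = employee.value  # a different relation value, same relvar
```

Here `before` and `after` are two distinct relation values, and `employee` is the single relvar that held each of them in turn. That is roughly the time aspect the terminology is meant to surface.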
I definitely recommend picking up a copy of this book and going through it if you’re serious about database architecture. Date stays explicitly academic in the book, going so far as to say that performance doesn’t matter here. It certainly does when implementing database designs, but in the world of theory, I think that approach is fine, and there’s value in practitioners synthesizing these insights with the specific nature of database platforms to do the best we can with what we have available.