My first anti-pattern is a pet peeve. In many cases, it is not necessarily terrible, but like any great code smell, seeing ANSI-89 joins tells me that there may be deeper issues involved.
What do I mean by an ANSI-89 join? Prior to ANSI-92’s specification of INNER, OUTER, and CROSS JOIN, SQL developers had a slightly different way of joining tables, one which comes a bit closer in appearance to the language’s relational roots:
SELECT a.Col1, b.Col2 FROM dbo.TableA a, dbo.TableB b WHERE a.JoinColumn = b.JoinColumn;
The above code shows how we would perform an inner join in old-time notation. We start with a comma-separated list of tables in the FROM clause. Then, in the WHERE clause, we define the join criteria. When I say this is more “relational” than the later syntax, I mean that in relational algebra, joins of relations are solely Cartesian joins: the cross product of all elements in relation A with all elements in relation B. From there, we can apply filters to get back only the rows that we want. Any actual implementation of actual relational algebra wants to avoid that “get all the data from all of the relevant tables” step and change it to “get only the data from the relevant tables that we need, throwing away as little as possible.” This is because, in contrast to the world of mathematics, our operations have serious costs. If I need to pull 1 billion rows into memory in order to work on three rows, my solution will never perform well enough.
This notation has two serious problems which the ANSI council, to their credit, realized fairly quickly. The first problem is that the old-style join combines join criteria with explicit filters. This makes it harder for developers to understand queries of any serious complexity.
This leads to the second problem: it is easy to create an accidental Cartesian product. Take the following example:
SELECT a.Col1, b.Col2 FROM dbo.TableA a, dbo.TableB b, dbo.TableC c WHERE a.JoinColumn = b.JoinColumn AND a.ColX > 300 AND c.ColY < 50;
In this case, I meant to join three tables together on foreign keys on table A. Unfortunately, I got distracted and forgot the part of the WHERE clause which specified the join condition. As a result, I get back spurious results, airplanes fall out of skies, and I’m roaming the Australian outback killing leather-bound bikers for precious gasoline.
In contrast, here is the newer version:
SELECT a.Col1, b.Col2 FROM dbo.TableA a INNER JOIN dbo.TableB b ON a.JoinColumn = b.JoinColumn INNER JOIN dbo.TableC c ON a.JoinColumn2 = c.JoinColumn2 WHERE a.ColX > 300 AND c.ColY < 50;
In this case, if I forgot the join criteria for table C, I would get a syntax error rather than an incorrect result set. This is absolutely preferable for me, especially because I might be working in a development environment with a trivial data set, so the above bug might make it into production because the development environment did not have enough rows for the bug to surface. In addition to that, I also get the benefit of separating my business criteria (in the WHERE clause) from my “plumbing” criteria in the JOIN clause. Keeping these separate makes the code easier to understand and later modify.
It is possible to mess up an INNER join and create an accidental Cartesian join: joining a.Col = a.Col when you meant b.Col will result in syntactically correct but invalid SQL. This is a lot more difficult to pull off, though, and is a testament to how great INNER/OUTER/CROSS JOIN is compared to = and *=.
Man..couldn’t disagree more. The old school can accomplish more with much less code. Oh well.