Yesterday’s anti-pattern (nested views) was part one of the “Do Repeat Yourself [Sometimes]” mini-series. Today, I’d like to get into part two of the series: user-defined functions. When Microsoft introduced user-defined functions in SQL Server 2000, it was an object-oriented developer’s dream come true. It showed that Microsoft wanted to make true encapsulation—and not just views—a reality. You would be able to define your business logic in functions (either scalar functions to perform calculations or return result sets with table-valued functions), and have everything in just one location. The idea was fantastic, but the implementation was…problematic.
The biggest problem with user-defined functions is that they simply don’t perform well. With a scalar function, you’re performing the operation for every row in a result set. If this is a quick calculation that doesn’t require anything more than a bit of CPU and memory (i.e., all of the fields are already defined as part of the result set and you’re just performing mathematical or string operations on them) and you aren’t filtering by the function, a scalar function won’t kill you. But if you need to hit a table inside your scalar function, you’re going to be accessing that table once for every row in your result set. That will kill you. Even worse is if you make the scalar function part of your WHERE clause. In that case, you need to perform the operation for every single row, regardless of whether that row makes it into your final result set. In other words, scalar user-defined functions are not SARGable.
Table-valued functions can be better (as the Dieter post above notes), but even they will come with a performance cost over inlining the function’s code in your queries. Here is an older article (from the SQL Server 2000 days) which shows what has been my typical experience: inline TVFs are much faster than scalar or multi-table functios, but tend to have higher I/O requirements than the raw queries. This makes inline TVFs potentially a reasonable solution, but they’re something you need to test in a production-sized environment before using.
So now that we’re throwing out scalar and multi-table functions, what’s left? If you need a result set back, an inline TVF can work, as we mentioned. But if you need to perform some complex calculations against a result set, I’d recommend using the CLR. The CLR is faster for a number of scenarios, such as string splitting or calculating a median. It’s still a little scary to DBAs who are insulated from the general .NET world, but it’d be a shame if people miss out on such an important tool nearly a decade later just because they don’t know its value.