Yesterday’s anti-pattern ended with me talking about performance. The rest of my anti-patterns have to do primarily with performance. The first entry in this category is the king of poorly-performing solutions: the Entity-Attribute-Value model. The single best presentation of the EAV model and its constituent failings is this one by Phil Factor. What follows is my quick presentation of the problem.
Developers love to have clockwork systems in which they perform the work once and don’t have to deal with their systems afterward. It’s a laudable goal—after all, who wants to deal constantly with the same problems over and over? Unfortunately, the most difficult part of systems design from this perspective is dealing with databases. You specify your entities and attributes, but when the business changes requirements, you have to modify tables. Then there’s the problem that sometimes, people don’t know beforehand what attributes they want to collect and analyze. As a developer, you go to your thinking place and try to come up with a workable system which doesn’t require that you constantly add and move columns around on tables. After a spark of genius, it hits you: the One True Lookup Table:
CREATE TABLE dbo.ItemAttribute
(
    ItemAttributeID int IDENTITY(1,1) NOT NULL,
    ItemID int NOT NULL,
    Name varchar(100) NOT NULL,
    Value varchar(max) NOT NULL
);
-- The surrogate key is the clustered primary key.
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [PK_ItemAttribute] PRIMARY KEY CLUSTERED (ItemAttributeID);
-- Each item may have a given attribute name only once.
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [UKC_ItemAttribute] UNIQUE (ItemID, Name);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [FK_ItemAttribute_Item] FOREIGN KEY (ItemID) REFERENCES dbo.Item(ItemID);
Your annoying DBA can’t complain about this; it’s normalized, you have good constraints, and your fields are exactly what they say. In addition, you only need this one table for all item attributes. Best of all, your job is done: if they want to collect color, they can collect color; if they want to see size, they can see size; if they want number of units for one item and washing instructions for another, you just need this one table for everything. After declaring yourself Super-Genius of the Century and implementing your system, you start daydreaming about the Ferrari you’re going to buy with the hefty bonus management has to give you, when the phone interrupts you. It’s management all right, but they’re not calling to tell you to come collect your giant check; rather, they’re calling to tell you that your system is dog-slow.
That’s the fundamental problem with EAV: inserting data is easy, but retrieving the data can be a mess. Let’s take a fairly basic business request: for all clothing-related items, get the manufacturer’s name, the product color, the product size, and the country of origin. But we only want to get the records which cost at least $14 per unit. All of these are item attributes; the Item entity is basically a stub with a couple of common properties, but because the company handles clothes, electronics, jewelry, condos, and landscaping services, you decided that all of the non-common attributes go into ItemAttribute.
Here’s what the above query looks like:
SELECT
    i.ItemID,
    i.Name AS ItemName,
    COALESCE(iam.Value, 'UNKNOWN') AS ManufacturerName,
    COALESCE(iac.Value, 'UNKNOWN') AS ItemColor,
    COALESCE(ias.Value, 'UNKNOWN') AS ProductSize,
    COALESCE(iao.Value, 'UNKNOWN') AS CountryOfOrigin
FROM dbo.Item i
    -- Optional attributes: one outer join per attribute we want to display.
    LEFT OUTER JOIN dbo.ItemAttribute iam
        ON i.ItemID = iam.ItemID
        AND iam.Name = 'Manufacturer'
    LEFT OUTER JOIN dbo.ItemAttribute iac
        ON i.ItemID = iac.ItemID
        AND iac.Name = 'Color'
    LEFT OUTER JOIN dbo.ItemAttribute ias
        ON i.ItemID = ias.ItemID
        AND ias.Name = 'Size'
    LEFT OUTER JOIN dbo.ItemAttribute iao
        ON i.ItemID = iao.ItemID
        AND iao.Name = 'Country Of Origin'
    -- Required attributes: one inner join per attribute we filter on.
    INNER JOIN dbo.ItemAttribute iapt
        ON i.ItemID = iapt.ItemID
        AND iapt.Name = 'Product Type'
    INNER JOIN dbo.ItemAttribute iamc
        ON i.ItemID = iamc.ItemID
        AND iamc.Name = 'Unit Cost'
WHERE
    iapt.Value = 'Clothing'
    AND CAST(iamc.Value AS DECIMAL(8,2)) >= 14;
This is a monster of a query. We have to join to the ItemAttribute table six times just for a simple query! To make matters worse, indexing this query is next to impossible because our Value attribute is a VARCHAR(MAX) type, and SQL Server does not allow a VARCHAR(MAX) column to be a key column in an index. This means that your annoying DBA can’t do anything easy to make your query go faster.
Aside from terrible performance, there are several other problems with the Entity-Attribute-Value model. First of all, there is no real data integrity. You created meaningful constraints, but you’re not constraining the one part which needs it: Value. In one case, we’re casting Value as a decimal type to perform a comparison. What happens if somebody accidentally puts in a non-decimal value? “Okay,” the developer may think, “I can just create a few columns: IntValue, ShortVarcharValue, VarcharValue, DecimalValue, etc.” If this sounds absurd, it should. You’re mixing metadata (attribute type) with data (the attribute itself).
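To make the non-decimal value problem concrete, here is a minimal sketch. It assumes SQL Server 2012 or later (for TRY_CAST) and uses a table variable with made-up rows as a stand-in for dbo.ItemAttribute; none of the sample data comes from a real system.

```sql
-- Hypothetical sample rows; @ItemAttribute stands in for dbo.ItemAttribute.
DECLARE @ItemAttribute TABLE
(
    ItemID int NOT NULL,
    Name varchar(100) NOT NULL,
    Value varchar(max) NOT NULL
);

INSERT INTO @ItemAttribute (ItemID, Name, Value)
VALUES
    (1, 'Unit Cost', '19.99'),
    (2, 'Unit Cost', '14'),
    (3, 'Unit Cost', 'fourteen dollars');  -- nothing in the schema prevents this

-- TRY_CAST returns NULL instead of raising an error,
-- so this query lists the rows that will not convert (ItemID 3 here).
SELECT
    ia.ItemID,
    ia.Value
FROM @ItemAttribute ia
WHERE
    ia.Name = 'Unit Cost'
    AND TRY_CAST(ia.Value AS decimal(8,2)) IS NULL;

-- A plain CAST over the same rows raises a conversion error
-- when it reaches the bad value, and the whole query fails.
SELECT
    ia.ItemID
FROM @ItemAttribute ia
WHERE
    ia.Name = 'Unit Cost'
    AND CAST(ia.Value AS decimal(8,2)) >= 14;
```

Swapping CAST for TRY_CAST keeps the query from failing, but it only hides the bad row; the table still accepts whatever string somebody types in.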
Aside from that data integrity problem, we also have a referential integrity problem. With the Name column, one person could use “Color” and another “Colour.” Different spellings, misspellings, and rephrasings of product attributes happen. You could create a lookup table called ItemAttributeName which lists valid values, but now you have a half-dozen more joins.
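Here is a small sketch of how the naming problem plays out. Again, the table variable and its rows are invented for illustration and stand in for dbo.ItemAttribute.

```sql
-- Hypothetical sample rows; @ItemAttribute stands in for dbo.ItemAttribute.
DECLARE @ItemAttribute TABLE
(
    ItemID int NOT NULL,
    Name varchar(100) NOT NULL,
    Value varchar(max) NOT NULL
);

INSERT INTO @ItemAttribute (ItemID, Name, Value)
VALUES
    (1, 'Color', 'Red'),
    (2, 'Colour', 'Blue'),   -- different spelling
    (3, 'Colr', 'Green');    -- typo

-- Only item 1 comes back; items 2 and 3 look as if they have no color at all.
SELECT
    ia.ItemID,
    ia.Value
FROM @ItemAttribute ia
WHERE
    ia.Name = 'Color';

-- An audit query like this one shows every name in use,
-- but a person still has to spot the variants by eye.
SELECT
    ia.Name,
    COUNT(*) AS ItemCount
FROM @ItemAttribute ia
GROUP BY
    ia.Name
ORDER BY
    ia.Name;
```

In the big query above, items 2 and 3 would show up with an ItemColor of 'UNKNOWN' even though somebody did record a color for them.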
The EAV model is great for mockups and systems with a few rows of data. When it comes to handling a serious system, however, you need a good data model. There’s just no easy way around it.
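For contrast, here is a rough sketch of what the clothing request could look like against a typed table. This is only one possible design, not a recommendation for any particular system: the #ClothingItem temp table, its column names, sizes, and sample rows are all assumptions made up for this example.

```sql
-- Hypothetical typed table for clothing items (a temp table so the sketch runs on its own).
CREATE TABLE #ClothingItem
(
    ItemID int NOT NULL PRIMARY KEY,
    ManufacturerName varchar(100) NULL,
    Color varchar(30) NULL,
    ProductSize varchar(10) NULL,
    CountryOfOrigin varchar(60) NULL,
    UnitCost decimal(8,2) NOT NULL CHECK (UnitCost >= 0)  -- the type and constraint guard the value
);

-- A decimal column can be an index key, unlike varchar(max).
CREATE INDEX IX_ClothingItem_UnitCost ON #ClothingItem (UnitCost);

-- Made-up sample rows.
INSERT INTO #ClothingItem
    (ItemID, ManufacturerName, Color, ProductSize, CountryOfOrigin, UnitCost)
VALUES
    (1, 'Acme Apparel', 'Red', 'M', 'Portugal', 19.99),
    (2, 'Acme Apparel', 'Blue', 'L', NULL, 9.50);

-- The same business request: no attribute joins and no CAST.
SELECT
    ci.ItemID,
    COALESCE(ci.ManufacturerName, 'UNKNOWN') AS ManufacturerName,
    COALESCE(ci.Color, 'UNKNOWN') AS ItemColor,
    COALESCE(ci.ProductSize, 'UNKNOWN') AS ProductSize,
    COALESCE(ci.CountryOfOrigin, 'UNKNOWN') AS CountryOfOrigin
FROM #ClothingItem ci
WHERE
    ci.UnitCost >= 14;

DROP TABLE #ClothingItem;
```

The trade-off is the one from the start of this post: when the business wants a new attribute, somebody has to alter the table.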