T-SQL Anti-Patterns: EAV

Yesterday’s anti-pattern ended with me talking about performance.  The rest of my anti-patterns have to do primarily with performance.  The first entry in this category is the king of poorly-performing solutions:  the Entity-Attribute-Value model.  The single best presentation of the EAV model and its constituent failings is this one by Phil Factor.  What follows is my quick presentation of the problem.

Developers love to have clockwork systems in which they perform the work once and don’t have to deal with their systems afterward.  It’s a laudable goal—after all, who wants to deal constantly with the same problems over and over?  Unfortunately, the most difficult part of systems design from this perspective is dealing with databases.  You specify your entities and attributes, but when the business changes requirements, you have to modify tables.  Then there’s the problem that sometimes, people don’t know beforehand what attributes they want to collect and analyze.  As a developer, you go to your thinking place and try to come up with a workable system which doesn’t require that you constantly add and move columns around on tables.  After a spark of genius, it hits you:  the One True Lookup Table:

CREATE TABLE dbo.ItemAttribute
	ItemAttributeID int identity(1,1) NOT NULL,
	ItemID int NOT NULL,
	Name varchar(100) NOT NULL,
	Value varchar(max) NOT NULL
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [UKC_ItemAttribute] UNIQUE (ItemID, Name);
ALTER TABLE dbo.ItemAttribute ADD CONSTRAINT [FK_ItemAttribute_Item] FOREIGN KEY (ItemID) REFERENCES dbo.Item(ItemID);

Your annoying DBA can’t complain about this; it’s normalized, you have good constraints, and your fields are exactly what they say.  In addition, you only need this one table for all item attributes.  Best of all, your job is done:  if they want to collect color, they can collect color; if they want to see size, they can see size; if they want number of units for one item and washing instructions for another, you just need this one table for everything.  After declaring yourself Super-Genius of the Century and implementing your system, you start daydreaming about the Ferrari you’re going to buy with the hefty bonus management has to give you when you get interrupted by the phone.  This was management alright, but not to tell you to come collect your giant check; rather, it’s them telling you that your system is dog-slow.

That’s the fundamental problem with EAV:  inserting data is easy, but retrieving the data can be a mess.  Let’s take a fairly basic business request:  for all clothing-related items, get the manufacturer’s name, the product color, the product size, and the country of origin.  But we only want to get the records which cost at least $14 per unit.  All of these are item attributes; the Item entity is basically a stub with a couple of common properties, but because the company handles clothes, electronics, jewelry, condos, and landscaping services, you decided that all of the non-common attributes go into ItemAttribute.

Here’s what the above query looks like:

	i.ID as ItemID,
	i.Name as ItemName,
	COALESCE(iam.Value, 'UNKNOWN') as ManufacturerName,
	COALESCE(iac.Value, 'UNKNOWN') as ItemColor,
	COALESCE(ias.Value, 'UNKNOWN') as ProductSize,
	COALESCE(iao.Value, 'UNKNOWN') as CountryOfOrigin
	dbo.Item i
	LEFT OUTER JOIN dbo.ItemAttribute iam
		ON i.ItemID = iam.ItemID
		AND iam.Name = 'Manufacturer'
	LEFT OUTER JOIN dbo.ItemAttribute iac
		ON i.ItemID = iac.ItemID
		AND iac.Name = 'Color'
	LEFT OUTER JOIN dbo.ItemAttribute ias
		ON i.ItemID = ias.ItemID
		AND ias.Name = 'Size'
	LEFT OUTER JOIN dbo.ItemAttribute iao
		ON i.ItemID = iao.ItemID
		AND iao.Name = 'Country Of Origin'
	INNER JOIN dbo.ItemAttribute iapt
		ON i.ItemID = iapt.ItemID
		AND iapt.Name = 'Product Type'
	INNER JOIN dbo.ItemAttribute iamc
		ON i.ItemID = iamc.ItemID
		AND iamc.Name = 'Unit Cost'
	iapt.Value = 'Clothing'
	AND CAST(iamc.Value AS DECIMAL(8,2)) >= 14;

This is a monster of a query. We have to scan the ItemAttribute table six times just for a simple query!  To make matters worse, indexing this query is next to impossible because our Value attribute is a VARCHAR(MAX) type, meaning that it’s not going on any indexes.  This means that your annoying DBA can’t do anything easy to make your query go faster.

Aside from terrible performance, there are several other problems with the Entity-Attribute-Value model.  First of all, there is no real data integrity.  You created meaningful constraints, but you’re not constraining the one part which needs it:  Value.  In one case, we’re casting Value as a decimal type to perform a comparison.  What happens if somebody accidentally puts in a non-decimal value?  “Okay,” the developer may think, “I can just create a few columns:  IntValue, ShortVarcharValue, VarcharValue, DecimalValue, etc.”  If this sounds absurd, it should.  You’re mixing metadata (attribute type) with data (the attribute itself).

Aside from that data integrity problem, we also have a referential integrity problem.  With the Name column, one person could use “Color” and another “Colour.”  Different spellings, misspellings, and rephrasing product attributes happens.  You could create a lookup table called ItemAttributeName which lists valid values, but now you have a half-dozen more joins.

The EAV model is great for mockups and systems with a few rows of data.  When it comes to handling a serious system, however, you need a good data model.  There’s just no easy way around it.

About these ads

One thought on “T-SQL Anti-Patterns: EAV

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s