An overview of the EAV model as a solution to the problem of needing to define custom attributes on entities at runtime
The Problem
One problem that comes up often when working with relational databases is that one can’t dynamically alter the schema on the fly. Many software will eventually have users asking for a way to define custom attributes. They’ll want to add fields like billing rep and their own internal tracking id to their invoices, but the other department handles things differently and wants a different set of fields instead of those, and our company wants to sell this software to other companies that will want a completely different set of fields for their workflow. Oh and they want to be able to search, filter, and aggregate by those fields so they can find specific invoices, see all the invoices a particular billing rep handled, and how much $ that sales rep brought in but only on orders of a particular type.
We could try altering the schema on the fly to add/remove columns for each of these custom attributes – a tricky and error prone ordeal. But we’d end up needing to dynamically build our queries to select the correct custom attributes of an entity, and our tables would now have a large number of columns with rows having an absurd number of null fields if only a small number of attributes might apply to any one entity.
So in summary the problems we have with these requirements are:
- Properties on entities are not known ahead of time and need to be dynamically defined at run time
- Entities have a large total number of possible properties but only a relatively small number apply on any one entity which makes the relational model inefficient for storing, reading, and/or writing since a large number of values in every row would be left null.
Enter the EAV Data Model
In the EAV model the attributes of an entity (such as a product or employee) are stored as entity-attribute-value pairs. Which means only the applicable attributes for any one entity need to be stored. Since the attribute is simply a key or foreign key pointing to an attribute definition in another table, we can dynamically modify the attributes through common DB operations at run time without the need of modifying the DB schema itself.
The basic model can be layed out in 3 tables:
The Entities Table
The plain old business entities. Employees, Invoices, Products, etc.
Id | Name |
---|---|
1 | “Product1” |
The Attributes Table
This table contains the definitions for all the possible attributes the entities can have.
Id | Name | Type |
---|---|---|
1 | “Color” | “string” |
The Entity-Attribute-Values Table
Finally, this table tells us what attributes a particular entity has and the values for each of those attributes.
Entity Id | Attribute Id | Attribute Value |
---|---|---|
1 | 1 | “purple” |
With this model, we can define custom attributes for each entity at run time. That simply involves adding/removing entries to the attributes table. Setting the values for the attributes of an entity is accomplished by modifying entries in the A-V table. Any non-applicable attributes to an entity are simply omitted entries for in the A-V table.
Reading isn’t too bad either. The queries will be more verbose but usually won’t require any unusual SQL syntax. At worst, we’ll need to use pivots if we want to read each attribute as a column.
The Problems with the EAV Model
The EAV data model isn’t a silver bullet though. It has evident trade offs that lead to some even calling it an anti-pattern. The EAV model lacks many of the advantages of relational databases.
- Attributes are dynamically typed. Requires casting for comparisons and de/serializations in app code, and requires implementing your own data validations
- Querying becomes complicated, requiring pivots if one wishes to show the attributes as columns in the query results
- Filtering in queries requires a join to the Attribute-Values table for every filtering attribute
- The ability to have attributes that are foreign keys is lost and requires implementing an maintaining your own solution if “relating” attribute values to other data is needed
- Tools such as ORMs rarely have facilities to work with such a data model in a clean, brief way.
- Indexing the Attribute Value column of the Attribute-Values table becomes complicated or even impossible depending on the particular type it represents, and/or DB.
Sill the EAV model is usually the simpler solution. The above example was only the basic form of the model and it can be customized further (at the cost of additional complexity) to overcome some of its drawbacks and tailor it for the specific problem at hand. For instance, additional columns could be added to the A-V table for each type of value that can be described (int, string, etc.) to get a more strongly typed schema.
Alternatives
- Using JSON or XML columns to store the blob of attribute-value pairs. This is a simpler approach if one only needs to a dumb data store for the custom attributes. Querying and modifications will be more complex than in the eav model. Database engines like Postgres and SQL Server have features that might make working with these column types easier.
- Pros: Simple to implement
- Cons: Querying on custom attributes becomes difficult, must read/write the attribute-value blob of entities
- NoSQL DBs such as Cassandra, Redis, Mongo, etc. These database engines don’t tie you don’t to a concrete schema and allow you structure your data in any way you want at any time (and with all the potential problems that brings in). Depending on your particular problems, one of these databases might be the ideal choice of storage for your data. This is the more heavy-weight solution though and will make moving to a different solution very difficult.
- Pros: Potentially the best solution since this is one of the main problems NoSQL DBs focus on solving
- Cons: It’s an entire DB that you need to add to your software’s tech stack