Implementing Entity-Attribute-Value Model (EAV)
Use case
We want to create a cube for a dataset which uses the Entity-Attribute-Value model (EAV). It stores entities in a table that can be joined to another table with numerous attribute-value pairs. Each entity is not guaranteed to have the same set of associated attributes, thus making the entity-attribute-value relation a sparse matrix. In the cube, we’d like every attribute to be modeled as a dimension.Data modeling
Let’s explore theusers cube that contains the entities:
users cube is joined with the orders cube to reflect that there might be
many orders associated with a single user. The orders remain in various
statuses, as reflected by the status dimension, and their creation dates are
available via the created_at dimension:
- the users serve as entities and they should be modeled with a cube
- order statuses serve as attributes and they should be modeled as dimensions
- the earliest creation dates for each status serve as attribute values and they will be modeled as dimension values
Static attributes
We already know that the following statuses are present in the dataset:completed, processing, and shipped. Let’s assume this set of statuses is
not going to change often.
Then, modeling the cube is as simple as defining a few joins (one join per
attribute):
Static attributes, DRY version
We can embrace the Don’t Repeat Yourself principle and eliminate the repetition by generating the cube definition dynamically based on the list of statuses. Let’s create a new JavaScript model so we can move all repeated code patterns into handy functions and iterate over statuses in relevant parts of the cube’s code.users_statuses_DRY cube is functionally identical to the
users_statuses_joins cube above. Querying this new cube would yield the same
data. However, there’s still a static list of statuses present in the cube’s
source code. Let’s work around that next.
Dynamic attributes
We can eliminate the list of statuses from the cube’s code by loading this list from an external source, e.g., the data source. Here’s the code from thefetch.js file that defines the fetchStatuses function that would load the
statuses from the database. Note that it uses the pg package (Node.js client
for Postgres) and reuses the credentials from Cube.
fetchStatuses function to load the list of
statuses. We will also wrap the cube definition with the asyncModule built-in
function that allows the data model to be created
dynamically.
users_statuses_dynamic cube is functionally identical to the
previously created cubes. So, querying this new cube would yield the same data
too.
Source code
Please feel free to check out the full source code or run it with thedocker-compose up command. You’ll see the result, including
queried data, in the console.