Getting started with data modeling
The data model is used to transform raw data into meaningful business definitions and pre-aggregate data for optimal results. The data model is exposed through a rich set of APIs that allows end-users to run a wide variety of analytical queries without modifying the data model itself.You can explore a carefully crafted sample data model if you create a demo
deployment in Cube Cloud.
| id | paying | city | company_name |
|---|---|---|---|
| 1 | true | San Francisco | Pied Piper |
| 2 | true | Palo Alto | Raviga |
| 3 | true | Redwood | Aviato |
| 4 | false | Mountain View | Bream-Hall |
| 5 | false | Santa Cruz | Hooli |
- How many users do we have?
- How many paying users?
- What is the percentage of paying users out of the total?
- How many users, paying or not, are from different cities and companies?
1. Creating a Cube
In Cube, cubes are used to organize entities and connections between entities. Usually one cube is created for each table in the database, such asusers, orders, products, etc. In the sql_table parameter of the
cube we define a base table for this cube. In our case, the base table is simply
our users table.
2. Adding Measures and Dimensions
Once the base table is defined, the next step is to add measures and dimensions to the cube.Measures are referred to as quantitative data, such as number of units sold,
number of unique visits, profit, and so on.Dimensions are referred to as categorical data, such as state, gender,
product name, or units of time (e.g., day, week, month).
sql_table property), we create a count measure
in the measures block. The count type and sql
id means that when this measure will be requested via an API, Cube will
generate and execute the following SQL:
GROUP BY clause:
3. Adding Filters to Measures
Now let’s answer the next question – “How many paying users do we have?”. To accomplish this, we will introduce measure filters:It is best practice to prefix references to table columns with the name of the
cube or with the
CUBE constant when referencing the current cube’s column.paying_count measure, which shows only our paying
users. When this measure is requested, Cube will generate the following SQL:
filters property is an array, you can apply as many filters as
required. paying_count can be used with dimensions the same way as a simple
count. We can group paying_count by city and companyName simply by
adding these dimensions alongside measures in the requested query.
4. Using Calculated Measures
To answer “What is the percentage of paying users out of the total?”, we need to calculate the paying users ratio, which is basicallypaying_count / count.
Cube makes it extremely easy to perform this kind of calculation by defining a
calculated measure. Let’s add a new measure to our cube
called paying_percentage:
paying_percentage, which divides
paying_count by count. This example shows how you can reference measures
inside other measure definitions. When you request the paying_percentage
measure via an API, the following SQL will be generated:
paying_percentage can be used with dimensions.