orders and
line_items to illustrate the concepts throughout this page:
orders
| id | status | completed_at | created_at |
|---|---|---|---|
| 1 | completed | 2019-01-05 00:00:00.000000 | 2019-01-02 00:00:00.000000 |
| 2 | shipped | 2019-01-17 00:00:00.000000 | 2019-01-02 00:00:00.000000 |
| 3 | completed | 2019-01-27 00:00:00.000000 | 2019-01-02 00:00:00.000000 |
| 4 | shipped | 2019-01-09 00:00:00.000000 | 2019-01-02 00:00:00.000000 |
| 5 | processing | 2019-01-29 00:00:00.000000 | 2019-01-02 00:00:00.000000 |
line_items
| id | product_id | order_id | quantity | price | created_at |
|---|---|---|---|---|---|
| 1 | 31 | 1 | 1 | 275 | 2019-01-31 00:00:00.000000 |
| 2 | 49 | 2 | 6 | 248 | 2021-01-20 00:00:00.000000 |
| 3 | 89 | 3 | 6 | 197 | 2021-11-25 00:00:00.000000 |
| 4 | 71 | 4 | 8 | 223 | 2019-12-23 00:00:00.000000 |
| 5 | 64 | 5 | 5 | 75 | 2019-04-20 00:00:00.000000 |
See this recipe to learn about entity-first and
metrics-first approaches.
Cubes
Cubes represent datasets in Cube and are conceptually similar to views in SQL. Cubes are usually declared in separate files with one cube per file. Typically, a cube points to a single table in your data source using thesql_table property:
If you’re using dbt, see this recipe to streamline defining cubes
on top of dbt models.
sql property to accommodate more
complex SQL queries:
Cube supports data models that consist of thousands of cubes and views.
For large multi-tenancy configurations, e.g., with more than
100 tenants, consider using a multi-cluster deployment.
See the reference documentaton for the full list of cube parameters.
Views
Views sit on top of the data graph of cubes and create a facade of your whole data model with which data consumers can interact. They are useful for defining metrics, managing governance and data access, and controlling ambiguous join paths.orders view which includes select members
from base_orders, products, and users cubes:
See the reference documentaton for the full list of view parameters.
Folders
Optionally, members of a view can be organized into folders. Each folder would contain a subset of members of the view. Cube supports both flat and nested folder structures, which can be used with various visualization tools. If a specific tool does not support nested folders, they will be exposed to such a tool as an equivalent flat structure. Check APIs & Integrations for details on the nested folders support.Dimensions
Dimensions represent the properties of a single data point in the cube. Theorders table contains only dimensions, so representing them in the
orders cube is straightforward:
id dimension is defined as a primary key.
It is also possible to have more than one primary key dimension in a cube if
you’d like them all to be parts of a composite key.
The line_items table also has a couple of dimensions which can be
represented as follows:
See the reference documentaton for the full list of dimension parameters.
Dimension types
Dimensions can be of different types, e.g.,string, number, or time. Often,
data types in SQL are mapped to dimension types in the following way:
See the dimension type reference for details.
Time dimensions
Time-based properties are modeled using dimensions of thetime
type. They allow grouping the result set by a unit of
time (e.g., days, weeks, month, etc.), also known as the time dimension
granularity.
The following granularities are available by default for any time dimension:
year, quarter, month, week (starting on Monday), day, hour, minute,
second. You can also define custom granularities
and optionally expose them via proxy dimensions in case
you need to use weeks starting on Sunday, fiscal years, etc.
See the following recipes:
- For a custom granularity example.
- For a custom calendar example.
Measures
Measures represent the properties of a set of data points in the cube. To add a measure calledcount to our orders cube, for example, we can do the
following:
LineItems cube, we can also create a measure to sum up the total value
of line items sold:
See the reference documentaton for the full list of measure parameters.
Measure types
Measures can be of different types, e.g.,count, sum, or number. Often,
aggregate functions in SQL are mapped to measure types in the following way:
See the measure type reference for details.
See the following recipes:
- To learn how to define average and percentile measures,
- To learn how to calculate period-over-period changes.
Measure additivity
Additivity is a property of measures that detemines whether measure values, once calculated for a set of dimensions, can be further aggregated to calculate measure values for a subset of these dimensions. Measure additivity has an impact on pre-aggregation matching. Additivity of a measure depends on its type. Only measures with the following types are considered additive:count,
count_distinct_approx,
min,
max,
sum.
Measures with all other types are considered non-additive.
Example
Consider the following cube:city as a dimension and count and avg_age
as measures, we’ll get the following results:
| city | count | avg_age |
|---|---|---|
| Los Gatos | 1 | 20 |
| San Diego | 2 | 35 |
city dimension from the query, we’ll get the following
results:
| count | avg_age |
|---|---|
| 3 | 30 |
count measure that we’ve got for the second
query could have been calculated based on the results of the first one:
1 + 2 = 3. It explains why the count measure, having the count type, is
considered additive.
However, the value of the avg_age measure that we’ve got for the second query
can’t be calculated based on the results of the first one: there’s no way to
derive 30 from 20 and 35. This is why the avg_age measure, having the
avg type, is considered non-additive.
Leaf measures
Measures that do not reference other measures are considered leaf measures. By definition, all measures that only reference SQL columns and expressions are leaf measures. On the other hand, calculated measures might not necessarily be leaf measures because they can reference other measures. Whether a query contains only additive leaf measures has an impact on pre-aggregation matching.Joins
Joins define the relationships between cubes, which then allows accessing and comparing properties from two or more cubes at the same time. In Cube, all joins areLEFT JOINs.
An
INNER JOIN can be replicated with Cube; when making a Cube query, add a
filter for IS NOT NULL on the required column.line_items cube onto our
orders cube:
one_to_one, one_to_many, and
many_to_one) and a few other concepts such as the
direction of joins and transitive joins pitfalls.
See the reference documentaton for the full list of join parameters.
Segments
Segments are pre-defined filters that are kept within the data model instead of a Cube query. They help to simplify queries and make it easy to reuse common filters across a variety of queries. To add a segment which limits results to completed orders, we can do the following:See the reference documentaton for the full list of segment parameters.
Pre-aggregations
Pre-aggregations provide a powerful way to accelerate frequently used queries and keep the cache up-to-date. Within a data model, they are defined using thepre_aggregations property:
See the reference documentaton for the full list of pre-aggregation
parameters.