Joins
You can use thejoins parameter within cubes to define joins to other cubes.
Joins allow to access and compare members from two or more cubes at the same time.
LEFT JOIN. The cube which defines the join serves
as a main table, and any cubes referenced inside the joins property are used
in the LEFT JOIN clause. Learn more about direction of joins
here.
The semantics of INNER JOIN can be achieved with additional filtering. For
example, a simple check of whether the column value IS NOT NULL by using set
filter satisfies this requirement.
There’s also no way to define FULL OUTER JOIN and RIGHT OUTER JOIN for the
sake of join modeling simplicity. To get RIGHT OUTER JOIN semantics just
define join from other side of relationship.
The FULL OUTER JOIN can be built inside cube sql
parameter. Quite frequently, FULL OUTER JOIN is used to solve Data
Blending or similar problems. In that case, it’s best
practice to have a separate cube for such an operation.
Parameters
name
The name must match the name of the joined cube and, thus, follow the naming conventions. For example, when theproducts cube is joined on to the orders cube, we
would define the join as follows:
relationship
Therelationship property is used to describe the type of the relationship
between joined cubes. It’s important to properly define the type of relationship
so Cube can accurately calculate measures.
The cube that declares the join is considered left in terms of the left
join semantics, and the joined cube is considered right. It
means that all rows of the left cube are selected, while only those rows of
the right cube that match the condition are selected as well. For more
information and specific examples, please see join
directions.
The join does not need to be defined on both cubes, but the definition can
affect the join direction.
one_to_onefor one-to-one relationshipsone_to_manyfor one-to-many relationshipsmany_to_onefor the opposite of one-to-many relationships
One-to-one
Theone_to_one type indicates a one-to-one relationship between
the declaring cube and the joined cube. It means that one row in the declaring
cube can match only one row in the joined cube.
For example, in a data model containing users and their profiles, the
users cube would declare the following join:
One-to-many
Theone_to_many type indicates a one-to-many relationship between
the declaring cube and the joined cube. It means that one row in the declaring
cube can match many rows in the joined cube.
For example, in a data model containing authors and the books they have
written, the authors cube would declare the following join:
Many-to-one
Themany_to_one type indicates the many-to-one relationship between the
declaring cube and the joined cube. You’ll often find this type of relationship
on the opposite side of the one-to-many relationship. It means that
one row in the declaring cube matches a single row in the joined cube, while a
row in the joined cube can match many rows in the declaring cube.
For example, in a data model containing orders and customers who made them,
the orders cube would have the following join:
sql
sql is necessary to indicate a related column between cubes. It is important
to properly specify a matching column when creating joins. Take a look at the
example below:
Setting a primary key
In order for a join to work, it is necessary to define aprimary_key as
specified below. It is a requirement when a join is defined so that Cube can
handle row multiplication issues such as chasm and fan traps.
Let’s imagine you want to calculate Order Amount by Order Item Product Name.
In this case, Order rows will be multiplied by the Order Item join due to
the one_to_many relationship. In order to produce correct results, Cube will
select distinct primary keys from Order first and then will join these primary
keys with Order to get the correct Order Amount sum result. Please note that
primary_key should be defined in the dimensions section.
Setting
primary_key to true will change the default value of the public
parameter to false. If you still want public to be true — set it manually.The example uses Postgres string concatenation; note that SQL may be different
depending on your database.
Chasm and fan traps
Cube automatically detects chasm and fan traps based on themany_to_one and one_to_many relationships defined in join.
When detected, Cube generates a deduplication query that evaluates all distinct primary keys within the multiplied measure’s cube and then joins distinct primary keys to this cube on itself to calculate the aggregation result.
If there’s more than one multiplied measure in a query, then such query is generated for every such multiplied measure, and results are joined.
Cube solves for chasm and fan traps during query time.
If there’s pre-aggregregation that fits measure multiplication requirements it’d be leveraged to serve such a query.
Such pre-aggregations and queries are always considered non-additive for the purpose of pre-aggregation matching.
Let’s consider an example data model:
customers.average_age by orders.city, the Cube detects that the average_age measure in the customers cube would be multiplied by orders to customers and would generate SQL similar to:
CUBE reference
When you have several joined cubes, you should accurately use columns’ names to avoid any mistakes. One way to make no mistakes is to use theCUBE reference.
It allows you to specify columns’ names in cubes without any ambiguity. During
the implementation of the query, this reference will be used as an alias for a
basic cube. Take a look at the following example:
Transitive joins
Cube automatically takes care of transitive joins. For example, consider the following data model:a → b and b → c will be resolved automatically. Cube uses the
Dijkstra algorithm to find a join path between cubes given
requested members.
In case there are multiple join paths that can be used to join the same set of cubes, Cube will collect cube names from members in the following order:
- Measures
- Dimensions
- Segments
- Time dimensions