Accelerating non-additive measures
Use case
We want to run queries against pre-aggregations only to ensure our application’s superior performance. Usually, accelerating a query is as simple as including its measures and dimensions to the pre-aggregation definition. Non-additive measures (e.g., average values or distinct counts) are a special case. Pre-aggregations with such measures are less likely to be selected to accelerate a query. However, there are a few ways to work around that.Data modeling
Let’s explore theusers cube that contains various measures describing users’
age:
- count of unique age values (
distinct_ages) - average age (
avg_age) - 90th percentile of age (
p90_age)
gender dimension. It won’t be
accelerated:
Replacing with approximate additive measures
Often, non-additivecount_distinct measures can be changed to have the
count_distinct_approx type
which will make them additive and orders of magnitude more performant. This
count_distinct_approx measures can be used in pre-aggregations. However, there
are two drawbacks:
- This type is approximate, so the measures might yield slightly different
results compared to their
count_distinctcounterparts. Please consult with your database’s documentation to learn more. - The
count_distinct_approxis not supported with all databases. Currently, Cube supports it for Athena, BigQuery, and Snowflake.
distinct_ages measure can be rewritten as follows:
Decomposing into a formula with additive measures
Non-additiveavg measures can be rewritten as calculated measures
that reference additive measures only. Then, this additive measures can be used
in pre-aggregations. Please note, however, that you shouldn’t include avg_age
measure in your pre-aggregation as it renders it non-additive.
For example, the avg_age measure can be rewritten as follows:
Providing multiple pre-aggregations
If the two workarounds described above don’t apply to your use case, feel free to create additional pre-aggregations with definitions that fully match your queries with non-additive measures. You will get a performance boost at the expense of a slightly increased overall pre-aggregation build time and space consumed.Source code
Please feel free to check out the full source code or run it with thedocker-compose up command. You’ll see the result, including
queried data, in the console.