Accelerating non-additive measures

Use case

We want to run queries against pre-aggregations only to ensure our application’s superior performance. Usually, accelerating a query is as simple as including its measures and dimensions to the pre-aggregation definition. Non-additive measures (e.g., average values or distinct counts) are a special case. Pre-aggregations with such measures are less likely to be selected to accelerate a query. However, there are a few ways to work around that.

Data modeling

Let’s explore the users cube that contains various measures describing users’ age:

count of unique age values (distinct_ages)
average age (avg_age)
90th percentile of age (p90_age)

cubes:
  - name: users
    # ...

    measures:
      - name: distinct_ages
        sql: age
        type: count_distinct

      - name: avg_age
        sql: age
        type: avg

      - name: p90_age
        sql: PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY age)
        type: number

All of these measures are non-additive. Practically speaking, it means that the pre-aggregation below would only accelerate a query that fully matches its definition:

cubes:
  - name: users

    pre_aggregations:
      - name: main
        measures:
          - distinct_ages
          - avg_age
          - p90_age
        dimensions:
          - gender

This query will match the pre-aggregation above and, thus, will be accelerated:

{
  "measures": ["users.distinct_ages", "users.avg_age", "users.p90_age"],
  "dimensions": ["users.gender"]
}

Meanwhile, the query below won’t match the same pre-aggregation because it contains non-additive measures and omits the gender dimension. It won’t be accelerated:

{
  "measures": ["users.distinct_ages", "users.avg_age", "users.p90_age"]
}

Let’s explore some possible workarounds.

Replacing with approximate additive measures

Often, non-additive count_distinct measures can be changed to have the count_distinct_approx type which will make them additive and orders of magnitude more performant. This count_distinct_approx measures can be used in pre-aggregations. However, there are two drawbacks:

This type is approximate, so the measures might yield slightly different results compared to their count_distinct counterparts. Please consult with your database’s documentation to learn more.
The count_distinct_approx is not supported with all databases. Currently, Cube supports it for Athena, BigQuery, and Snowflake.

For example, the distinct_ages measure can be rewritten as follows:

cubes:
  - name: users

    measures:
      - name: distinct_ages
        sql: age
        type: count_distinct_approx

Decomposing into a formula with additive measures

Non-additive avg measures can be rewritten as calculated measures that reference additive measures only. Then, this additive measures can be used in pre-aggregations. Please note, however, that you shouldn’t include avg_age measure in your pre-aggregation as it renders it non-additive. For example, the avg_age measure can be rewritten as follows:

cubes:
  - name: users

    measures:
      - name: avg_age
        sql: "{age_sum} / {count}"
        type: number

      - name: age_sum
        sql: age
        type: sum

      - name: count
        type: count

    pre_aggregations:
      - name: main
        measures:
          - age_sum
          - count
        dimensions:
          - gender

Providing multiple pre-aggregations

If the two workarounds described above don’t apply to your use case, feel free to create additional pre-aggregations with definitions that fully match your queries with non-additive measures. You will get a performance boost at the expense of a slightly increased overall pre-aggregation build time and space consumed.

Source code

Please feel free to check out the full source code or run it with the docker-compose up command. You’ll see the result, including queried data, in the console.

Getting Started

Data Modeling

Caching

Integrations

Explore & Analyze

Present & Share

Getting Started

Deployment

Migrate to Cube Cloud

Accelerating non-additive measures

Accelerating non-additive measures

Use case

Data modeling

Replacing with approximate additive measures

Decomposing into a formula with additive measures

Providing multiple pre-aggregations

Source code

Getting Started

Data Modeling

Caching

Integrations

Explore & Analyze

Present & Share

Getting Started

Deployment

Migrate to Cube Cloud

​Accelerating non-additive measures

​Use case

​Data modeling

​Replacing with approximate additive measures

​Decomposing into a formula with additive measures

​Providing multiple pre-aggregations

​Source code

Accelerating non-additive measures

Use case

Data modeling

Replacing with approximate additive measures

Decomposing into a formula with additive measures

Providing multiple pre-aggregations

Source code