Refreshing select partitions
Use case
We have a dataset of orders and want to aggregate it with decent performance. Orders have a creation time, so we can partition pre-aggregations by time to reduce their build and refresh time. The problem is that an order’s status can change long after the order was created. When that happens, we want to rebuild only the partition that contains this order. In the recipe below, we’ll learn how to use the refresh_key together with FILTER_PARAMS to refresh each partition separately.
Data modeling
Let’s explore the orders cube data that contains various information about
orders, including number and status:
| id | number | status | created_at | updated_at |
|---|---|---|---|---|
| 1 | 1 | processing | 2021-08-10 14:26:40 | 2021-08-10 14:26:40 |
| 2 | 2 | completed | 2021-08-20 13:21:38 | 2021-08-22 13:10:38 |
| 3 | 3 | shipped | 2021-09-01 10:27:38 | 2021-09-02 01:12:38 |
| 4 | 4 | completed | 2021-09-20 10:27:38 | 2021-09-20 10:27:38 |
Each order has created_at and updated_at properties. The
updated_at property is the last order update timestamp. To create a
pre-aggregation with partitions, we need to specify the
partition_granularity property.
Partitions will be split monthly by the created_at dimension.
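A minimal sketch of such a pre-aggregation in a YAML data model could look like the following. The public.orders table name, the orders_rollup pre-aggregation name, and the day granularity are assumptions made for illustration; the dimensions mirror the columns of the table above.

```yaml
cubes:
  - name: orders
    # Assumed source table; adjust to your schema
    sql: SELECT * FROM public.orders

    dimensions:
      - name: number
        sql: number
        type: number
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
      - name: updated_at
        sql: updated_at
        type: time

    pre_aggregations:
      # Hypothetical name, chosen for this sketch
      - name: orders_rollup
        type: rollup
        dimensions:
          - number
          - status
          - updated_at
        time_dimension: created_at
        granularity: day
        # Split the pre-aggregation into monthly partitions by created_at
        partition_granularity: month
        refresh_key:
          # Returns a new value whenever any order is updated
          sql: SELECT max(updated_at) FROM public.orders
```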
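The pre-aggregation also uses a custom refresh_key that will
check for new values of the updated_at property. The refresh key is evaluated
for each partition separately. For example, if we change orders from August and
their updated_at property changes, the current refresh key will change for
all partitions, because the same query, with no partition-specific filter, is run for each of them. In the Cube logs, the refresh key query is identical for every partition, so all partitions are rebuilt.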
How do we rebuild only the partition for August? We can use FILTER_PARAMS for that!
Let’s update our pre-aggregation definition:
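A minimal sketch of the updated definition, under the same assumptions as before (the public.orders table, the hypothetical orders_rollup name, and the day granularity), might look like this. The only change is the WHERE clause in the refresh key, which uses FILTER_PARAMS to restrict the query by the created_at member of the orders cube.

```yaml
cubes:
  - name: orders
    # Assumed source table; adjust to your schema
    sql: SELECT * FROM public.orders

    dimensions:
      - name: number
        sql: number
        type: number
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
      - name: updated_at
        sql: updated_at
        type: time

    pre_aggregations:
      # Hypothetical name, chosen for this sketch
      - name: orders_rollup
        type: rollup
        dimensions:
          - number
          - status
          - updated_at
        time_dimension: created_at
        granularity: day
        partition_granularity: month
        refresh_key:
          # The filter on created_at limits the check to the date range
          # of the partition being evaluated
          sql: |
            SELECT max(updated_at)
            FROM public.orders
            WHERE {FILTER_PARAMS.orders.created_at.filter("created_at")}
```

The design choice here is to keep the refresh key cheap (a single max over one partition's rows) while making its value depend only on the orders that fall into that partition.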
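Cube will filter data by the created_at property and then apply the refresh
key for the updated_at property. In the Cube logs, the refresh key query for each partition is now restricted to a date range over the
created_at property. With this refresh key, only the partition that contains the changed orders will be
updated.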
Result
We have received orders from two partitions of a pre-aggregation, and only one of them was updated when an order changed its status.
Source code
Please feel free to check out the full source code or run it with the docker-compose up command. You’ll see the result, including
queried data, in the console.