> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cube.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Running in production

> Cube makes use of two different kinds of cache:

* In-memory storage of query results
* Pre-aggregations

Cube Store is enabled by default when running Cube in development mode. In
production, Cube Store **must** run as a separate process. The easiest way to do
this is to use the official Docker images for Cube and Cube Store.

<Info>
  Using Windows? We **strongly** recommend using [WSL2 for Windows 10][link-wsl2]
  to run the following commands.
</Info>

You can run Cube Store with Docker with the following command:

```bash theme={"dark"}
docker run -p 3030:3030 cubejs/cubestore
```

<Info>
  Cube Store can further be configured via environment variables. To see a
  complete reference, please consult the `CUBESTORE_*` environment variables in
  the [Environment Variables reference][ref-config-env].
</Info>

Next, run Cube and tell it to connect to Cube Store running on `localhost` (on
the default port `3030`):

```bash theme={"dark"}
docker run -p 4000:4000 \
  -e CUBEJS_CUBESTORE_HOST=localhost \
  -v ${PWD}:/cube/conf \
  cubejs/cube
```

In the command above, we're specifying [`CUBEJS_CUBESTORE_HOST`](/reference/configuration/environment-variables#cubejs_cubestore_host) to let Cube know
where Cube Store is running.

You can also use Docker Compose to achieve the same:

```yaml theme={"dark"}
services:
  cubestore:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data

  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_CUBESTORE_HOST=localhost
    depends_on:
      - cubestore
    links:
      - cubestore
    volumes:
      - ./model:/cube/conf/model
```

## Architecture

<div style={{ textAlign: "center" }}>
  <img alt="Cube Store cluster with Cube" src="https://ucarecdn.com/28747c64-bc0e-4f2c-a438-e5a112c68436/" style={{ border: "none" }} width="100%" />
</div>

A Cube Store cluster consists of at least one Router and one or more Worker
instances. Cube sends queries to the Cube Store Router, which then distributes
the queries to the Cube Store Workers. The Workers then execute the queries and
return the results to the Router, which in turn returns the results to Cube.

## Scaling

<Warning>
  Cube Store *can* be run in a single instance mode, but this is usually
  unsuitable for production deployments. For high concurrency and data throughput,
  we **strongly** recommend running Cube Store as a cluster of multiple instances
  instead.
</Warning>

Scaling Cube Store for a higher concurrency is relatively simple when running in
cluster mode. Because [the storage layer](#storage) is decoupled from the query
processing engine, you can horizontally scale your Cube Store cluster for as
much concurrency as you require.

In cluster mode, Cube Store runs two kinds of nodes:

* one or more **router** nodes handle incoming client connections, manage
  database metadata and serve simple queries.
* multiple **worker** nodes which execute SQL queries

Cube Store querying performance is optimal when the count of partitions in a
single query is less than or equal to the worker count. For example, you have a
200 million rows table that is partitioned by day, which is ten daily Cube
partitions or 100 Cube Store partitions in total. The query sent by the user
contains filters, and the resulting scan requires reading 16 Cube Store
partitions in total. Optimal query performance, in this case, can be achieved
with 16 or more workers. You can use `EXPLAIN` and `EXPLAIN ANALYZE` SQL
commands to see how many partitions would be used in a specific Cube Store
query.

Resources required for the main node and workers can vary depending on the
configuration. With default settings, you should expect to allocate at least 4
CPUs and up to 8GB per main or worker node.

The configuration required for each node can be found in the table below. More
information about these variables can be found [in the Environment Variables
reference][ref-config-env].

| Environment Variable    | Specify on Router? | Specify on Worker? |
| ----------------------- | ------------------ | ------------------ |
| `CUBESTORE_SERVER_NAME` | ✅ Yes              | ✅ Yes              |
| `CUBESTORE_META_PORT`   | ✅ Yes              | —                  |
| `CUBESTORE_WORKERS`     | ✅ Yes              | ✅ Yes              |
| `CUBESTORE_WORKER_PORT` | —                  | ✅ Yes              |
| `CUBESTORE_META_ADDR`   | —                  | ✅ Yes              |

`CUBESTORE_WORKERS` and `CUBESTORE_META_ADDR` variables should be set with
stable addresses, which should not change. You can use stable DNS names and put
load balancers in front of your worker and router instances to fulfill stable
name requirements in environments where stable IP addresses can't be guaranteed.

<Info>
  To fully take advantage of the worker nodes in the cluster, we **strongly**
  recommend using [partitioned pre-aggregations][ref-caching-partitioning].
</Info>

A sample Docker Compose stack for the single machine setting this up might look
like:

```yaml theme={"dark"}
services:
  cubestore_router:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data
  cubestore_worker_1:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    depends_on:
      - cubestore_router
    volumes:
      - .cubestore:/cube/data
  cubestore_worker_2:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_2:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001,cubestore_worker_2:9001
      - CUBESTORE_REMOTE_DIR=/cube/data
    depends_on:
      - cubestore_router
    volumes:
      - .cubestore:/cube/data
  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_CUBESTORE_HOST=cubestore_router
    depends_on:
      - cubestore_router
    volumes:
      - .:/cube/conf
```

## Replication and High Availability

The open-source version of Cube Store doesn't support replicating any of its
nodes. The router node and every worker node should always have only one
instance copy if served behind the load balancer or service address. Replication
will lead to undefined behavior of the cluster, including connection errors and
data loss. If any cluster node is down, it'll lead to a complete cluster outage.
If Cube Store replication and high availability are required, please consider
using Cube Cloud.

## Storage

Cube Store cluster uses both persistent and scratch storage.

### Persistent storage

Cube Store makes use of a separate storage layer for storing metadata as well as
for persisting pre-aggregations as Parquet files.

Cube Store can be configured to use either AWS S3, Google Cloud Storage (GCS), or
Azure Blob Storage as persistent storage. If desired, a local path on
the server can also be used in case all Cube Store cluster nodes are
co-located on a single machine.

<Info>
  Cube Store can only use one type of remote storage at the same time.
</Info>

<Warning>
  Cube Store requires strong consistency guarantees from an underlying distributed
  storage. AWS S3, Google Cloud Storage, and Azure Blob Storage are the only known
  implementations that provide them. Using other implementations in production is
  discouraged and can lead to consistency and data corruption errors.
</Warning>

<Note>
  Available on [Enterprise plan](https://cube.dev/pricing).
</Note>

<Info>
  As an additional layer on top of standard AWS S3, Google Cloud Storage (GCS), or
  Azure Blob Storage encryption, persistent storage can optionally use [Parquet
  encryption](#data-at-rest-encryption) for data-at-rest protection.
</Info>

A simplified example using AWS S3 might look like:

```yaml theme={"dark"}
services:
  cubestore_router:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>

  cubestore_worker_1:
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:9001
      - CUBESTORE_WORKER_PORT=9001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_WORKERS=cubestore_worker_1:9001
      - CUBESTORE_S3_BUCKET=<BUCKET_NAME_IN_S3>
      - CUBESTORE_S3_REGION=<BUCKET_REGION_IN_S3>
      - CUBESTORE_AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
      - CUBESTORE_AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
    depends_on:
      - cubestore_router
```

Note that you can’t use the same bucket as an export bucket and persistent
storage for Cube Store. It's recommended to use two separate buckets.

### Scratch storage

Separately from persistent storage, Cube Store requires local scratch space
to warm up partitions by downloading Parquet files before querying them.

By default, this folder should be mounted to `.cubestore/data` inside the
container and can be configured by `CUBESTORE_DATA_DIR` environment variable.
It is advised to use local SSDs for this scratch space to maximize querying
performance.

### AWS

Cube Store can retrieve security credentials from instance metadata
automatically. This means you can skip defining the
`CUBESTORE_AWS_ACCESS_KEY_ID` and `CUBESTORE_AWS_SECRET_ACCESS_KEY` environment
variables.

<Warning>
  Cube Store currently does not take the key expiration time returned from
  instance metadata into account; instead the refresh duration for the key is
  defined by `CUBESTORE_AWS_CREDS_REFRESH_EVERY_MINS`, which is set to `180` by
  default.
</Warning>

### Garbage collection

Cleanup isn’t done in export buckets; however, it's done in the persistent
storage of Cube Store. The default time-to-live (TTL) for orphaned
pre-aggregation tables is one day.

Refresh worker should be able to finish pre-aggregation refresh before
garbage collection starts. It means that all pre-aggregation partitions
should be built before any tables are removed.

#### Supported file systems

The garbage collection mechanism relies on the ability of the underlying file
system to report the creation time of a file.

If the file system does not support getting the creation time, you will see the
following error message in Cube Store logs:

```text theme={"dark"}
ERROR [cubestore::remotefs::cleanup] <pid:1>
error while getting created time for file "<name>.chunk.parquet":
creation time is not available for the filesystem
```

<Note>
  XFS is known to not support getting the creation time of a file.
  Please see [this issue](https://github.com/cube-js/cube/issues/7905#issuecomment-2504212623)
  for possible workarounds.
</Note>

## Security

### Authentication

Cube Store does not have any in-built authentication mechanisms. For this reason,
we recommend running your Cube Store cluster with a network configuration that
only allows access from the Cube deployment.

### Data-at-rest encryption

[Persistent storage](#persistent-storage) is secured using the standard AWS S3,
Google Cloud Storage (GCS), or Azure Blob Storage encryption.

Cube Store also provides optional data-at-rest protection by utilizing the
[modular encryption mechanism][link-parquet-encryption] of Parquet files in its
persistent storage. Pre-aggregation data is secured using the [AES cipher][link-aes]
with 256-bit keys. Data encyption and decryption are completely seamless to Cube
Store operations.

<Note>
  Available on the [Enterprise plan](https://cube.dev/pricing).
  Also requires the M [Cube Store Worker tier](/admin/account-billing/pricing#cube-store-worker-tiers).
</Note>

You can provide, rotate, or drop your own [customer-managed keys][ref-cmk] (CMK)
for Cube Store via the **Encryption Keys** page in Cube Cloud.

## Troubleshooting

### Heavy pre-aggregations

When building some pre-aggregations, you might encounter the following error:

```text theme={"dark"}
Error: Error during create table: CREATE TABLE <REDACTED>
Error: Query execution timeout after 10 min of waiting
```

It means that your pre-aggregation is too heavy and takes too long to build.

As a temporary solution, you can increase the timeout for all queies that Cube
runs against your data source via the [`CUBEJS_DB_QUERY_TIMEOUT`](/reference/configuration/environment-variables#cubejs_db_query_timeout) environment variable.

However, it is recommended that you optimize your pre-aggregations instead:

* Use an export bucket if your [data source][ref-data-sources] supports it. Cube will then load the pre-aggregation data in a much more efficient way.
* Use [partitions][ref-pre-agg-partitions]. Cube will then run a separate query to build each partition.
* Build pre-aggregations [incrementally][ref-pre-agg-incremental]. Cube will then build only the necessary partitions with each pre-aggregation refresh.
* Set an appropriate [build range][ref-pre-agg-build-range] if you don't need to query the whole date range. Cube will then include only the necessary data in the pre-aggregation.
* Check that your pre-aggregation includes only necessary dimensions. Each additional dimension usually increases the volume of the pre-aggregation data.
  * If you include a high cardinality dimension, Cube needs to store a lot of data in the pre-aggregation. For example, if you include the primary key into the pre-aggregation, Cube will effectively need to store a copy of the original table in the pre-aggregation, which is rarely useful.
  * If a single pre-aggregation is used by queries with different sets of dimensions, consider creating separate pre-aggregations for each set of dimensions. This way, Cube will only include necessary data in each pre-aggregation.
* Check if you have a heavy calculation in the [`sql` expression][ref-cube-sql] of your cubes (rather than a simple `sql_table` reference). If it's the case, you can build an additional [`original_sql` pre-aggregation][ref-pre-agg-original-sql] and [instruct][ref-pre-agg-use-original-sql] Cube to use it when building other pre-aggregations for this cube.

### MinIO

When using MinIO for persistent storage, you might encounter the following error:

```text theme={"dark"}
Error: Error during upload of <REDACTED>
File <REDACTED> can't be listed after upload.
Either there's Cube Store cluster misconfiguration,
or storage can't provide the required consistency.
```

Most likely, it happens because MinIO is not providing strong consistency guarantees,
as required for Cube Store's [persistent storage](#persistent-storage).
You can either try configuring MinIO to provide strong consistency or switch to
using S3 or GCS.

<Warning>
  The support for MinIO as persistent storage in Cube Store was contributed by the community.
  It's not supported by Cube or the vendor.
</Warning>

[link-wsl2]: https://docs.microsoft.com/en-us/windows/wsl/install-win10

[ref-caching-partitioning]: /docs/pre-aggregations/using-pre-aggregations#partitioning

[ref-config-env]: /reference/configuration/environment-variables

[link-parquet-encryption]: https://parquet.apache.org/docs/file-format/data-pages/encryption/

[link-aes]: https://en.wikipedia.org/wiki/Advanced_Encryption_Standard

[ref-cmk]: /admin/deployment/encryption-keys

[ref-data-sources]: /admin/connect-to-data/data-sources

[ref-pre-agg-partitions]: /docs/pre-aggregations/using-pre-aggregations#partitioning

[ref-pre-agg-incremental]: /reference/data-modeling/pre-aggregations#incremental

[ref-pre-agg-build-range]: /reference/data-modeling/pre-aggregations#build_range_start-and-build_range_end

[ref-cube-sql]: /reference/data-modeling/cube#sql

[ref-pre-agg-original-sql]: /reference/data-modeling/pre-aggregations#original_sql

[ref-pre-agg-use-original-sql]: /reference/data-modeling/pre-aggregations#use_original_sql_pre_aggregations
