Deploying Cube Core with Docker
This guide walks you through deploying Cube with Docker. If you'd like to deploy Cube to Kubernetes, please refer to the following resources with Helm charts:

- gadsme/charts
- OpstimizeIcarus/cubejs-helm-charts-kubernetes

These resources are community-maintained; they are not maintained by the Cube team. Please direct questions related to these resources to their authors.

Prerequisites
Configuration
Create a Docker Compose stack by creating a docker-compose.yml. A production-ready stack would, at minimum, consist of:
- One or more Cube API instances
- A Cube Refresh Worker
- A Cube Store Router node
- One or more Cube Store Worker nodes
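A minimal sketch of such a stack is shown below. The service names, database settings, and the pinned version are illustrative assumptions; adjust them to your database and environment:

```yaml
# docker-compose.yml — illustrative sketch, not a drop-in configuration
services:
  cube_api:
    image: cubejs/cube:v0.36.0
    ports:
      - 4000:4000
    environment:
      - CUBEJS_DB_TYPE=postgres          # assumption: a Postgres data source
      - CUBEJS_DB_HOST=postgres
      - CUBEJS_CUBESTORE_HOST=cubestore_router
    volumes:
      - .:/cube/conf
    depends_on:
      - cubestore_router

  cube_refresh_worker:
    image: cubejs/cube:v0.36.0
    environment:
      - CUBEJS_DB_TYPE=postgres
      - CUBEJS_DB_HOST=postgres
      - CUBEJS_CUBESTORE_HOST=cubestore_router
      - CUBEJS_REFRESH_WORKER=true       # run this instance as the refresh worker
    volumes:
      - .:/cube/conf

  cubestore_router:
    image: cubejs/cubestore:v0.36.0
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data

  cubestore_worker_1:
    image: cubejs/cubestore:v0.36.0
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:10001
      - CUBESTORE_WORKER_PORT=10001
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data
    depends_on:
      - cubestore_router
```

Additional Cube Store workers can be added by repeating the worker service with a unique name and port, and listing all of them in CUBESTORE_WORKERS.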
Using macOS or Windows? Use CUBEJS_DB_HOST=host.docker.internal instead of localhost if your database is on the same machine.

Using macOS on Apple Silicon (arm64)? Use the arm64v8 tag for Cube Store Docker images, e.g., cubejs/cubestore:arm64v8.

Note that it's a best practice to use specific locked versions, e.g., cubejs/cube:v0.36.0, instead of cubejs/cube:latest in production.

Set up reverse proxy
In production, the Cube API should be served over an HTTPS connection to ensure the security of data in transit. We recommend using a reverse proxy; as an example, let's use NGINX. You can also use a reverse proxy to enable HTTP/2 and GZIP compression.
nginx/cube.conf:
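A minimal sketch of such a configuration is below. The server name is a placeholder, and the upstream address assumes a Cube API service named cube_api listening on port 4000:

```nginx
server {
  listen 443 ssl;
  server_name cube.example.com;  # placeholder domain

  ssl_certificate     /etc/ssl/cert.pem;
  ssl_certificate_key /etc/ssl/key.pem;

  location / {
    proxy_pass http://cube_api:4000;  # Cube API service from docker-compose.yml
    proxy_http_version 1.1;
    # Required for Cube's WebSocket transport, if enabled
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
  }
}
```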
Then, create an ssl directory with the cert.pem and key.pem files inside so the NGINX service can find them.
For automatically provisioning SSL certificates with Let's Encrypt, this blog post may be useful.
Security
Use JSON Web Tokens
Cube can be configured to use industry-standard JSON Web Key Sets for securing its API and limiting access to data. To do this, we'll define the relevant options on our Cube API instance.

Securing Cube Store
All Cube Store nodes, both routers and workers, should only be accessible to Cube API instances and refresh workers. To do this with Docker Compose, simply make sure that none of the Cube Store services have any ports exposed to the host.

Monitoring
All Cube logs can be viewed through the Docker Compose CLI.

Update to the latest version
Find the latest stable release version from Docker Hub. Then, update your docker-compose.yml to use a specific tag instead of latest:
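For example, pinning the API instance to the version mentioned earlier in this guide:

```yaml
services:
  cube_api:
    # Pin to a specific release instead of the mutable `latest` tag
    image: cubejs/cube:v0.36.0
```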
Extend the Docker image
If you need to use dependencies (e.g., Python or npm packages) with native extensions inside configuration files or dynamic data models, build a custom Docker image. You can do this by creating a Dockerfile and a corresponding .dockerignore file:
Dockerfile:
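A sketch of such a Dockerfile, assuming npm dependencies declared in package.json (adjust accordingly for Python packages):

```dockerfile
FROM cubejs/cube:v0.36.0

COPY package.json package-lock.json ./
# Native extensions are compiled inside the image at build time,
# so they match the container's platform rather than the host's
RUN npm install
```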
.dockerignore:
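A minimal .dockerignore sketch that keeps host-installed dependencies and local artifacts out of the build context:

```
node_modules
npm-debug.log
.env
```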
Then, build the image and update your docker-compose.yml to use your newly-built image:
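For example, after building with `docker build -t my-company/cube-custom .` (the image name is a placeholder), the service definition might look like:

```yaml
services:
  cube_api:
    # Use the custom image instead of the stock cubejs/cube image
    image: my-company/cube-custom  # placeholder image name
```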
Please avoid bind-mounting the whole project directory (e.g., .:/cube/conf) if you have dependencies in package.json. Doing so would effectively hide the node_modules folder inside the container, where dependency files installed with npm install reside, and result in errors like this: Error: Cannot find module 'my_dependency'. In that case, mount individual files:
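A sketch of mounting individual files and folders instead of the whole project directory (the model folder and cube.js file names assume a typical project layout):

```yaml
services:
  cube_api:
    image: cubejs/cube:v0.36.0
    volumes:
      # Mount only what's needed, so /cube/conf/node_modules inside
      # the container is not shadowed by the host directory
      - ./model:/cube/conf/model
      - ./cube.js:/cube/conf/cube.js
```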
Production checklist
Thinking of migrating to the cloud instead? Click
here to learn more about migrating a self-hosted
installation to Cube Cloud.
Disable Development Mode
When running Cube in production environments, make sure development mode is disabled on both API instances and the refresh worker. Running Cube in development mode in a production environment can lead to security vulnerabilities. Enabling development mode in Cube Cloud is also not recommended, as development mode will expose your data to the internet. You can read more on the differences between production and development mode here.

Development mode is disabled by default.
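Development mode is controlled by the CUBEJS_DEV_MODE environment variable; you can set it explicitly to be safe:

```yaml
services:
  cube_api:
    environment:
      # Must never be `true` in production
      - CUBEJS_DEV_MODE=false
```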
Set up Refresh Worker
To refresh in-memory cache and pre-aggregations in the background, we recommend running a separate Cube refresh worker instance. This allows your Cube API instances to continue serving requests with high availability.

Set up Cube Store
Cube Store manages in-memory cache, queue, and pre-aggregations for Cube. Follow the instructions here to set it up. Depending on your database, Cube may need to "stage" pre-aggregations inside your database first before ingesting them into Cube Store. In this case, Cube will require write access to a dedicated schema inside your database. The schema name is prod_pre_aggregations by default. It can be set using the pre_aggregations_schema configuration option.
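For example, the schema name can also be set via the CUBEJS_PRE_AGGREGATIONS_SCHEMA environment variable:

```yaml
services:
  cube_api:
    environment:
      # Dedicated schema Cube can write staged pre-aggregations to
      - CUBEJS_PRE_AGGREGATIONS_SCHEMA=prod_pre_aggregations
```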
You may consider enabling an export bucket, which allows Cube to build large pre-aggregations much faster. It is currently supported for BigQuery, Redshift, Snowflake, and some other data sources. Check the relevant documentation for your configured database to set it up.
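As an illustration, an export bucket on AWS S3 could be configured with environment variables along these lines; the bucket name and credentials are placeholders, and the exact variables depend on your database:

```yaml
services:
  cube_api:
    environment:
      - CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
      - CUBEJS_DB_EXPORT_BUCKET=my-export-bucket         # placeholder
      - CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<AWS_KEY>        # placeholder
      - CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<AWS_SECRET>  # placeholder
      - CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<AWS_REGION>  # placeholder
```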
Secure the deployment
If you're using JWTs, you can configure Cube to correctly decode them and inject their contents into the Security Context. Add your authentication provider's configuration under the jwt property of your cube.js configuration file, or, if using environment variables, see CUBEJS_JWK_*, CUBEJS_JWT_* in the Environment Variables reference.
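A sketch of a jwt configuration in cube.js; the JWKS URL, audience, and issuer are placeholders for your authentication provider's values:

```javascript
// cube.js — illustrative sketch
module.exports = {
  jwt: {
    // JSON Web Key Set endpoint of your identity provider (placeholder URL)
    jwkUrl: 'https://example.auth0.com/.well-known/jwks.json',
    audience: 'https://cube.example.com/',   // placeholder
    issuer: ['https://example.auth0.com/'],  // placeholder
    algorithms: ['RS256'],
  },
};
```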
Set up health checks
Cube provides Kubernetes-API-compatible health check (or probe) endpoints that indicate the status of the deployment. Configure your monitoring service of choice to use the /readyz and /livez API endpoints so you can check on the Cube deployment's health and be alerted to any issues.
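For example, with Docker Compose you could point a container health check at the /readyz endpoint. This sketch assumes wget is available in the image, and the interval values are illustrative:

```yaml
services:
  cube_api:
    healthcheck:
      # Succeeds once the API instance is ready to serve requests
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:4000/readyz"]
      interval: 30s
      timeout: 5s
      retries: 3
```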
Appropriate cluster sizing
There's no one-size-fits-all approach to sizing a Cube cluster and its resources. The resources required by Cube depend significantly on the amount of traffic Cube needs to serve and the amount of data it needs to process. The following sizing estimates are based on default settings and are very generic, so they may not fit your use case; you should always tweak resources based on the consumption patterns you see.

Memory and CPU
Each Cube cluster should contain at least 2 Cube API instances. Every Cube API instance should have at least 3 GB of RAM and 2 CPU cores allocated to it. Refresh workers tend to be much more CPU- and memory-intensive, so at least 6 GB of RAM is recommended. Please note that to take advantage of all available RAM, the Node.js heap size should be adjusted accordingly using the --max-old-space-size option:
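For example, in docker-compose.yml, assuming a 6 GB memory allocation for the refresh worker (the exact value should be tuned to your allocation):

```yaml
services:
  cube_refresh_worker:
    environment:
      # Allow the Node.js heap to use most of the allocated RAM (value in MB)
      - NODE_OPTIONS=--max-old-space-size=5120
```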
RPS and data volume
Depending on data model size, every Cube Core API instance can serve 1 to 10 requests per second. Every Cube Core Store router node can serve 50-100 queries per second. As a rule of thumb, you should provision 1 Cube Store worker node per Cube Store partition, or per 1M rows scanned in a query. For example, if your queries scan 16M rows each, you should have at least 16 Cube Store worker nodes provisioned. Please note that the number of raw data rows doesn't usually equal the number of rows in a pre-aggregation. At the same time, queries don't usually scan all the data in pre-aggregations, as Cube Store uses partition pruning to optimize queries. EXPLAIN ANALYZE can be used to see the scanned partitions involved in a Cube Store query. Cube Cloud ballpark performance numbers can differ, as it uses a different Cube runtime.
Optimize usage
See this recipe to learn how to optimize
data source usage.