DuckDB
DuckDB is an in-process SQL OLAP database management system that supports querying data in CSV, JSON, and Parquet formats from AWS S3-compatible blob storage. This means you can query data stored in AWS S3, Google Cloud Storage, or Cloudflare R2. You can also use the CUBEJS_DB_DUCKDB_DATABASE_PATH environment variable to
connect to a local DuckDB database.
Prerequisites
- A set of IAM credentials which allow access to the S3-compatible data source. Credentials are only required for private S3 buckets.
- The region of the bucket
- The name of a bucket to query data from
Setup
Manual
Add the following to a .env file in your Cube project:
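
A minimal sketch of such a file, assuming a private S3 bucket accessed with static IAM credentials. All values are placeholders, and CUBEJS_DB_TYPE is the general Cube setting that selects the driver (it is not listed in the table below):

```dotenv
# Select the DuckDB driver (general Cube setting)
CUBEJS_DB_TYPE=duckdb

# Placeholder credentials; only needed for private buckets
CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=my_access_key_id
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=my_secret_access_key

# Placeholder region; use the region of your bucket
CUBEJS_DB_DUCKDB_S3_REGION=us-east-1
```

For a public bucket, the two credential lines can be left out.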
Cube Cloud
In Cube Cloud, select DuckDB when creating a new deployment and fill in the required fields. If you are not using MotherDuck, leave the MotherDuck Token
field blank.
You can also explore how DuckDB works with Cube if you create a demo
deployment in Cube Cloud.
Environment Variables
| Environment Variable | Description | Possible Values | Required |
|---|---|---|---|
| CUBEJS_DB_DUCKDB_MEMORY_LIMIT | The maximum memory limit for DuckDB. Equivalent to SET memory_limit=<MEMORY_LIMIT>. Defaults to 75% of available RAM | A valid memory limit | ❌ |
| CUBEJS_DB_DUCKDB_SCHEMA | The default search schema | A valid schema name | ❌ |
| CUBEJS_DB_DUCKDB_MOTHERDUCK_TOKEN | The service token to use for connections to MotherDuck | A valid MotherDuck service token | ❌ |
| CUBEJS_DB_DUCKDB_DATABASE_PATH | The database file path to use for connecting to a local database | A valid DuckDB database file path | ❌ |
| CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID | The Access Key ID to use for database connections | A valid Access Key ID | ❌ |
| CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY | The Secret Access Key to use for database connections | A valid Secret Access Key | ❌ |
| CUBEJS_DB_DUCKDB_S3_ENDPOINT | The S3 endpoint | A valid S3 endpoint | ❌ |
| CUBEJS_DB_DUCKDB_S3_REGION | The region of the bucket | A valid AWS region | ❌ |
| CUBEJS_DB_DUCKDB_S3_USE_SSL | Whether to use SSL for the connection | A boolean | ❌ |
| CUBEJS_DB_DUCKDB_S3_URL_STYLE | The S3 URL style to use | vhost or path | ❌ |
| CUBEJS_DB_DUCKDB_S3_SESSION_TOKEN | The token for the S3 session | A valid Session Token | ❌ |
| CUBEJS_DB_DUCKDB_EXTENSIONS | A comma-separated list of DuckDB extensions to install and load | A comma-separated list of DuckDB extensions | ❌ |
| CUBEJS_DB_DUCKDB_COMMUNITY_EXTENSIONS | A comma-separated list of DuckDB community extensions to install and load | A comma-separated list of DuckDB community extensions | ❌ |
| CUBEJS_DB_DUCKDB_S3_USE_CREDENTIAL_CHAIN | Whether to use the credential chain for S3 connection secrets | true, false. Defaults to false | ❌ |
| CUBEJS_CONCURRENCY | The number of concurrent queries to the data source | A valid number | ❌ |
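
As an illustration of how several of these variables combine, the following sketch configures a local DuckDB database file instead of S3. The file path, memory limit, extension name, and concurrency value are hypothetical and chosen only for illustration; CUBEJS_DB_TYPE is the general Cube setting that selects the driver:

```dotenv
# Select the DuckDB driver (general Cube setting)
CUBEJS_DB_TYPE=duckdb

# Hypothetical path to a local DuckDB database file
CUBEJS_DB_DUCKDB_DATABASE_PATH=/cube/conf/analytics.duckdb

# Cap DuckDB memory use; equivalent to SET memory_limit with this value
CUBEJS_DB_DUCKDB_MEMORY_LIMIT=4GB

# Extension to install and load (spatial is only an example)
CUBEJS_DB_DUCKDB_EXTENSIONS=spatial

# Example limit on concurrent queries to the data source
CUBEJS_CONCURRENCY=4
```

Because every variable in the table is optional, any of these lines can be dropped to fall back to the default behavior.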
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type count_distinct_approx can
be used in pre-aggregations when using DuckDB as a source database. To learn
more about DuckDB’s support for approximate aggregate functions, see the
DuckDB documentation.
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, see the
pre-aggregation documentation.
| Feature | Works with read-only mode? | Is default? |
|---|---|---|
| Batching | ❌ | ✅ |
| Export Bucket | - | - |