Databricks
Databricks is a unified data intelligence platform.
Prerequisites
- A JDK installation
- The JDBC URL for the Databricks cluster
Setup
Environment Variables
Add the following to a `.env` file in your Cube project:
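A minimal `.env` might look like the following sketch. The driver type `databricks-jdbc`, host, HTTP path, and token values are placeholders to replace with your own workspace details (find the JDBC URL in your cluster's connection settings):

```dotenv
CUBEJS_DB_TYPE=databricks-jdbc
CUBEJS_DB_NAME=default
# Replace the host and HTTP path with values from your cluster's JDBC settings
CUBEJS_DB_DATABRICKS_URL=jdbc:databricks://<workspace-host>:443;transportMode=http;ssl=1;httpPath=<http-path>
CUBEJS_DB_DATABRICKS_TOKEN=<personal-access-token>
CUBEJS_DB_DATABRICKS_ACCEPT_POLICY=true
```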
Docker
Create a `.env` file as above, then extend the
`cubejs/cube:jdk` Docker image tag to build a Cube image with the JDBC driver:
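As a sketch, a `docker-compose.yml` for this setup might look like the following (the port mapping and volume path are illustrative):

```yaml
services:
  cube:
    # The jdk tag bundles a JDK so the Databricks JDBC driver can run
    image: cubejs/cube:jdk
    ports:
      - 4000:4000
    # Loads the Databricks connection settings defined above
    env_file: .env
    volumes:
      - .:/cube/conf
```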
Environment Variables
| Environment Variable | Description | Possible Values | Required |
|---|---|---|---|
| CUBEJS_DB_NAME | The name of the database to connect to | A valid database name | ✅ |
| CUBEJS_DB_DATABRICKS_URL | The URL for a JDBC connection | A valid JDBC URL | ✅ |
| CUBEJS_DB_DATABRICKS_ACCEPT_POLICY | Whether or not to accept the license terms for the Databricks JDBC driver | true, false | ✅ |
| CUBEJS_DB_DATABRICKS_OAUTH_CLIENT_ID | The OAuth client ID for service principal authentication | A valid client ID | ❌ |
| CUBEJS_DB_DATABRICKS_OAUTH_CLIENT_SECRET | The OAuth client secret for service principal authentication | A valid client secret | ❌ |
| CUBEJS_DB_DATABRICKS_TOKEN | The personal access token used to authenticate the Databricks connection | A valid token | ❌ |
| CUBEJS_DB_DATABRICKS_CATALOG | The name of the Databricks catalog to connect to | A valid catalog name | ❌ |
| CUBEJS_DB_EXPORT_BUCKET_MOUNT_DIR | The path for the Databricks DBFS mount (not needed when using a Unity Catalog connection) | A valid mount path | ❌ |
| CUBEJS_DB_MAX_POOL | The maximum number of concurrent database connections to pool (default: 8) | A valid number | ❌ |
| CUBEJS_CONCURRENCY | The number of concurrent queries to the data source | A valid number | ❌ |
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type `count_distinct_approx` can
be used in pre-aggregations when using Databricks as a source database. To learn
more about Databricks' support for approximate aggregate functions, click
here.
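For illustration, a hypothetical `orders` cube could define such a measure and reference it from a pre-aggregation (the cube, measure, and column names here are made up):

```yaml
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: unique_users
        # Backed by Databricks' approximate distinct-count support
        type: count_distinct_approx
        sql: user_id

    dimensions:
      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      - name: users_per_day
        measures:
          - unique_users
        time_dimension: created_at
        granularity: day
```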
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, head
here.
| Feature | Works with read-only mode? | Is default? |
|---|---|---|
| Simple | ✅ | ✅ |
| Export Bucket | ❌ | ❌ |
Simple
No extra configuration is required for simple pre-aggregation builds with Databricks.
Export Bucket
Databricks supports AWS S3, Azure Blob Storage, and Google Cloud Storage for export bucket functionality.
AWS S3
To use AWS S3 as an export bucket, first complete the Databricks guide on connecting to cloud object storage using Unity Catalog.
Ensure the AWS credentials are correctly configured in IAM to allow reads and
writes to the export bucket in S3.
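With the bucket and IAM permissions in place, the export bucket is configured through environment variables; a sketch, where the bucket name, region, and credentials are placeholders:

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=s3
CUBEJS_DB_EXPORT_BUCKET=my-export-bucket
CUBEJS_DB_EXPORT_BUCKET_AWS_KEY=<aws-access-key-id>
CUBEJS_DB_EXPORT_BUCKET_AWS_SECRET=<aws-secret-access-key>
CUBEJS_DB_EXPORT_BUCKET_AWS_REGION=<aws-region>
```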
Google Cloud Storage
When using an export bucket, remember to assign the Storage Object Admin
role to your Google Cloud credentials (
CUBEJS_DB_EXPORT_GCS_CREDENTIALS).
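A sketch of the corresponding configuration, assuming the service-account key is supplied as a base64-encoded JSON string (the bucket name is a placeholder):

```dotenv
CUBEJS_DB_EXPORT_BUCKET_TYPE=gcp
CUBEJS_DB_EXPORT_BUCKET=my-export-bucket
# Credentials for a service account with the Storage Object Admin role
CUBEJS_DB_EXPORT_GCS_CREDENTIALS=<base64-encoded-service-account-json>
```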