Data modeling with YAML, Jinja, and Python

Cube supports authoring dynamic data models using the Jinja templating language and Python. This allows de-duplicating common patterns in your data models as well as dynamically generating data models from a remote data source. Jinja is supported in all YAML data model files.

YAML

It is recommended to default to YAML syntax because of its simplicity and readability.

Folded and literal strings

Sometimes you might want to use multi-line strings in YAML-based data models, e.g., in parameters such as sql or description. It is recommended to use the literal (|) string style in such cases because it preserves line breaks, whereas the folded (>) style replaces single line breaks with spaces.

cubes:
  - name: orders
    description: |
      This cube represents customer orders.
      It includes measures for total sales and order count.
    sql: |
      -- Fetch only relevant columns
      SELECT id, created_at, total_amount
      FROM staging.orders

Jinja

Please check the Jinja documentation for details on Jinja syntax.

Previewing YAML

You can preview the data model code after applying Jinja templates in the Data Model editor by clicking … → Jinja Preview on files that contain Jinja templates in the sidebar.

Currently, there’s no way to preview the data model code in YAML after applying Jinja templates in Cube Core. Please track this issue.

You can also view the resulting data model in Playground and Visual Model. Also, you can introspect the data model using the /v1/meta REST (JSON) API endpoint.
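
As a rough illustration of introspecting the data model over the REST API, the sketch below builds a request for /v1/meta and summarizes the response using only the Python standard library. The base URL, the token, the /cubejs-api path prefix, and the helper names are assumptions made for this example; adjust them to your deployment.

```python
import json
import urllib.request


def build_meta_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the /v1/meta endpoint.

    Assumption: the REST API is served under the /cubejs-api prefix.
    """
    url = base_url.rstrip("/") + "/cubejs-api/v1/meta"
    return urllib.request.Request(url, headers={"Authorization": token})


def list_members(meta: dict) -> dict:
    """Map each cube name to the names of its measures and dimensions."""
    return {
        cube["name"]: {
            "measures": [m["name"] for m in cube.get("measures", [])],
            "dimensions": [d["name"] for d in cube.get("dimensions", [])],
        }
        for cube in meta.get("cubes", [])
    }


def fetch_meta(base_url: str, token: str) -> dict:
    """Call the endpoint and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(build_meta_request(base_url, token)) as response:
        return json.load(response)


# Hypothetical usage against a local deployment:
# print(list_members(fetch_meta("http://localhost:4000", "YOUR_API_TOKEN")))
```

Comparing the output before and after a template change is a quick way to confirm that the expected cubes and members were generated.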

Loops

Jinja supports looping over lists and dictionaries. In the following example, we loop over a list of nested properties and generate a LEFT JOIN UNNEST clause for each one:

{%- set nested_properties = [
  "referrer",
  "href",
  "host",
  "pathname",
  "search"
] -%}

cubes:
  - name: analytics
    sql: |
      SELECT
      {%- for prop in nested_properties %}
        {{ prop }}_prop.value AS {{ prop }}{% if not loop.last %},{% endif %}
      {%- endfor %}
      FROM public.events
      {%- for prop in nested_properties %}
      LEFT JOIN UNNEST(properties) AS {{ prop }}_prop ON {{ prop }}_prop.key = '{{ prop }}'
      {%- endfor %}
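
To sanity-check what such a loop should expand to, the same query can be assembled in plain Python. This is only an illustration of the expected result, not how Cube renders templates; the function name build_analytics_sql is made up for this sketch. Note that valid SQL needs a comma between the selected columns but not after the last one.

```python
# Illustration only: Cube renders the template with Jinja, not with this code.
nested_properties = ["referrer", "href", "host", "pathname", "search"]


def build_analytics_sql(props: list[str]) -> str:
    """Assemble the query that a loop over nested properties should produce."""
    # One selected column per property, separated by commas.
    columns = ",\n".join(f"  {p}_prop.value AS {p}" for p in props)
    # One LEFT JOIN UNNEST clause per property.
    joins = "\n".join(
        f"LEFT JOIN UNNEST(properties) AS {p}_prop ON {p}_prop.key = '{p}'"
        for p in props
    )
    return f"SELECT\n{columns}\nFROM public.events\n{joins}"


print(build_analytics_sql(nested_properties))
```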

Another useful pattern is to loop over a dictionary of values and generate a measure for each one, as in the following example:

{%- set metrics = {
  "mau": 30,
  "wau": 7,
  "day": 1
} %}

cubes:
  - name: orders
    sql_table: public.orders

    measures:
      {%- for name, days in metrics | items %}
      - name: {{ name | safe }}
        type: count_distinct
        sql: user_id
        rolling_window:
          trailing: {{ days }} day
          offset: start
      {% endfor %}

Macros

Cube data models also support Jinja macros, which allow you to define reusable snippets of code. You can read more about macros in the Jinja documentation. In the following example, we define a macro called dimension() which generates a dimension definition in Cube. This macro is then invoked multiple times to generate multiple dimensions:

{# Declare the macro before using it, otherwise Jinja will throw an error. #}
{%- macro dimension(column_name, type='string', primary_key=False) -%}
      - name: {{ column_name }}
        sql: {{ column_name }}
        type: {{ type }}
        {% if primary_key -%}
        primary_key: true
        {% endif -%}
{% endmacro -%}

cubes:
  - name: orders
    sql_table: public.orders

    dimensions:
      {{ dimension('id', 'number', primary_key=True) }}
      {{ dimension('status') }}
      {{ dimension('created_at', 'time') }}
      {{ dimension('completed_at', 'time') }}

You could also use macros to generate SQL snippets for use in the sql property:

{%- macro cents_to_dollars(column_name, precision=2) -%}
  ({{ column_name }} / 100)::NUMERIC(16, {{ precision }})
{%- endmacro -%}

cubes:
  - name: payments
    sql: |
      SELECT
        id AS payment_id,
        {{ cents_to_dollars('amount') }} AS amount_usd
      FROM app_data.payments
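
The same SQL snippet could also be produced by a Python function, which is easier to unit-test than a macro. The sketch below is a plain function with the same name and defaults as the macro; exposing it to templates would go through the template context covered in the Python section, and its string result may need the safe filter described under "Escaping unsafe strings".

```python
def cents_to_dollars(column_name: str, precision: int = 2) -> str:
    """Return a SQL expression that converts a cents column to dollars."""
    return f"({column_name} / 100)::NUMERIC(16, {precision})"


# Same call shapes as the macro: default and explicit precision.
print(cents_to_dollars("amount"))
print(cents_to_dollars("fee", precision=4))
```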

Reusing macros across files

You can define macros in dedicated .jinja files and import them into your data model files using Jinja’s import statement. This is useful for sharing common patterns across multiple cubes and views. Consider the following project structure:

.
└── cube/
    ├── model/
    │   ├── cubes/
    │   │   └── orders.yml
    │   ├── views/
    │   └── macros/
    │       └── common_dimensions.jinja
    └── cube.py

First, define reusable macros in a .jinja file under the macros/ directory:

{%- macro dimension(column_name, type='string', primary_key=False) -%}
      - name: {{ column_name }}
        sql: {{ column_name }}
        type: {{ type }}
        {% if primary_key -%}
        primary_key: true
        {% endif -%}
{% endmacro -%}

{%- macro cents_to_dollars(column_name, precision=2) -%}
  ({{ column_name }} / 100)::NUMERIC(16, {{ precision }})
{%- endmacro -%}

Then, import and use those macros in your data model files:

{%- import "macros/common_dimensions.jinja" as common -%}

cubes:
  - name: orders
    sql_table: public.orders

    dimensions:
      {{ common.dimension('id', 'number', primary_key=True) }}
      {{ common.dimension('status') }}
      {{ common.dimension('created_at', 'time') }}

    measures:
      - name: amount_usd
        type: sum
        sql: "{{ common.cents_to_dollars('amount') }}"

The import path is relative to the model/ directory.

Escaping unsafe strings

Auto-escaping of unsafe string values in Jinja templates is enabled by default. It means that any strings coming from Python might get wrapped in quotes, potentially breaking YAML syntax. You can work around that by using the safe Jinja filter with such string values:

cubes:
  - name: my_cube
    description: {{ get_unsafe_string() | safe }}

Alternatively, you can wrap unsafe strings into instances of the following class in your Python code, effectively marking them as safe. This is particularly useful for library code, e.g., similar to the cube_dbt package.

class SafeString(str):
  is_safe: bool

  def __init__(self, v: str):
    self.is_safe = True
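
A short usage sketch may help here. It repeats the class above and adds a hypothetical helper, mark_safe, that wraps every string value of a dictionary; the helper and the sample values are assumptions for illustration and not part of any Cube package.

```python
class SafeString(str):
    is_safe: bool

    def __init__(self, v: str):
        self.is_safe = True


def mark_safe(values: dict) -> dict:
    """Hypothetical helper: wrap each string value in SafeString, keep the rest."""
    return {
        key: SafeString(value) if isinstance(value, str) else value
        for key, value in values.items()
    }


meta = mark_safe({"description": "Orders: all channels", "limit": 10})

# The wrapped value still behaves like a regular string.
print(meta["description"].upper())
print(type(meta["description"]).__name__, meta["description"].is_safe)
```

Because SafeString subclasses str, wrapped values can be passed anywhere a plain string is expected.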

Python

Template context

You can use Python to declare functions that can be invoked and variables that can be referenced from within a Jinja template. These functions and variables must be defined in the model/globals.py file and registered in the TemplateContext instance.

See the TemplateContext reference for more details.

In the following example, we declare a function called load_data that supposedly loads data from a remote API endpoint. We will then use the function to generate a data model:

from cube import TemplateContext

template = TemplateContext()

# Register load_data so that templates can call it as load_data()
@template.function('load_data')
def load_data():
  client = MyApiClient("example.com")
  return client.load_data()


class MyApiClient:
  def __init__(self, api_url):
    self.api_url = api_url

  # mock API call
  def load_data(self):
    api_response = {
      "cubes": [
        {
          "name": "cube_from_api",
          "measures": [
            { "name": "count", "type": "count" },
            { "name": "total", "type": "sum", "sql": "amount" }
          ],
          "dimensions": []
        },
        {
          "name": "cube_from_api_with_dimensions",
          "measures": [
            { "name": "active_users", "type": "count_distinct", "sql": "user_id" }
          ],
          "dimensions": [
            { "name": "city", "sql": "city_column", "type": "string" }
          ]
        }
      ]
    }
    return api_response

Now that we’ve decorated our function with the @template.function decorator, we can call it from within a Jinja template. In the following example, we’ll call the load_data() function and use the result to generate a data model.

cubes:
  {# Here we use the decorated function from earlier #}
  {%- for cube in load_data()["cubes"] %}

  - name: {{ cube.name }}

  {%- if cube.measures is not none and cube.measures|length > 0 %}
    measures:
      {%- for measure in cube.measures %}
      - name: {{ measure.name }}
        type: {{ measure.type }}
      {%- if measure.sql %}
        sql: {{ measure.sql }}
      {%- endif %}
      {%- endfor %}
  {%- endif %}

  {%- if cube.dimensions is not none and cube.dimensions|length > 0 %}
    dimensions:
      {%- for dimension in cube.dimensions %}
      - name: {{ dimension.name }}
        type: {{ dimension.type }}
        sql: {{ dimension.sql }}
      {%- endfor %}
  {%- endif %}
  {%- endfor %}
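
Because the template reads name, type, and sql directly from the payload, a malformed API response tends to surface as a confusing YAML error. One option, sketched below, is to check the payload in Python before load_data() returns it. The function validate_payload and its messages are hypothetical; the required keys simply mirror what the template above renders.

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a load_data()-style payload."""
    problems = []
    for i, cube in enumerate(payload.get("cubes", [])):
        label = cube.get("name") or f"cubes[{i}]"
        if not cube.get("name"):
            problems.append(f"{label}: missing name")
        # The template renders name and type for every measure; sql is optional.
        for measure in cube.get("measures") or []:
            for key in ("name", "type"):
                if not measure.get(key):
                    problems.append(f"{label}: measure missing {key}")
        # The template renders name, type, and sql for every dimension.
        for dimension in cube.get("dimensions") or []:
            for key in ("name", "type", "sql"):
                if not dimension.get(key):
                    problems.append(f"{label}: dimension missing {key}")
    return problems


sample = {
    "cubes": [
        {
            "name": "cube_from_api",
            "measures": [{"name": "count", "type": "count"}],
            "dimensions": [],
        }
    ]
}
print(validate_payload(sample))
```

Raising an exception when the returned list is non-empty would stop a broken payload before it reaches the template.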

Imports

In the model/globals.py file (or the cube.py configuration file), you can import modules from the current directory. In the following example, we import a function from the utils module and use it to populate a variable in the template context:

# utils.py (in the same directory as globals.py)
def answer_to_main_question() -> str:
  return "42"

# globals.py
from cube import TemplateContext
from utils import answer_to_main_question

template = TemplateContext()

answer = answer_to_main_question()
template.add_variable('answer', answer)

Dependencies

If you need to use dependencies in your dynamic data model (or your cube.py configuration file), you can list them in the requirements.txt file in the root directory of your Cube deployment. They will be automatically installed with pip on startup.

The cube package is available out of the box; it doesn’t need to be listed in requirements.txt.

If you use dbt for data transformation, you might find the cube_dbt package useful. It provides a set of utilities that simplify defining the data model in YAML based on dbt models.

If you need to use dependencies with native extensions, build a custom Docker image.
