The Current State of Semantic Layers

In my last blog post, where I introduced the concept of a universal semantic layer, I outlined the benefits it offers organizations with multiple business domains. These organizations, which rely on many data sources and data consumption tools, can adopt a universal semantic layer to gain a control tower view of their data interactions and to expose prescribed definitions of domain relationships and measures through a flexible SQL interaction layer. Today, we will focus on three key players in this emerging technology category: dbt Labs, Cube and Tableau Pulse. We’ll examine how they differ from one another in modality, features and integrations, and the major considerations you’ll need to weigh when choosing a semantic layer tool.

Consideration 1: Modes of Data Consumption

A semantic layer aims to deliver consistently defined metrics under central governance. Depending on the organization, these metrics may be served through a BI tool or exposed through an API for integration into other workflows. Some organizations may prefer both approaches.

Cube Cloud is a dedicated semantic layer platform offering numerous pre-built integrations and a robust API suite, so you can serve your semantic layer to almost any tool. These integrations include enterprise and open-source BI tools, notebook tools like Jupyter and spreadsheet tools like Excel. To integrate a BI platform with Cube, you can connect it to your semantic model (a cube) through the SQL API. Alternatively, you can use Semantic Layer Sync, which automatically synchronizes your data model between the semantic layer and any connected BI tools.
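To make the SQL API path concrete, here is a minimal Python sketch. It assumes Cube's SQL API speaks the Postgres wire protocol (so a standard driver such as psycopg2 can connect) and that measures are requested through Cube's `MEASURE()` function. The cube and member names, the helper function names and the connection string are placeholders for illustration, not part of Cube's API.

```python
def build_cube_sql(cube: str, dimensions: list[str], measures: list[str]) -> str:
    """Build a query for Cube's SQL API (cube and member names are placeholders)."""
    # Measures are wrapped in MEASURE() so Cube applies the aggregation defined
    # in the semantic model rather than one chosen by the client tool.
    select_items = dimensions + [f"MEASURE({m})" for m in measures]
    sql = f"SELECT {', '.join(select_items)} FROM {cube}"
    if dimensions:
        # Group by every dimension by position: 1, 2, ...
        positions = ", ".join(str(i) for i in range(1, len(dimensions) + 1))
        sql += f" GROUP BY {positions}"
    return sql


def run_cube_query(sql: str, dsn: str) -> list[tuple]:
    """Execute a query against Cube's Postgres-compatible SQL API."""
    import psycopg2  # third-party driver, imported lazily so the builder runs without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(build_cube_sql("users", ["name"], ["count"]))
```

In practice you would pass the SQL API host, port and credentials from your own Cube deployment as the DSN; because the query is plain SQL, any BI tool with a Postgres connector can issue the same statement.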

dbt’s Semantic Layer is available to all dbt Cloud customers. It offers several pre-built BI integrations, with tools like Tableau and Google Sheets, as well as a GraphQL API for custom integrations. Semantic models in dbt are defined in YAML and use references just like other dbt assets. Metrics defined on those semantic models appear as separate nodes in your DAG.
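As a rough illustration of the GraphQL route, the Python sketch below builds the request body for a metric query. It assumes the dbt Semantic Layer's `createQuery` mutation, which takes an environment ID plus lists of metric and group-by names and returns a `queryId` that is then polled for results; verify the field names and endpoint against dbt's current GraphQL documentation. The environment ID, metric and dimension names are placeholders, `build_create_query` is a hypothetical helper, and nothing is sent over the network.

```python
import json


def build_create_query(environment_id: int, metrics: list[str], group_by: list[str]) -> dict:
    """Return a GraphQL request body using the (assumed) createQuery mutation shape."""
    # json.dumps yields double-quoted, escaped strings, which are valid GraphQL string literals.
    metric_args = ", ".join(f"{{name: {json.dumps(m)}}}" for m in metrics)
    group_args = ", ".join(f"{{name: {json.dumps(g)}}}" for g in group_by)
    mutation = (
        "mutation { createQuery("
        f"environmentId: {int(environment_id)}, "
        f"metrics: [{metric_args}], "
        f"groupBy: [{group_args}]"
        ") { queryId } }"
    )
    return {"query": mutation}


if __name__ == "__main__":
    print(json.dumps(build_create_query(123, ["sales_usd"], ["metric_time"])))
```

The body would be POSTed to your account's Semantic Layer GraphQL endpoint with a service token; a second request would then fetch the results for the returned `queryId`.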

Tableau Pulse takes a more straightforward approach to distributing metrics. Metrics and their data can only be shared with authenticated Tableau users, or created and consumed elsewhere through the Pulse REST API. Pulse also features a robust UI that generates natural language summaries and visualizations of the metrics configured in the metric builder.

Consideration 2: Fully Featured Platform vs. Platform Features

When choosing your semantic layer, you’ll want to consider which features you need. Many semantic layer offerings are part of larger products like dbt or Tableau, making it easy to build your semantic layer on existing data artifacts. Cube offers a fully featured platform, which includes a bevy of useful features for organizations working with embedded analytics.

Cube’s platform is great for teams that want to define their own access controls, conduct data tests and have a dedicated caching layer on top of their semantic models. For access controls, Cube has built-in security functions, including column-level and row-level security, that can be used to create bespoke embedded experiences. Cube also offers a robust sandbox for testing data in dev, test and production environments, using either SQL or a visualization builder. Customers using one of Cube’s many BI integrations also get dedicated dev and production credentials. Lastly, Cube offers pre-aggregations and caching options that can improve query performance against a semantic model.
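Row-level security of this kind is commonly implemented by rewriting each incoming query based on the caller's security context. The sketch below expresses that idea as a plain Python function so it runs on its own; in a real deployment it would be registered through Cube's `query_rewrite` configuration hook. The `tenant_id` claim and the `users.tenant_id` member are hypothetical names chosen for illustration.

```python
import copy


def query_rewrite(query: dict, ctx: dict) -> dict:
    """Append a row-level filter derived from the caller's security context."""
    # Hypothetical claim name; in Cube the security context is typically decoded from a JWT.
    tenant_id = ctx.get("securityContext", {}).get("tenant_id")
    if tenant_id is None:
        # Refuse unscoped queries instead of silently returning every tenant's rows.
        raise PermissionError("security context has no tenant_id")
    rewritten = copy.deepcopy(query)  # leave the caller's query object untouched
    rewritten.setdefault("filters", []).append(
        {
            "member": "users.tenant_id",  # hypothetical dimension on the users cube
            "operator": "equals",
            "values": [str(tenant_id)],
        }
    )
    return rewritten


if __name__ == "__main__":
    q = {"measures": ["users.count"], "dimensions": ["users.name"]}
    print(query_rewrite(q, {"securityContext": {"tenant_id": 42}}))
```

Failing closed when the claim is missing is a deliberate choice here: an embedded analytics user without a tenant should see nothing rather than everything.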

Example Cube YAML file defining a lambda pre-aggregation:

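cubes:
  - name: users
    # ...

    pre_aggregations:
      # Lambda rollup: serves the batch rollup below and unions it with
      # fresh rows read from the source after the batch build range
      - name: lambda
        type: rollup_lambda
        union_with_source_data: true
        rollups:
          - CUBE.batch

      # Batch rollup: daily user counts by name, built in daily partitions
      - name: batch
        measures:
          - users.count
        dimensions:
          - users.name
        time_dimension: users.created_at
        granularity: day
        partition_granularity: day
        build_range_start:
          sql: SELECT '2020-01-01'
        build_range_end:
          sql: SELECT '2022-05-30'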
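Conceptually, a lambda pre-aggregation with `union_with_source_data: true` answers a query by combining the batch rollup, which only covers its build range, with fresh rows read from the source for the period after that range. The toy Python model below illustrates that union for daily user counts held in memory; it ignores the `users.name` dimension, treats each source row as one new user, and is a simplification for intuition rather than how Cube Store actually executes the query. The function name is hypothetical.

```python
from collections import Counter
from datetime import date


def lambda_daily_counts(
    batch_rollup: dict[date, int],
    source_rows: list[date],
    build_range_end: date,
) -> dict[date, int]:
    """Toy lambda rollup: batch counts up to build_range_end plus
    counts computed on the fly from source rows after it."""
    # Rows inside the build range are already reflected in the batch rollup,
    # so only newer source rows are aggregated at query time.
    fresh = Counter(day for day in source_rows if day > build_range_end)
    combined = dict(batch_rollup)
    for day, count in fresh.items():
        combined[day] = combined.get(day, 0) + count
    return dict(sorted(combined.items()))


if __name__ == "__main__":
    batch = {date(2022, 5, 29): 3, date(2022, 5, 30): 2}
    source = [date(2022, 5, 30), date(2022, 5, 31), date(2022, 5, 31), date(2022, 6, 1)]
    print(lambda_daily_counts(batch, source, build_range_end=date(2022, 5, 30)))
```

The source row dated 2022-05-30 is skipped because that day is already inside the batch build range, which is why the lambda pattern avoids double counting.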

In dbt, the Semantic Layer comes with all dbt Cloud accounts, and you need a dbt project set up to use it. Users build semantic models, including metrics, that are explicitly defined in YAML files like any other dbt asset and become part of your dbt lineage. Beyond the YAML declarations, dbt offers APIs so developers can build custom connections to dbt semantic models, plus pre-built connectors to Tableau Cloud, Mode and Google Sheets. Unlike Cube, dbt relies on your data platform for results caching.

Example of a dbt semantic model:

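semantic_models:
  - name: orders_monthly
    description: orders_monthly #Metadata that will appear in dbt docs
    model: ref('orders_monthly_source') #Reference to an upstream model in our dbt project
    defaults:
      agg_time_dimension: ds   #Date field used for time-based aggregation
    primary_entity: order_id #Primary key
    dimensions:
      - name: ds   #The agg_time_dimension must be declared as a time dimension
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: orders_monthly_source #Column Name that Users will see
        agg: sum                         #Default Aggregation
        create_metric: true #Also creates a simple metric from this measure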
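To show what a `sum` measure with `create_metric: true` means in practice, the sketch below renders, in simplified form, the kind of SQL a semantic layer generates when someone requests such a metric at a monthly grain. It is an illustrative toy, not MetricFlow's compiler or its actual output; the table and measure names are placeholders, and the `metric_time__<grain>` alias only mimics MetricFlow's naming style.

```python
# Subset of dbt measure aggregation names mapped to SQL aggregate functions.
SQL_AGGS = {"sum": "SUM", "count": "COUNT", "min": "MIN", "max": "MAX", "average": "AVG"}


def compile_metric_sql(table: str, measure: str, agg: str, time_dim: str, grain: str) -> str:
    """Render a grouped aggregation for one measure over one time grain (toy example)."""
    sql_agg = SQL_AGGS.get(agg)
    if sql_agg is None:
        raise ValueError(f"unsupported aggregation: {agg}")
    # Truncate the agg_time_dimension to the requested grain and group on it.
    time_expr = f"DATE_TRUNC('{grain}', {time_dim})"
    return (
        f"SELECT {time_expr} AS metric_time__{grain}, "
        f"{sql_agg}({measure}) AS {measure} "
        f"FROM {table} "
        "GROUP BY 1"
    )


if __name__ == "__main__":
    print(compile_metric_sql("analytics.orders", "order_total", "sum", "ds", "month"))
```

The point of the YAML definition is that consumers only ask for the metric and a grain; the aggregation, time column and source table come from the semantic model rather than from each dashboard.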

Tableau Pulse, on the other hand, is a feature of the broader Tableau ecosystem. In the UI, users build metrics on published data sources in Tableau Cloud. For other tools to use metrics created in Pulse, developers must build a custom integration with the Pulse REST API.

Consideration 3: Performance and Cost Management

For data organizations looking to adopt semantic layers, it’s imperative to understand the levers available to manage cost, as well as the direct drivers of cost within your data platform. Fortunately, dbt and Cube both have built-in optimization techniques within their semantic layers.

In dbt’s Semantic Layer, developers can configure saved queries in their YAML files. Saved queries let you organize and reuse common queries against your data source and cache their results, so you avoid paying repeatedly for users running the same query. Note that compute costs for semantic model queries are incurred in your data platform, not in dbt.

Here’s an example YAML block showing how one would configure a saved query in a dbt project:

saved_queries:
  - name: sales_by_customer
    description: "This is a sum of sales for each customer"
    label: Sales by Customer
    query_params:
      metrics:
        - sales_usd
      group_by:
        - "Dimension('customer_id')"
      where:
        - "{{ TimeDimension('metric_time', 'day') }} <= now()"
        - "{{ TimeDimension('metric_time', 'day') }} >= '2022-01-01'"

This is particularly useful for teams who are publishing dbt metrics into more static dashboards or applications. To manage the cost of queries run against a dbt semantic model, teams should follow query tagging best practices so that cost can be allocated within their data platform.
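As one example of query tagging, Snowflake lets a session set a free-form `QUERY_TAG` parameter that is then recorded with each query in query history, so warehouse spend can be grouped by tag. The sketch below builds such a statement with a JSON tag naming the saved query. It assumes Snowflake as the data platform, and the helper name and tag keys are hypothetical conventions, not something dbt sets for you.

```python
import json


def query_tag_statement(saved_query: str, team: str, source: str = "dbt_semantic_layer") -> str:
    """Build a Snowflake ALTER SESSION statement that tags subsequent queries."""
    # Hypothetical tag schema: keep keys stable so cost reports can parse them later.
    tag = json.dumps({"source": source, "saved_query": saved_query, "team": team}, sort_keys=True)
    # Snowflake string literals treat backslash as an escape character and use ''
    # for a literal single quote, so escape both to keep the JSON intact.
    escaped = tag.replace("\\", "\\\\").replace("'", "''")
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}'"


if __name__ == "__main__":
    print(query_tag_statement("sales_by_customer", "finance"))
```

A JSON tag is a design choice: it lets a later cost report parse each field out of query history instead of relying on string matching.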

In Cube, developers have an entire control panel dedicated to performance tuning. Pre-aggregations in Cube work similarly to saved queries in dbt, except that developers can explicitly define how often each pre-aggregation is rebuilt and choose a caching method based on the complexity or criticality of the query. Additionally, pre-aggregations let queries be served without hitting the cloud data warehouse, which can decrease compute spend on your platform.
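The routing idea behind pre-aggregations can be sketched as a coverage check: a query can be answered from a rollup only if the rollup contains every requested measure and dimension and its time granularity is at least as fine as the one requested; otherwise the query falls through to the warehouse. This Python model is a simplification with hypothetical names. Cube's real matching rules cover more cases, for example which measure types can be re-aggregated from a finer rollup.

```python
from dataclasses import dataclass

# Grains that nest cleanly; a rollup can serve its own grain or any coarser one.
# (Weeks are left out because they do not nest inside months.)
GRAIN_RANK = {"hour": 0, "day": 1, "month": 2, "quarter": 3, "year": 4}


@dataclass(frozen=True)
class Rollup:
    name: str
    measures: frozenset[str]
    dimensions: frozenset[str]
    granularity: str


def route(measures: set[str], dimensions: set[str], granularity: str, rollups: list[Rollup]) -> str:
    """Return the name of the first rollup that covers the query, else 'warehouse'."""
    for rollup in rollups:
        covers_members = measures <= rollup.measures and dimensions <= rollup.dimensions
        covers_grain = GRAIN_RANK[rollup.granularity] <= GRAIN_RANK[granularity]
        if covers_members and covers_grain:
            return rollup.name
    return "warehouse"


if __name__ == "__main__":
    batch = Rollup("batch", frozenset({"users.count"}), frozenset({"users.name"}), "day")
    print(route({"users.count"}, {"users.name"}, "month", [batch]))
```

Every query that returns a rollup name here is one the warehouse never sees, which is where the compute savings described above come from.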

Cube also offers a Query History view that gives you insight into query performance, repeated queries and queries that use pre-aggregations. To optimize cost and performance, teams have the option to declare caching configuration at the query level, using Cube Store, Cube Store with a suboptimal query plan, in-memory caching or pre-aggregations in the data source. To monitor performance and the allocation of work across Cube’s cache types, teams can use the Performance Insights panels.
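If you export query history for your own analysis, a few lines of Python can summarize how work is distributed across cache types. The record shape (`cache_type`, `duration_ms`), the cache-type labels and the function name below are hypothetical, loosely modeled on the categories listed above; they are not Cube's export format.

```python
from collections import Counter


def summarize_by_cache_type(history: list[dict]) -> dict[str, dict[str, float]]:
    """Share of queries and average duration (ms) for each cache type."""
    counts: Counter = Counter()
    total_ms: Counter = Counter()
    for record in history:
        counts[record["cache_type"]] += 1
        total_ms[record["cache_type"]] += record["duration_ms"]
    n = sum(counts.values())
    # An empty history yields an empty summary, so there is no division by zero.
    return {
        cache_type: {"share": counts[cache_type] / n, "avg_ms": total_ms[cache_type] / counts[cache_type]}
        for cache_type in counts
    }


if __name__ == "__main__":
    history = [
        {"cache_type": "cube_store", "duration_ms": 20},
        {"cache_type": "cube_store", "duration_ms": 40},
        {"cache_type": "no_cache", "duration_ms": 900},
        {"cache_type": "cube_store", "duration_ms": 30},
    ]
    print(summarize_by_cache_type(history))
```

A rising share of uncached queries with high average durations is the kind of signal that suggests adding or widening a pre-aggregation.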

Conclusion: Which Semantic Layer is Right for You?

As demonstrated, each of these solutions ships with a rich set of features. Ultimately, selecting a semantic layer comes down to what you want to optimize for in your organization. If your organization needs to consume a semantic layer through two or more BI platforms and an API, or wants built-in performance tuning and observability, Cube could provide the feature set to support that. For teams already working in dbt that want to implement cost-management best practices in their data warehouse, dbt’s Semantic Layer may let you build a semantic layer quickly on top of a mature dbt project. For teams that want to operationalize their existing data sources in Tableau Cloud, Pulse makes it simple to create, visualize and consume metrics.

Have questions about which approach is best for you? Reach out to us and we’d love to help walk you through that decision! You can also get started with dbt Cloud and Cube for free.

More About the Author

Jack Hulbert

Analytics Architect