After you’ve finished setting up your dbt environment and begun working in dbt, you may find that you’ll need more than just the basics. As mentioned in The Transformational Power of dbt in Analytics, dbt allows data teams to leverage the capabilities of software engineering teams. Here, we are going to talk about dbt’s more advanced capabilities and how you can implement them in your projects.
Packages
Packages in dbt are standalone dbt projects that can be incorporated into other dbt projects. Packages are comparable to external libraries available in other programming languages. Think of things like Pandas in Python or dplyr in R. This helps make your data models more composable in the way that a Python project can be composed of potentially many libraries. dbt packages can then decrease the amount of time required to build data models by providing code that someone else has already created for common data-modeling scenarios. dbt makes all packages available via their package hub. There you can explore packages to solve common problems, like modeling data from Salesforce.
Jinja and Macros
One problem with SQL in the context of a modern data team is that SQL is almost entirely static. dbt solves for this by incorporating the Jinja templating language and the use of macros. Jinja provides a way to use control structures like “if statements” and for loops. In dbt, you use Jinja to write macros. Macros are similar to functions in other programming languages. They are reusable pieces of code that can be incorporated into your SQL models.
For example, below is a simple way to calculate sales tax using a macro:
{% macro sales_tax(column_name, sales_tax_rate, precision=2) %} ({{ column_name }} * (sales_tax_rate / 100))::numeric(16, {{ precision }}) {% endmacro %}
To use this macro inside a model, the following query could be an example:
select product, sale_price, sales_tax(sale_price, 8.75) as total_sales_tax from product_sales;
Using the above macro, we can now calculate the amount of sales tax owed per product if the sales tax rate is 8.75%.
Hooks and Operations
Hooks are SQL statements that can be executed before or after a model, seed or snapshot is run. When you add hooks to transformations, you have the ability to create dependencies before a model is built. A great example of a pre-hook is before any model is built, you can choose the role prior to running your script. Meanwhile, a post-hook could grant the ability to select values in a newly created table or view to a role after the model is built. Operations are simply a macro that can be run outside of a model. The dbt docs give an example of running an operation and show how to easily use operations inside of dbt.
Implementing dbt
Advanced dbt usage is typically a function of project maturity. As you create more models, add more sources and run more tests, the ability to modularize and reuse parts of your code increases the dependability, consistency and trustworthiness of your data. It’s past time data teams take full advantage of the tools that software teams have had for years.
We have been a part of numerous dbt implementations across a variety of industries; with this depth of experience, we’re knowledgeable of the pitfalls to avoid. If you are having trouble finding the right package, creating macros or fitting dbt within your team, don’t hesitate to reach out. We’d love to help!