Getting to Know dbt

Data

Getting to Know dbt

On a recent project, I heard a colleague recommend dbt and it sounded amazing. Every feature of this product seemed to address challenges I run into on almost every data project. I decided it was time to take a look.

Unravelling the Spaghetti

As a consultant, I often get introduced to new data sets. If you’ve recently joined a data team or had a new member join you, then you’ll recognize this experience too.

Briefing sessions are often a roller coaster ride where years of organic development and the complexities of business logic get compressed into a few minutes while you do your darndest to take in as much as you can.

To understand what is happening, I’ll generally spend time drawing out how data is flowing from one place to the next along its journey.

dbt encourages and helps build this documentation as you go. The starting point is an interactive lineage diagram that lets you see at a glance how the data is flowing from one place to the next. In the example below, the fct_orders view is selected, and the arrows show upstream and downstream dependencies.

Above: Lineage diagram in dbt.

Working DRY

Analysts are often under pressure to solve the problem of the day and get results fast. This can lead to complexity: for example multiple transformations taking place in a single view and the same solution being built in multiple places.

dbt encourages DRY (Don’t Repeat Yourself) design. This is far easier to do if you have a good overview of your data flow and can build individual steps that can be re-used.

This is supported by a system of flexible references that link individual steps together. For more complex tasks, dbt provides pre-built packages that can be integrated into your project.

Building and Collaborating Safely

Before discovering dbt, it had never occurred to me that SQL can be treated exactly like any other code – for example, Python. This opens up the option to use powerful and proven revision management tools. In dbt, the SQL is housed in a git repository and the cloud interface guides you along the way.

Now I can work with colleagues or develop new functionality knowing that my production version is always protected and available.

I no longer need to think of saving SQL files with version numbers as I work and collaboration becomes a workable option for me and other developers on the team.

Conclusion

This is just an introduction to dbt – the tool can do far more than I show here. It is a tool for analysts that are comfortable working in SQL. While dbt manages the SQL code, the actual execution is done in the database. This means analysts can work with the SQL they already know and can depend on the power of the database to deliver results fast.

It is easy to try this out for yourself. Individual developers can set up a free account that gives unlimited room to experiment for as long as you wish. There are some great tutorials and datasets available to get started too.

If you would like to see dbt in action feel free to reach out to me. I’m a fan and always happy to show you why!

More About the Author

Paul Vincent

Membership Lead, EMEA
Leveraging Published Data Sources in Tableau Pulse Now that you’ve decided to test drive Pulse, what do you need to get ready? In this blog post, we’ll take a closer look at the data ...
What Is Tableau Pulse and What Can It Do for Me? Modern analytics has transformed the way we interact with data today. We’ve moved beyond spreadsheets to bring clarity and insight in ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!