Setting up Your dbt Environment

Data

Setting up Your dbt Environment

and

In our last post, we talked about the differences between dbt Cloud and dbt Core. Once you’ve decided which environment works best for your team, it’s time to start development. In this post, we’ll walk through some of the basic setup of dbt and briefly describe why each feature is important.

Projects

dbt projects are simply a collection of files and directories. The types of files you’ll see are .sql and .yml files. The .sql files contain SQL SELECT statements. The .yml files contain the configurations to run your project. These are the basic building blocks for dbt. The root project directory will look like the following:

README.md
analysis
data
dbt_project.yml
macros
models
snapshots
tests

We will cover more advanced features in future articles. For now, just know that this project directory will house all of the code and configuration to run your project.

One note on the above. There is, by default, one file not included in this directory. It is the profiles.yml file. This configuration file is where you would put your connection information to your SQL engine. Think accounts, usernames, passwords, etc. Because of the sensitivity of that information, it’s wise not to store it directly in your project directory that gets checked into a version control system like GitHub or GitLab.

However, because dbt uses templating via Jinja, you can configure your profiles.yml file to use environment variables instead of hard-coding your credentials. This allows for packing all of your dbt assets into one directory and removes the potential exposure of sensitive information when checking the project into your version control system. With that in mind, let’s take a look at two of the directories that will cover almost all of your dbt projects, models and tests.

Models

In your project, the models directory is where you will store the SQL code you’ve written to transform your data. Generally speaking, these will exist as tables and/or views. In dbt, these are called materializations. Being SQL, the code you author will live in .sql files. But SQL is not the only thing that you’ll put in your models directory. Typically, you will also have at least one .yml file that will define your source data. Sources are a convenient way to reference the starting point for your data transformations. By defining your sources in a .yml file, you can keep your code clean and avoid duplication by referencing the source in your .sql code.

Tests

The tests directory is very similar to the models directory in that it will store SQL code you’ve written. In dbt, these types of tests are used to ensure that the data in your data models conforms to an expected output. However, the SQL is a bit counterintuitive at first – reason being, the SQL should attempt to show results where the test is not true. For example, suppose you know that your payment system only accepts the following forms of payment in a transaction: cash, check or credit. The test or SQL query might look like the following:

SELECT *
FROM transactions
WHERE payment_type NOT IN ('cash','check','credit')

If zero records return, the test passes. If something else, the test fails. But that’s not the only way you can test your data. You can also define tests in a .yml file in your models directory. dbt also has a few generalized tests pre-packaged. They are unique, not_nullaccepted_values and relationships.

Documentation

Before we wrap up, we’ll look at one last thing that is not created by default but will be essential to the operation of any dbt project: docs.

For dbt to create the project documentation, you’ll run the following:

dbt docs generate

This will create three important files in the target directory: index.htmlcatalog.json and manifest.json. These files contain all the information needed to show the project documentation in a webpage. In fact, you can serve the documentation straight from your laptop by running the following:

dbt docs serve

What’s Next

As we’ve seen, models, tests and documentation are the foundation of any dbt project. And while foundational, that’s not the end of the story. Advanced features like Jinja templating, macros and packages are how you can take your dbt projects even further. As always, if you need help with any of the features of dbt or simply want to discuss how dbt fits in with your modern data stack, please reach out. We’d love to help.

More About the Author

Derek Hecksher

Data Lead
Setting up Your dbt Environment In our last post, we talked about the differences between dbt Cloud and dbt Core. Once you’ve decided which environment works best for ...
Exploring dbt Cloud vs. Core In our previous post, we briefly introduced dbt and an emerging role on modern data teams called an analytics engineer. If you’ve had a ...

See more from this author →

Subscribe to our newsletter

  • I understand that InterWorks will use the data provided for the purpose of communication and the administration my request. InterWorks will never disclose or sell any personal data except where required to do so by law. Finally, I understand that future communications related topics and events may be sent from InterWorks, but I can opt-out at any time.
  • This field is for validation purposes and should be left unchanged.

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072