In our last post, we talked about the differences between dbt Cloud and dbt Core. Once you’ve decided which environment works best for your team, it’s time to start development. In this post, we’ll walk through some of the basic setup of dbt and briefly describe why each feature is important.
A dbt project is simply a collection of files and directories. The two file types you'll work with most are .sql files, which contain SQL SELECT statements, and .yml files, which contain the configuration for your project. These are the basic building blocks of dbt. The root project directory will look like the following:
README.md
analysis/
data/
dbt_project.yml
macros/
models/
snapshots/
tests/
We will cover more advanced features in future articles. For now, just know that this project directory will house all of the code and configuration to run your project.
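As a rough sketch, a minimal dbt_project.yml might look like the following. The project and profile names here are placeholders, and the exact keys available vary somewhat by dbt version:

```yaml
# Minimal dbt_project.yml sketch; names are placeholders.
name: 'my_project'
version: '1.0.0'
config-version: 2

# Tells dbt which connection profile in profiles.yml to use.
profile: 'my_project'

# Default materializations can be set per-directory.
models:
  my_project:
    +materialized: view
```

Every model in the project inherits the view materialization by default; individual models can override it in their own configuration.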
One note on the above: there is, by default, one file not included in this directory, the profiles.yml file. This configuration file holds the connection information for your SQL engine: think accounts, usernames, passwords and so on. Because that information is sensitive, it's wise not to store it directly in a project directory that gets checked into a version control system like GitHub or GitLab.
However, because dbt uses templating via Jinja, you can configure your profiles.yml file to read environment variables instead of hard-coding your credentials. This lets you pack all of your dbt assets into one directory and removes the risk of exposing sensitive information when checking the project into version control. With that in mind, let's take a look at the two directories that will cover almost all of your day-to-day dbt work: models and tests.
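For example, a profiles.yml for a Postgres connection might use dbt's built-in env_var() Jinja function so that no credentials live in the file itself. The profile name, target and environment variable names below are placeholders:

```yaml
# Hypothetical profiles.yml; profile, target and variable names are placeholders.
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      # env_var() reads the value from the environment at runtime,
      # so nothing sensitive is written to disk.
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4
```

You would then export DBT_HOST, DBT_USER and DBT_PASSWORD in your shell or CI environment before running dbt.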
In your project, the models directory is where you store the SQL code you've written to transform your data. Generally speaking, models are persisted as tables and/or views; in dbt, the strategy used to persist a model is called its materialization. Since models are SQL, the code you author lives in .sql files. But SQL is not the only thing you'll put in the models directory. Typically, you will also have at least one .yml file that defines your source data. Sources are a convenient way to reference the starting point for your data transformations. By defining your sources in a .yml file, you keep your code clean and avoid duplication by referencing the source from your .sql code.
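As a sketch, a source definition and a model that references it might look like the following. The payments source, raw_payments schema and transactions table are invented names for illustration:

```yaml
# models/sources.yml -- hypothetical source definition
version: 2

sources:
  - name: payments        # the name used to reference this source in SQL
    schema: raw_payments  # the schema where the raw data actually lives
    tables:
      - name: transactions
```

A model can then reference that table with dbt's source() function instead of hard-coding the schema and table name:

```sql
-- models/stg_transactions.sql -- hypothetical staging model
SELECT
    transaction_id,
    payment_type,
    amount
FROM {{ source('payments', 'transactions') }}
```

If the raw schema ever moves, you update the .yml file once rather than editing every model that reads from it.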
The tests directory is very similar to the models directory in that it stores SQL code you've written. In dbt, these tests ensure that the data produced by your models conforms to your expectations. However, the SQL is a bit counterintuitive at first: the query should select the rows where the expectation is not true. For example, suppose you know that your payment system only accepts the following forms of payment in a transaction: cash, check or credit. The test query might look like the following:
SELECT *
FROM transactions
WHERE payment_type NOT IN ('cash', 'check', 'credit')
If zero records are returned, the test passes; if any records come back, the test fails. But that's not the only way you can test your data. You can also define tests in a .yml file in your models directory, and dbt ships with a few generic tests out of the box: unique, not_null, accepted_values and relationships.
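For instance, the payment-type check above could be expressed declaratively with the built-in accepted_values test. The model and column names below follow the earlier example and are illustrative:

```yaml
# models/schema.yml -- hypothetical test configuration
version: 2

models:
  - name: transactions
    columns:
      - name: transaction_id
        tests:
          - unique      # no duplicate IDs
          - not_null    # every row must have an ID
      - name: payment_type
        tests:
          - accepted_values:
              values: ['cash', 'check', 'credit']
```

Running dbt test will then compile each of these into a query that, just like the hand-written version, passes only when it returns zero rows.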
Before we wrap up, we’ll look at one last thing that is not created by default but will be essential to the operation of any dbt project: docs.
For dbt to create the project documentation, you’ll run the following:
dbt docs generate
This will create three important files in the target directory: index.html, catalog.json and manifest.json. These files contain all the information needed to show the project documentation in a webpage. In fact, you can serve the documentation straight from your laptop by running the following:
dbt docs serve
As we’ve seen, models, tests and documentation are the foundation of any dbt project. And while foundational, that’s not the end of the story. Advanced features like Jinja templating, macros and packages are how you can take your dbt projects even further. As always, if you need help with any of the features of dbt or simply want to discuss how dbt fits in with your modern data stack, please reach out. We’d love to help.