Alright, we're now hitting our time, so I think we're gonna hop right into it. Thank you all so much for joining; we are so happy to have you here for our dbt Essentials webinar. So who is this for? Obviously you, because you're here: the analyst or dashboard developer in a self-service environment. If you're used to using Tableau or Power BI or some other dashboard development tool, then this is probably for you, to help get your data into the shape you want it — you build dashboards and other analytics tools. Over the next forty to forty-five minutes, we will cover an overview of dbt: what it is, what it's used for, and a little bit more about who is using it. We'll talk about some key features within dbt, and then we'll actually try it out live — we'll look at a dbt project, take a custom SQL query from Tableau, put it into a dbt project, and utilize it from there. As you have questions, please feel free to put them into the webinar chat or into the Q&A. We have people monitoring it throughout, so feel free to ask questions or say anything you want about dbt or the presentation. I'll give a little intro to InterWorks first, a bit more about who we are. If you're here, you probably know of us a fair amount. We are a BI and IT consulting firm based out of Oklahoma. I am not in Oklahoma — I am based in Raleigh, North Carolina — and I'm joined by my colleague Jack, who is based in San Francisco, California. We'll be helping you go through this today. A little more about what you should know about us: we focus on people more than anything. We are very passionate about data — where it lives, what it looks like, what kind of insights you can get out of it. We have a pretty popular blog (I've written a few things, Jack's written a few things), and we partner with people like dbt — surprise — Tableau, Snowflake, ThoughtSpot, etcetera. So that's a little bit about us at InterWorks. I'll give you a quick intro to myself, and then Jack will introduce himself. My name is Rachel Kurtz. I'm based in Raleigh, North Carolina, and I've been with InterWorks almost seven years now — it'll be seven years in July. I have a background in data science, but I work with the entire life cycle of data, from data preparation to data prediction to data visualization. That's a little bit about me; I'll let Jack — and I think Colin's on the call right now too — introduce themselves as well. So, Jack? Yeah, hey, I'm Jack Hulbert. You may have seen me on other webinars. I'm based in San Francisco, and I'm an analytics architect here at InterWorks; I've been here for about three years now. As for my background and career history, I'm what you'd call a bit of a Swiss Army knife of analytics. I've run the full gamut from SQL engineering to data preparation for last-mile datasets, plus a lot of work in the BI space across all of our esteemed partner tools, mostly skewed towards Tableau and, as of recently, dbt and Snowflake.
So I've been really excited to see how all these tools can be used together, and we're starting to see a sort of convergence in how they can be combined to create really good business data models that can be used not only in Tableau but in other tools and technologies downstream. So, yeah, really excited — thanks for joining us today. Hop on, Colin. Yeah, Colin is our other colleague. Thanks. My name's Colin Murray. I'm a data lead here at InterWorks for the US. I primarily work with dbt, Snowflake, and other ELT platforms to help migrate data into Snowflake, enabling Rachel and Jack and Brooks — and yourselves on the call — to perform the analytics workloads your business requires once the data is handed over to us. Yeah, thanks, Colin. And apologies — I'll save you all from the chaos that is in my background and stay off camera, but my name is Brooks Barth, and I'm an analytics lead here at InterWorks, with similar functions to the other three. I'm helping to drive innovation within the organization, specifically within the analytics and analytics-engineering type spaces that Rachel and Jack will talk about with regard to dbt and Snowflake, and then those connections up through our various BI platforms like Tableau and ThoughtSpot. And I'll kick it back to Rachel. So that's a little bit about us. I know that was a lot of information about InterWorks, so now we're going to flip it onto you. We have a couple of questions just to keep you engaged and to understand a little bit more about where you all are at. A poll should have just popped up in front of you with two questions: how much time do you spend preparing your data, and how would you describe your level of expertise with data preparation? We're talking about anything from using Excel to clean up the data before you do visualizations, to writing SQL code, to using a different tool to do it — just trying to get a sense from you all of where your feelings are with data preparation. When I was going to school for data science, there was always a joke — and I always thought it was a joke, until I started actually doing the work — that even as a data scientist, eighty percent of your time is spent cleaning data, not doing the fun part of the model building. I definitely feel like that is very, very true. Yeah, that's a top-tier LinkedIn meme that I see pretty frequently. It does happen. It was one of those where I thought there was no way it's actually eighty percent, and then realized it isn't eighty percent — it's more like ninety percent. They underestimated it. Alright, it seems like we can probably end this poll now. Got a nice little Poisson distribution going. Oh, nicely done there, Jack. So, yeah, we have a pretty even distribution when it comes to how often you spend time preparing your data for your dashboarding or analytics projects. I am envious of those of you at less than twenty-five percent — your data engineering team is amazing. I'm also seeing, on level of expertise, another fairly even distribution. We've got some advanced people in the group, which I always enjoy, and that also makes sense, I think, for where we are with dbt. So thank you.
There will be — spoiler — two more of these polls as we go through, just to keep you coming along with us through the presentation. So, dbt. We're going to talk about dbt and what it even is. But first — I keep teasing you with this — we're going to talk a little more about who you all are and about that data preparation, and about the way things often happen when we're talking about the role of the analyst in a self-service environment, which a lot of us are in at this point and which a lot of companies just inherently are: you end up having to wear a lot of hats in your day to day. You're dealing with dashboards, you're dealing with data analysis, but then you're also dealing with data quality and data preparation, and you have to divide up your time across all of that. What dbt attempts to do is help make some of this more efficient, so you're not spending so much time cleaning and preparing your data — you can automate it and make it easier to keep going. It tries to combine some of those hats, so maybe you're wearing three hats instead of seven. A little more on that — dbt and you, and the way dbt views roles here: we've got the data engineer on the left-hand side. Those are the people building custom data integrations, managing overall pipeline orchestration, dealing with data warehouse performance optimizations — they're doing more enterprise-level, really big things, putting stuff into Snowflake, utilizing Matillion, etcetera. Then we have the data analyst, which I believe some of you are; I am myself. We're doing a lot of the insight work: why did something happen, what are the best channels to utilize. We're working with business users to understand what we need the data to look like and what data we're interested in, and then we can talk to our data engineers to help get us that data. We're building the dashboards and, again, understanding data requirements. Now — and I believe dbt came up with this term, and it's just kind of everywhere at this point — there's actually an in-between, which is probably what most of us are, and that is the analytics engineer. It's a combination of the data engineer and the data analyst. What does an analytics engineer do? They provide clean, transformed data ready for analysis: they take the raw data the data engineers have created, do some transformation — "last-mile data prep" is another phrase you may have heard, probably from us — and publish it to Tableau so you can use it within your dashboards, using software engineering best practices (which is why it's that mixture of the two roles), maintaining documentation and definitions, and training business users on how to use it. The point dbt makes is that a lot of us are not just a data engineer or just a data analyst; we're somewhere in between — an analytics engineer. If you see that phrase, it's something dbt uses a lot, and we're starting to see it really take hold in other places as well. I told you there would be another one: another poll for you all. What is your biggest challenge when it comes to data preparation?
Is it cleaning messy data, merging multiple tables together (because it's never just in one place), dealing with large datasets, or creating repeatable processes? There probably should have been an "other" option — or an "all of the above," which is really what it should have been. If you do have another answer, feel free to put it into the chat and tell us what issues you're seeing with your data. I think the biggest option would be "all of the above." Yeah. One thing back on your last slide, Rachel: one thing I really like about the analytics engineer term is that it puts the technical domain skills of what might be a data engineer and a data analyst together with the business context and business domain knowledge. From personal experience as someone who has been a downstream consumer of data models, it's oftentimes not clear how data objects ought to be used together — to the layperson it takes a lot of diagrams and onboarding just to understand how the data has to be put together to get the metrics, the entities, and the queries we need to serve our users and the applications we build. So having that in one team is quite great. That's a very good point, thank you. And it even shows in the fact that of the four InterWorks folks on this call, a few of us come from the BI side — the business intelligence, data analyst side — and Colin comes from the data engineering side. The fact that we're all here together is melding that into the analytics engineering team we were talking about. So, yeah, it's bringing both the technical side and the understanding of the bigger picture together. I think we're going to end this poll, and it seems like the two biggest issues are merging multiple tables together — a hundred percent, especially if you're trying to do that within Tableau, where joins can sometimes take a little time — and creating repeatable processes. That's a big one we see: I may have a calculation within my dashboard or my process, and then if Jack needs to do the same calculation but decides to multiply by two instead of three, having that communication and making it repeatable and consistent across us gets a little problematic. So thank you for humoring me and answering another question. Now we can actually get into dbt — I'm done teasing you; we'll actually talk about what dbt is. dbt stands for data build tool, and you'll notice that it is always lowercase. Just always put it in lowercase; do not put it in uppercase. So, dbt is a data build tool. I'm going to read this real quick, and then we'll talk about it: it enables data analysts and engineers — both — to prepare and transform their data in their warehouses using familiar SQL syntax while providing features for testing, documentation, and version control.
So that's really the big thing about it: it's trying to allow everybody to come together in a single location using SQL, which a lot of people are fairly familiar with at this point or in the process of learning, while also letting you test your data, document it, and do version control with it — again, creating those repeatable processes we were talking about, the ones a lot of you said you were having trouble with when it comes to data prep. Another way to put it: the data build tool is very heavily — no, it just is — SQL-based. You can do all the things that are available with SQL. If you're used to custom SQL queries that are six hundred lines long, you can put that into something like dbt, turn it into a repeatable process, and add a lot more to it as well. So, what is dbt? This is a nice little image from dbt that I feel shows exactly where it likes to sit within the stack. You have your raw data — whether that's in Snowflake or a lot of different places — on the left-hand side. You bring it into dbt, where you develop it: write your SQL code, do your tests and documentation. Then you deploy it out to a bunch of different datasets, which are then available to people — you can put it back into your Snowflake, where it can be talked to by Tableau or any machine learning models you're working on. I'm a very visual person, so this always sticks with me — it's also aesthetically pretty pleasing — so I like to use this image when I'm talking about dbt. When we talk about dbt and developing — and I'm talking about these in a theoretical way before we go into it live — there are three major aspects of dbt, which we've touched on at this point: creating staging and final models using the SQL you're used to; testing your raw and created data, making sure things aren't null when they shouldn't be and that every primary key is unique; and documenting the models as you go. So, final question, and then we're actually going to get into some dbt: what is your level of expertise with SQL? Are you brand new? Have you built a few queries, maybe in Tableau or in the little SQL course online that I like to do every once in a while? Do you use it with some frequency, or do you use it every day? Oh, I'm seeing a lot of "use it every day." Ooh, I'd like to pick all of your brains, those of you who use it every day — I'm in awe. I'm just hoping no one's going to dunk on me from the comments section when I show my dbt project. Be kind. Your SQL's great, Jack, it'll be good. Alright, I learned everything from Colin. So I think we can end this poll, and it does seem like none of you are brand new, so that's good — dbt will be really easy for you all to use. It looks like the majority of you, about fifty percent, use it every day, but everybody else is still pretty familiar with it. That's good to know. Thank you. So now I'm gonna stop sharing my screen and Jack is gonna take over. We're going to try it live using dbt: we're going to convert a Tableau workbook with a custom SQL data source into a dbt project.
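(For readers following along: a dbt model is essentially just a SELECT statement saved as a .sql file in the project. A minimal sketch — the file name, column names, and source name here are hypothetical, not taken from the demo — might look like this:)

```sql
-- models/staging/stg_orders.sql  (hypothetical file and column names)
-- A dbt model is just a SELECT statement; at run time dbt wraps it in the
-- DDL needed to build it as a view or table in the warehouse.
select
    order_id,
    customer_id,
    ordered_at,
    order_total
from {{ source('google_analytics', 'raw_orders') }}   -- a raw table declared in a sources YAML file
```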
So I'm gonna stop sharing my screen and let you take it away, Jack. Hey there. Today we're going to walk through an example dbt project. In today's example, we're going to convert a Tableau custom SQL query, sitting inside a Tableau dashboard, into a dbt-style project using dbt Cloud. To get started, let's set our goal for today. We are going to take a dataset that is loaded into our cloud data warehouse, which is Snowflake, take an existing raw data source, and show how custom SQL used in Tableau can be refactored into a dbt project to get better documentation, more readable SQL, and better organization within the data warehouse. We'll also cover the other added benefits of using dbt to build enterprise- and BI-application-ready datasets for all of your Tableau users — or users of any business intelligence tool — downstream. That's our goal for today. I'm going to start by showing you what we're working with. If you've been using Tableau for any amount of time, it's more than likely you have come across, or written yourself, a custom SQL query. In this particular case, in today's example, we are working with a data model where we have data from an e-commerce store that we just recently started. As part of our e-commerce store, we collect data from two sources. One is Shopify, which is where we get our customer information; we also get our product information from Shopify. So this products table and our users table (which is where our customers live) both come from the Shopify source. Then we have three other tables — page views, orders, and carts — and these are derived from Google Analytics. When a customer views a page on our website, that event gets logged in the page views table. If they add a product to their cart, whether they purchase it or not, it gets added to the carts table. And if they order that product, it gets added to the orders table, and the revenue for that order gets captured in the database. Now, in this dataset we've pulled into Tableau, we have a few things going on. We have a query written, and as you can see, Tableau's native SQL editor is a little difficult to read compared to other code editors, so I'm actually going to bring this into another code editor so we have better readability — I'm going to bring it into dbt Cloud. Okay. This is the query we're using to generate a Tableau-ready dataset. This may seem like a boilerplate, run-of-the-mill query that you would execute against your database or use to create a view, but as consultants we often see teams using this type of SQL logic all within a single script to create their data source — whether it's in a published data source or being used and tweaked multiple times for every workbook. We've kind of seen it all. That being said, there are a few key transformations in here that I think dbt can really help us with: we can organize our data warehouse better and organize our code into meaningful steps that are more readable and easier for us to track and test against. So, given all the context I shared, we have a simple query where we're selecting a number of columns, and we're selecting from four different tables.
We have our products table and the page views table, for when people view the site, and we have some segments of our customers — what region they come from, whether they're a repeat buyer, what country, where they were acquired from. So we have a simple select statement. Then we also have some case statements, number one being this one: we are excluding sessions after a certain date, so we have some logic to filter out some of the sessions and some of the sources that are generating traffic, and then we want to create a flag — a simple case statement, and we'll show you how we can refactor that in dbt. Then some aggregations, and, of course, our joins and our group by. So we have rather sophomoric complexity here; if you've written SQL for your own use case, you know these scripts can get quite long, and if you have changes to make, they can be quite laborious to manage. Again, dbt can help with that. So just to set the stage: this is the query we're working against, and these are the tables. We'll come back to this later. But, voila, I've launched into our dbt project, where I'm going to show you how we can refactor and organize the data sources and the code that we can then bring into Tableau and any other downstream applications. When you first create a dbt project, you'll be met with this interface when you're in development mode, because dbt works with version control — you can either have dbt manage a Git repository for you or sync it with your own internal GitHub repo. In our case, we have dbt managing it for us. So I'm simply going to go to my development branch on this project — dev. Okay. Once we're in here, this is my development branch, where I've been working in the past. You can either keep this standalone or pull new feature branches. One of the major benefits of dbt is that any number of developers or engineers can be working on a project at once, but it is very difficult to overwrite your main branch, and it's integrated with GitHub — so it's pretty hard to push untested objects and code into production, which is one of the major benefits compared to using custom SQL at the workbook level in Tableau. Once you get into your dbt project, everything is built into this sort of navigation structure here. This is your boilerplate dbt project: when you first create it, you'll have some pre-built folders such as models, macros, and analyses, and a couple of key YAML files. You'll have your dbt_project.yml right here, and what this file is used for is to set all of your configurations for how the data in your models is going to be materialized. You can define which databases and schemas you want to store your tables in at run time. Think of it as holding all the configurations you could possibly make on your data at run time. In this case, I kept it pretty simple — I kept the default paths.
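(A rough sketch of what a dbt_project.yml configured along these lines might look like. The project name, schema names, and profile name are illustrative assumptions, not the exact values from the demo project:)

```yaml
# dbt_project.yml (illustrative)
name: ecommerce_analytics
profile: snowflake_profile        # in dbt Cloud the connection is set on the environment

model-paths: ["models"]           # default paths, left as-is in the demo

models:
  ecommerce_analytics:
    staging:
      +materialized: view         # staging objects built as views
      +schema: staging            # custom schema so half-processed data stays separate
    marts:
      +materialized: table        # presentation-layer datasets built as tables
      +schema: analytics
```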
I didn't override anything, but what I did do — and we'll go into this in a second — is set custom schemas for a couple of different folders, and you'll see why that's important. Just at a high level, it's best to lay out how you're going to materialize your different data stages in here, so you don't get confused and you're not putting half-processed data in the wrong place in your data warehouse. dbt gives you an easy way of organizing that in one place, and it'll also generate documentation that captures all of this information. So enough about that — but this is where you would do all of your dbt configurations. Where we're going to live today is the models folder. Again, going back to how this code here is organized: it's really just a few different transformations and cleaning steps lumped into one big script — which is fine — and we also have some casting of types and some renaming and aliasing. But with dbt, we can break those steps into meaningful chunks and layers in our data warehouse, so we have better paths for auditing and testing our data and for capturing lineage and the source of errors if they come up. Basically, we want to track the dependencies from source to final dataset and everything happening in between, and dbt can really help with that. As I said earlier, in dbt speak there are two major steps — there are some sub-steps, but the two major transformation or logical areas for organizing your SQL logic are what dbt calls the marts layer and the staging layer. We're going to start in the staging layer. Going back to our initial query, we are pulling from two discrete data sources: our web analytics and order data coming from Google Analytics, and our product and customer information coming from Shopify. So, starting in our Google Analytics folder. Think of the staging layer as the layer in which you do all of your data cleansing and aliasing, any casting of columns to different types, writing your case statements, or building any custom logic that might be repeated — essentially your data enrichment. I'll show you what I mean here in a second. Looking at our staging view for the raw carts table, which will eventually become our carts table: it's just a simple CTE written in SQL where we select all of the columns in the table, and then we take the case statement from our original script for the carts table and apply it in this staging model. This is beneficial because it means that when the end user is in Tableau — or anywhere downstream, for that matter — they don't have to consider this logic; it's already built and tested into the code that you're writing. And we have a similar kind of output here for the orders table and its case statement. Again, very simple SQL. What you'll also see in every dbt project folder is a YAML file for, ideally, every source that you have.
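(Before getting into that sources YAML, here is a sketch of a staging model along the lines of what's being described. The column names and the exact flag logic are assumptions, since the demo's query isn't reproduced in full:)

```sql
-- models/staging/google_analytics/stg_carts.sql (illustrative)
with source as (
    -- pull from the declared raw source rather than hard-coding database.schema.table
    select * from {{ source('google_analytics', 'raw_carts') }}
),

cleaned as (
    select
        cart_id,
        session_id,
        user_id,
        product_id,
        cast(cart_value as numeric(12, 2)) as cart_value,
        -- the flag logic from the original custom SQL, moved into staging
        case
            when ordered_at is not null then 'purchased'
            else 'abandoned'
        end as cart_status,
        added_at
    from source
)

select * from cleaned
```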
And all this really does is define which database you're working in and what schema you want to materialize these tables into, and it also gives you an opportunity to write out metadata about the columns and the facts you're bringing into your data model. In this case, looking at this file, I bring in all of my tables, I define primary keys, I add some descriptions in terms of the business context, and I define tests. This tests block is really key and sort of imperative to understanding dbt — honestly, it's what sets it apart from an environment like Tableau for executing SQL. In dbt, you can execute tests against individual columns, and then you can use what are called snapshots to look at the overall quality of your data. We're not going to go into snapshots today, but I'll show you at a high level what testing allows you to do: conduct evaluations of the values or quality of your data. At run time — or before run time — for instance, in this case we have a unique identifier, so we can run a test for uniqueness of that identifier, making sure every ID in the table is unique. We can also run tests for nulls; obviously, in an ID column we probably don't want nulls, so we can test against that. By defining these at the column level, dbt will essentially throw an error when they're run, notify us of any failures, and let us test all of this before pushing to production. This is really beneficial. In the testing block, you can also define your own custom tests. For instance, this is our column for the dollar amount. Say we only want to consider sales greater than zero dollars. I know that sounds silly, because most of the time the definition of a sale is one above zero dollars, but let's just say we want to hard-code that and make sure it holds for everything — we can do that here. These are things that, in SQL speak, people might add to a case statement just to failproof the logic, but here it's actually built into the DNA of the code and the project, and it won't let the model run unless it passes all of these tests. So testing is really beneficial, and something we recommend running on any primary keys in your tables. You'll also notice that once we define a source in this YAML file and save it, we can reference it anywhere downstream in our project — it creates these nodes for us. In this case, where we say there are raw page views, raw carts, and raw orders: since these are in the tables block of our source, that creates these nodes up here and identifies them as database objects that can be used for tracking lineage through to downstream models. Pretty cool. And especially if you have a lot of complex steps in your data integration or transformation process, you can use the dbt command line to execute really specific nodes within your model — upstream or downstream — so you can get very specific with how you execute your models, even when it's really complex. I would consider this one very low in terms of complexity. Okay.
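(A sketch of the kind of sources YAML being described, with column-level tests. The descriptions are paraphrased from the walkthrough; the "greater than zero" check is shown here with dbt_utils.accepted_range, which assumes the dbt_utils package is installed — the demo's exact custom test isn't specified:)

```yaml
# models/staging/google_analytics/src_google_analytics.yml (illustrative)
version: 2

sources:
  - name: google_analytics
    description: "Raw tables generated from our website data, including page views."
    database: raw
    schema: google_analytics
    tables:
      - name: raw_orders
        description: "One row per order placed on the site."
        columns:
          - name: order_id
            description: "Primary key of the table."
            tests:
              - unique
              - not_null
          - name: order_total
            description: "Dollar amount of the order."
            tests:
              # only consider sales greater than zero dollars (requires dbt_utils)
              - dbt_utils.accepted_range:
                  min_value: 0
                  inclusive: false
      - name: raw_carts
      - name: raw_page_views
```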
So we've covered sources, SQL files, and the staging layer. For our Google Analytics data — our orders, carts, and page views — we're essentially executing basic CTEs that house all of our cleaning logic. If you're doing any major data transformations, you can also create an additional layer for that, because transformations can get quite complex if you're pivoting or unnesting JSON; if you have a use case for that, we recommend reading the dbt docs for best practices on how to organize it. But in this case, it's pretty simple staging. We do a similar thing for the Shopify data over here — just simple CTEs. We cast what is currently a string in our Shopify table into a boolean, so it's a little cleaner downstream and users aren't confused by null or non-numeric values. So that's our staging layer; that's where we do all of our cleansing. Then we get into the concept of the marts layer. "Marts" is terminology often used in how companies organize their dbt projects, and the purpose of having marts is that it allows you to align the logic within your transformations and your final production datasets with a particular business domain that you're creating the data assets for. In this case, rather than having a single query that just aligns with one workbook output for Tableau, what we have going on in our marts is a logical organization of the outputs into different functions or business domains, as laid out below. We can then materialize these in different schemas in our warehouse, and any analysts or users of this data know they're purpose-built for that function — it just adds a little more clarity and organization to your data warehouse and to the readability for people using Tableau or making new connections. Having this organization is pretty imperative, and it also really helps with project documentation and overall just making things easier to navigate for end users of the data. In our marts layer, in this particular case, we're going to have three of what we call marts. We have one for finance: their mart is going to have detailed information about orders and revenue-generating information — basically, anything after an order is placed and revenue is recognized rolls up under our finance domain. Marketing cares about customer segmentation and everything related to customer information, so we have a specific dataset for them. The website mart is going to be all of the web analytics and traffic — some high-level metrics about what's going on on our website for our web team to review. And in building these marts, because they're all derived from the same two sources and have a relationship with each other, we're making sure they're all usable and relatable to each other. So rather than using a single query in Tableau, we'll be able to actually relate these three datasets together and create a very useful set of business semantics that goes past just a few columns.
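(Roughly, the project layout being described — the folder and file names are paraphrased from the walkthrough, not an exact listing of the demo repo:)

```
models/
  staging/
    google_analytics/    -- stg_orders, stg_carts, stg_page_views + sources YAML
    shopify/             -- stg_products, stg_users + sources YAML
  marts/
    finance/             -- order- and revenue-level detail
    marketing/           -- customer segmentation
    website/             -- aggregated web traffic
```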
So we'll be able to enrich the data, and users will be able to pull these domains and this information into Tableau as well — we'll show that here in a sec. Looking at our orders table: one thing that's also beneficial about dbt is that you can configure your materialization types in the command line, or you can do it in the YAML, as in your dbt_project.yml file. In this case, in our staging layer I materialize all of our data objects as views. For these final presentation-layer datasets, I want to materialize them as tables, because I anticipate a lot more concurrent usage compared to my staging tables — most likely no one's going to be using those directly. So for this one, I have it materialized as a table, and you can set default sort settings and which column it's distinct on. Then here, we refactor our code a little compared to what we originally had. We refactor it into CTEs — that's sort of the best practice you'll see in a lot of dbt projects, and honestly it's typically how your SQL should be organized most of the time anyway. You can see our reference blocks: back to what I said earlier about defining a source in your YAML file, this is the syntax for referencing one of those objects. These references are what allow you to build the lineage diagram and get the node graph here, which I find very useful. Then down here, we just take our orders. For this particular dataset, we have our orders — all the revenue-generating information — the products that we sell, and the customers who bought those products. So for a final production dataset for finance, we are taking things like order ID, product names, customer IDs, and customer information, and then ultimately how much money they generated. In dbt, we can execute that and preview it. It's my first query of the day, so it might take a second to fire up. There we go. Okay, cool. Now we have a presentation-layer dataset and everything is typed correctly: we have things like order ID, dates, product name, customer ID. This gives us a really useful dataset — everything's clean, there are no nulls. Out of the box, dbt also lets you see the SQL that's compiled at run time, so if you get any errors and want to compare, or if you want to do some testing, you can look at the compiled SQL versus the preview SQL. Once you have your SQL ready, you have a bunch of different options for how to run or test that model — you can execute any of these in the command line and get a little more specific — but in this case, if we wanted to build this model into our data warehouse, we could just say dbt build, and it'll show us what is executing. There you go. Once you run, you can see all the unique tests being run on our data: these tables are being hit with the not-null test and the unique test, all successfully passing. These are also the tests you can configure for when you're pushing your project into production. Moving on to the marketing mart: in our original query, we had a lot of aggregations going on.
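(A sketch of a finance mart model along the lines of what's being shown. The config block and ref() syntax are standard dbt; the specific model names, columns, and join keys are assumptions about the demo project:)

```sql
-- models/marts/finance/orders.sql (illustrative)
{{ config(materialized='table') }}

with orders as (
    select * from {{ ref('stg_orders') }}       -- staging models are referenced with ref()
),

products as (
    select * from {{ ref('stg_products') }}
),

customers as (
    select * from {{ ref('stg_users') }}
),

final as (
    select
        orders.order_id,
        orders.ordered_at,
        products.product_name,
        customers.customer_id,
        customers.customer_region,
        orders.order_total as revenue
    from orders
    left join products  on orders.product_id  = products.product_id
    left join customers on orders.customer_id = customers.customer_id
)

select * from final
```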
So if you're wondering where to organize and do your aggregations in dbt, it's typically best practice to do this in the marts layer, once you already have your data cleansed and at the correct granularity. In this case, we're going to create our customers table, enrich it with some historical orders data, and create a more enriched final customers table that gives us far more information than our original query had. Using CTEs, we're able to refactor and get additional value from our data: the customer's first purchase based on order history, their latest purchase, the number of historical purchases, and their lifetime value — their historical spend with us over time, not just their last order. Then we can join that here, and it gives us a customer dataset with all of the segmentation information you might want. If you're a marketer or someone doing customer segmentation, you can get at that now, and you have a more purpose-built dataset. The same goes for web traffic. We want to aggregate the data, because session-level data might be overkill, and we don't want page-view-ID-level data — for analytics, that would be quite laborious and taxing on the performance of our workbook at that level of granularity. So we're going to aggregate it up to the day level. In here, we're rolling up our users, page views, carts, and orders counts to a count distinct of those IDs — so we're doing our aggregations — and we're also doing a sum of the value of carts that are abandoned versus those that go through and are successfully ordered. So we get an aggregated dataset that still retains all of the identifiers that can be used to relate back to our other marts. There we go. Okay. Now we have visit date, product name, customer source, and then all of these aggregations, aggregated at the day level. And when we look at our lineage, we're at the end here. This is just for web traffic alone, but we can also look at orders — you can see the lineage for each of these models. And by going into your staging tables, you can see all of the downstream children. This lineage feature is very cool. And lastly — that's the basics of organizing your dbt project. There are a number of other features. You can add custom macros: if you have recurring pieces of logic that you want to write once and reference in your code — things such as case statements or other custom scripts — you can build macros that execute at run time within your dbt project. You can also do analyses, so you can set up little queries that aren't executed as part of your dbt runs; if you want to do experimentation, derive new metrics, or do ad hoc analysis, you can do that within your dbt project in the analyses folder. There's a bunch of other stuff, but we hope this is a useful light overview. Then, once we're ready to push our project to production, we can simply merge our branch into our main branch.
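(As a small example of the macros being mentioned: if the cart-status case statement needed to be reused in several models, it could be pulled into a Jinja macro. This is a hypothetical macro, not one from the demo project:)

```sql
-- macros/cart_status.sql (hypothetical)
{% macro cart_status(ordered_at_column) %}
    case
        when {{ ordered_at_column }} is not null then 'purchased'
        else 'abandoned'
    end
{% endmacro %}

-- used inside a model, for example:
-- select {{ cart_status('ordered_at') }} as cart_status
-- from {{ ref('stg_carts') }}
```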
So when we execute our next run on dbt Cloud, all the code that we just wrote, checked, and tested will be executed. From here, you can go in and set up a job, and once the job is defined, you can either have it triggered by API, run it manually, or set it on a cron schedule. This one is run manually. And this is what you get: when you're configuring a job, you can see which environment it's configured for, whether it has a next run and how that's configured, your historical runs, and the models that are taking the longest to build — so you can do some diagnostics if you have a runaway job that's taking abnormally long. Now, this is originally what we had: our SQL query, a very limited set of columns to get insight from, and some aggregated values without a lot of organization. With the same data — the same raw information in that original data source — we now have the ability to create an organized, enriched ecosystem for Tableau users to explore, and honestly just a cleaner interface: columns named properly and typed properly by default, so you don't have to do any casting in Tableau, and you can get rid of those nulls. In the column we cast as a boolean, we now have false and true instead of one and null. Things like that can go a long way if you're trying to make life easier for your Tableau audience. And now you have your datasets organized into business entities that align with how your organization works and gets insights — here I'm just showing how I created this relationship. So, yeah, we really hope this was helpful and gave you a better understanding of how you can use dbt to refactor your SQL. If you have a lot of custom SQL in your BI workflows, dbt is a great tool for refactoring and applying software engineering best practices to your data, which can give you much cleaner results in your end product. So here we go — I'll hand the screen back. Yeah, and sorry for the change from dark mode to my version; it's probably blinding everybody at this moment. But just looking at those YAML files Jack was talking about: again, this is giving a high-level understanding, almost a plain-English, human-readable version of what we're looking at. You'll see in a lot of these — even at the source name level, for Google Analytics — that we've put in a description. You can also do that down to the database level, the table level, even down to the column name, where we say here that session ID is the primary key of the table, a unique session on the website. So it's easy for somebody to go into this and understand what we're looking at. And down here — I wasn't going to do it right now, but I'll just showcase it — you actually have a command line where I can type dbt docs generate, if I can type. And once I hit that — well, actually, I'm just gonna hit it. Why not?
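(For readers following along, the command being typed into the dbt Cloud command bar is one of the standard dbt commands; a few of the ones that come up in this walkthrough, which work the same way in dbt Core locally:)

```sh
dbt run                       # build all models (views and tables) in the warehouse
dbt test                      # run the tests defined in the YAML files
dbt build                     # run and test together, in dependency order
dbt build --select orders+    # build one model plus everything downstream of it
dbt docs generate             # compile the documentation site from the YAML descriptions
```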
What this is doing is going through and looking at all of the different things I have here — my different staged models, my different tests (making sure things are not null and unique), customers, orders, web traffic. It went through all of that and created this little documentation site that I'm now going to show you by hitting View Docs. This is the more human-readable way of understanding what we're looking at. I'm looking at my project, and I can go into Google Analytics under my sources, and here I can see the description we wrote: these are the raw tables generated from our website data, including page views. It's a lot easier to see it here than in the YAML file alone. I can even click into raw carts, and here I can see, for user ID, the description: the authenticated user — if a purchase is made, the user will be registered as a customer. You can add a lot of things in here, as well as see the models that reference this and the tests that are being run on it. Similarly, if I look at this session ID, you can see the tests being run on it are making sure it's unique and not null, and you can even go down and see the code within it. So this just makes it a lot easier to understand. You can also look at the lineage graph — a version of what Jack was showing — where you can see the dependencies: this is taking raw carts and creating the staging model. I can do that down at the model level too. The one we were looking at there was orders, and I can see the models it depends on and the code with it, and if I look at that lineage graph, it showcases it a little differently. So this really nice, clean, understandable way to look at the documentation is another big benefit of dbt — because I don't know about you, but I hate writing documentation. Having something that can just do that, where all I have to do is add a little description and it gets put into a nicely formatted document, is kind of great for me. With that being said, I don't see any other questions in the chat or the Q&A, but if you do have any, please feel free to reach out to us. You will get the webinar replay in your inbox in a few days. We've got some information on dbt and our Tableau consulting — we also do dbt consulting — and some different ways to connect with us and talk about dbt. Thank you all for joining us today; we're looking forward to seeing you in the future. Hopefully we'll have another one of these that gets into some of the more advanced topics: packages, macros, writing your own tests, even publishing and scheduling all of this. So keep your eyes out for that in the future. Thank you all for joining.