Data Transformation for Success

Transcript
Yeah, maybe I can just say a couple of words of introduction and then pass over to Barbara and Fadi as well. I am a consultant with InterWorks and I've been working with data in this field for something like nine years now and been involved in a lot of visual analytics, but along the way been increasingly involved in some of the data preparation that we have to do. So that's kind of where I got involved with doing this webinar. My day job is to lead our membership program. We have a membership program called PRO which is designed to help people to make good use of their data and get the most out of their data investment, and this webinar is also going to be shared with our PRO members. I'm calling in today out of Amsterdam, or very close to Amsterdam, where the weather is gorgeous and gets you in a very good mood to kick off. So with that, maybe Barbara, I can ask you to say a couple of words of introduction. Yes, hello. My name is Barbara Hartmann. I'm a data engineer here at InterWorks. I'm based in Berlin. And yeah, I've also been involved with data for quite some time, but I've held various jobs. I was working with geographic information systems as a web developer and then moved into the data engineering role, and I joined InterWorks about a year ago. Thanks very much, Barbara. Fadi, maybe you'd like to say a couple of words. Yeah, sure. My name is Fadi. I'm a data engineer at InterWorks. I joined last year and I'm very excited about it. I work with multiple tools like dbt. And I got dbt certified and survived. Just joking. Yeah, and I'm here and very excited to share with you some of our experiences with dbt and how we got into it. Back to you, Paul. Okay, thank you very much, Fadi. Nice of you to do that introduction. Let me maybe say a few words about the background of InterWorks for those of you that may not know us. The company was founded back in 1996 in a very small place in Oklahoma called Stillwater, and that's still where we're headquartered.
Over the years we've developed to become a Dell partner from 2001, and actually the company's origins were in IT infrastructure. We still have an IT infrastructure division today. We became Tableau's first partner back in 2009, very proud of that, and then later on in 2013 we expanded our global footprint with a union with InterWorks Europe, and we then expanded further in 2018 into Asia where we have a presence in Australia and Singapore as well. More recently, Tableau was acquired by Salesforce and they moved to make some modifications to their partner program, and we actually became the first Salesforce and Tableau global partner, which we're particularly proud of. We love to talk about working with the best people, having fun, to work with the best clients, and that's kind of the ethos that's driven the company all these years. We have several practices, and as I say, we started off as an IT company. Today we have numerous specialist practices and solutions which help us to come up with holistic solutions for particular data challenges. We have a data team which is where Fadi and Barbara are active today. We have an analytics team that works at the front end to create visualizations. An experience team supports that. The experience team is really creative people that come up with great ways to present materials and make those available. The platforms team actually helps to run the BI infrastructure that you may need, so it sort of sits on top of the core IT and runs things like Tableau Server, for example, keeps all of that running. Then we have an enablement practice which is all about helping users to interact well with that data infrastructure. They do things like training and they also work with us to prepare some of this webinar material and provide ad hoc assistance on the back of it. 
We've got a very wide customer base and our clients are present in a lot of brand names that you'll recognize in industries as diverse as education, foods, pharmaceuticals, healthcare, finance—pretty much if you name it, we're involved in it. And we have a wide range of partnerships to support that as well, one of which is dbt which we'll be talking about later in this webinar, our longest-standing partnership with Tableau, and we have several additional partnerships over the years which really complement that on infrastructure, data preparation, and showing different ways of bringing data out and making it available. We like to brag a little bit, and on the next slide I'm not going to read this out, but if you just have a quick look at this you can see there's quite a few highlights that we've run through along the way that we're pretty proud of. Maybe one of these that stands out is the Forbes Small Giant that may not mean an awful lot to you, but what they actually gave an award for is a relatively small company that actually is able to make very big impacts with larger organizations. If you're interested in that, it's worth a quick Google and a read. It's actually quite a fascinating way of looking at our company and I think it gives a very good impression of who we are. And then just to wrap up the introduction, what should you know about us? Well, we pride ourselves in helping organizations to succeed with Tableau and other parts of their data infrastructure. As I mentioned before, we're particularly proud of becoming the first Tableau Gold Partner. We've had several awards along the way from Tableau and from Salesforce, and we're very passionate about what we do. We try to keep our feet on the ground too and only make recommendations that we know or we sincerely believe are going to add value for your organizations. And many of you will know us through our world-famous blog at interworks.com/blog. So enough about InterWorks, let's dive into today's webinar. 
And really what we're aiming to do with you is we're inviting you to share in a conversation between myself and Barbara and Fadi, where we really want to talk a little bit about how we approach data engineering projects, so data engineering work, some of the challenges we faced along the way, and how dbt helps us in that. And I'm sure as we talk through this you're going to recognize some of the challenges that we faced, and maybe while we stay on this point, it's worthwhile just talking a little bit about how we first encountered dbt. And I think all of us, it's fair to say, are pretty excited about it. Perhaps, Fadi, I can invite you to just say a little bit about how you first came across this tool and what particularly appeals to you. Yeah, sure, Paul. Thank you. I started working as a data engineer with some projects on Snowflake. And when you start on your database or data warehouse without using any transformation tool, you just start writing worksheets and start managing them. When they pile up, you start sorting them into a folder structure and so on. And I was scared all the time of losing that code because when you lose it, you have to start from scratch. So I started putting it in Git, GitLab or GitHub, whatever. And from there, when the project grows, you need to get other people working on the same project. So you start sharing your worksheets. And then you end up like, where is the current code? How can I get back to the last version of it? Because you have the worksheets on the side, the folder structure, and the Git, and everything gets messed up. So this is how I heard about dbt and started using it, which is a great tool for overcoming both of these challenges: code sharing and version control. That's interesting, Fadi. Yeah, that's a great aspect of dbt that appealed to me as well. Barbara, perhaps you'd like to add to that. I'm very interested in how you came to get familiar with this tool. Yes, I'd love to.
Yeah, so as a consultant, you work with a lot of tools. So you consult—I mean, everybody knows there is a mass of tools that you can use, and we have to evaluate those and figure out which are recommendable for certain clients. Most of these tools are no-code or low-code, which is best for most of the clients. But for me, coming from the software engineering, software development area, I particularly like the coding aspect of dbt that you really—it feels more like a coding tool, which it is. And you have the version control. You have—it's open source, and it has these neat features for the documentation and tests and gives you a structure. I mean, it doesn't enforce it, but it suggests a certain structure and best practices, basically software engineering best practices, and that's what I really like. And I very much enjoy working with it. Thanks, Barbara. That's a very rounded impression of it. I remember that when I first heard about it, I think it was one of those things where we have several partners and a number of our colleagues in data engineering were talking about it. I didn't really know too much about what it was until I was on a project, and one of my colleagues was talking about it with our client, and he was explaining what you could do with it. And I remember sitting there and I was thinking I've run into this trouble. What he's talking about, I've run into that on previous projects. I wouldn't really class myself as a data engineer, would see myself more as an analyst that dabbles in querying data and cleaning it up a little bit at the end. But I ran into so many of the issues that he was talking about. I had one of those moments where I said this is a tool that really solves problems that I can recognize, and if we can do nothing else in this webinar, I hope that we can communicate that to you and leave you with a sense of yeah, actually I've heard something that makes sense and we could use this. 
We'll talk a little bit more about the challenges and we'll talk a little bit more about the dbt versions available. In my own case I've actually played around with the dbt Cloud version, but we'll talk about some of the options later on because I did notice there was a question just about that. So I think at this stage it might be a good point to invite you to get involved a little bit. And Vicky, can I ask you to bring up a poll so that our participants can jump in here? I'm kind of interested to what extent you actually are using SQL in the work that you do. Perhaps you can take a look and come back to us on that and we can get some sense of what you do. I see the answers coming in here and it is pretty nice to see that diversity, and I'm starting to feel like I'm in really good company too. Here we go. I think quite a few answers have already come in so far, and Vicky, maybe we can share the answers at this point so that everybody can take a look. So what I do see is that a good third of you, in fact it's pretty evenly split—about one-third of you are regularly working with SQL. I guess Barbara and Fadi, this is a group that you can very much identify with because I'm sure that's your daily work. For me, I get involved in it from time to time, and my goodness, thank goodness for Google. That's all I'll say about that. But there are quite a few of you that don't regularly work with it, but for those of you I would say it's not as scary as it looks. I managed to find a pretty good way of getting started and the basics of what you need to work effectively with dbt are quick and easy to learn. So interesting. Thank you for that. And I think perhaps moving on and just sort of keeping this going, let's look at what do we typically do when we get a new project. So again, Barbara and Fadi, this is one for you. We often as consultants get confronted with completely new datasets, probably with new clients. 
And for those of you on the call here as well today, I'm sure there are times when you get data coming from a new place that you've not maybe worked with before. So what I'm interested in is what do you typically do to get started when you've faced that? Right? So you've got a new dataset, you've got to get started. How do you typically get started and what are the challenges that you often face? And Barbara, maybe I can ask this to you first of all. Yeah, I would happily share my insights. So getting started with a new client, of course, since, as you mentioned before, we have a very diverse client base, so clients in all kinds of domains. So you of course have to somewhat get familiar with the domain because there are always specifics about that. Then of course, the project—you have to see the client's existing infrastructure or if there is any, then what the data source is, like where the data is coming from, what the business use is, and kind of figure out how the flow might be and what tools to recommend. And if you have an existing project, it's actually even more—you have to see what was built, where do we stand, do we have any documentation, or where can I find the last working piece of code? So you basically have to get familiar with the situation and get a picture of where we stand and where we want to go. That's basically how this usually works. Thanks very much, Barbara. Fadi, do you have anything you want to add to that? And by the way, if you want to contribute to this in the chat, I'm very interested—I think we'd all be very interested to know how you approach this kind of challenge, but Fadi, perhaps you've got some ideas? Yeah, sure. Well, I would like to try to give a very broad overview. So at InterWorks, we offer end-to-end analytics as a solution and as a service. And in the data practice, I like to divide it into two categories, all the tasks we get from new clients. 
The first category is about loading the data, which needs a lot of experience. So you need to have seen a lot of use cases, whether it's API data, databases, or files. But the other category, which I see as a very important category, is transforming the data after it has been loaded. For this, you don't only need the experience, but you also need the creativity because you don't want to do the same work twice, and you don't want to waste resources doing the job over and over and getting it into a shape that doesn't serve your business logic and business needs. Yeah, so some of the challenges we see are that some clients come with some transformation work, and this transformation work contains SQL code, for example, that is very, very long, and you need a lot of time to start understanding what's going on there. And from there, you're then asked to build something on top of this. So this understanding can be quite a challenge sometimes. Yeah. I'm just looking at the chat. Thank you, Fadi. I'm just looking at the chat here as well actually. I think there's a consensus coming out here that one of the first things to do is to explore the data, and that can be done in Tableau. Can be done—one of our participants has just suggested doing that with a set of standard querying scripts they have, and I've seen other data colleagues on our team doing the same kind of thing, and that certainly seems to be a good place to start. And I think from my own experience actually, I've always—I found that projects vary enormously, but one thing that I often struggle with, and it may just be me, I often find that when I'm starting a project, I'm getting a briefing from somebody. So the client will often give me a briefing about the data structure and the data structures tend to be all over the place.
Some of them are fairly simple, but I've had other ones that can be pretty complex, and you know, something that I've tended to do is to try to explore it, but also if I've got a lot of data coming from different places, lots of different views all sort of linked up together, I stole a trick from one of our data colleagues and actually started to mind map that. In fact, I actually quickly threw together a scruffy example from an actual project of the kind of things that helped me to get an overview of what's going on. So in this case I mind-mapped it because it kind of helped me to include things like what are the requirements, but you could just as easily do something like an ERD diagram or some kind of hand-drawn diagram. Because one of the things I found is that data does tend to get pretty messy over time. Again, that varies enormously. You know, some organizations are very structured and they tend to have the data very clean and very organized. Others, it tends to be pretty all over the place, pretty messy. Do you have any thoughts about why data does tend to get messy over time? I'll throw this one out to you, Fadi. Yeah, sure. I would like to start with an example. So, picking up on what I already started: you get all this complex SQL code, for example, and you want to build something on top of it, but you don't really understand what's going on there. So what you sometimes do is go to the raw data that's been loaded and start building things on top of that. What you end up doing is repeating the same work somebody else already did. And this is not the kind of structure you want to have in your data. And normally, the natural flow of the data is in one direction. When you start just building views on top of views on top of views, you might also end up looping through the data. So very quickly, you lose the overview.
Yeah, I definitely recognize that. And that's where that sort of drawing, I found that really very helpful thing to do. But I guess where I'm coming from is why is it that these things do tend to get messy? And I think a useful kind of concept to bring up here is this idea of technical debt. Have you come across that before actually in your work? Is this a term that you've heard of before? Barbara, is it something that you've come across? Yes, definitely. And I can totally agree with what Fadi was saying, the mess that you have, and that's most likely from you have ad hoc queries, you want to quickly get results, and then several people working on the same code and missing documentation and tests. Yeah, it's very easy that you end up in a mess, and it's hard to get a handle on it. I think one of the most astounding things that I ever came across working on a project with a client is talking about the idea of spending one of the sprints—they were using sprints to run development of dashboards and the data behind it—and I made a suggestion that we should take some time out with one of the sprints to actually do some cleanup and just think about the structures that we needed and the documentation. And I think nobody on the team, or at least none of the managers that we were working with, actually got it. They didn't see the value of it. And I think the problem with this is that a lot of the benefits that you're going to get by cleaning up and working in a clean way, the benefits come over time. They don't come immediately, but the work is immediate. And I think it's a problem that a lot of us face in our work to really get forward. So I think we've outlined several issues—complexity, the challenge of managing as we go, the challenge of actually keeping the work clean, avoiding repetition. So a lot of those challenges have come up here. I think it's a great opportunity to actually dive into dbt and take a look at this and see how that might help us. 
Fadi, I'm going to kind of ask you to jump into the driving seat right now and maybe give us a quick look at dbt. Yeah, just to be clear, we are showing dbt Cloud here, not dbt Core. dbt Cloud is the paid version, at least when you're a big team; dbt Core is the open-source command-line version. So this is what you see when you start dbt Cloud. Let me close this to make it a little simpler. This is the dbt Cloud IDE. What you see here on the left-hand side is the folder structure and some files you get when you initialize your project. This is all built by dbt, and you don't have to worry about it. When you start developing, you go to the most important folder, which is the models. Models are nothing but tables or views. You can decide later on if you want to have them as views or as tables. And let's have a look into one of these files. What you can see here are some SQL files, .sql files, and .yml files. Let's open up one of them and see what's inside. So, surprisingly, this is nothing but just SQL. So we are selecting from a table or doing some transformation, and this is transformed by the magic of dbt to a table. No need to worry about the underlying things. But we see here something else, which is this double curly brackets, which I also like to call the dbt magic. This is nothing but Jinja. Jinja is a Python-based templating language that helps us do a lot of things. For example, it helps us build the structure of the project, and that's what produces the DAG here at the bottom. So what we see here, the fact_orders table depends on these two models, and this is done using Jinja. We just need to write ref('staging_jaffle_shop_orders'), and this model pulls data from those upstream models. So, Fadi, as I understand this, then each one of those blocks in that drawing at the bottom represents a view or a table where the data is sitting in the database. Yeah, exactly. So if we can go to one of them and see what's in there.
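To make that concrete, here is a minimal sketch of what such a model file might look like. The file path, model names, and columns are illustrative, loosely patterned on dbt's jaffle_shop demo project rather than the exact files shown in the demo:

```sql
-- models/marts/fact_orders.sql (illustrative sketch)
-- A dbt model is just a SELECT statement; dbt decides whether it
-- becomes a table or a view based on configuration, not on your SQL.
select
    orders.order_id,
    orders.customer_id,
    payments.amount
from {{ ref('staging_jaffle_shop_orders') }} as orders
left join {{ ref('staging_stripe_payments') }} as payments
    on orders.order_id = payments.order_id
```

The two `ref()` calls are what draw the edges in the DAG: from them, dbt knows that fact_orders depends on those two staging models and must build them first.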
So just by clicking on it, we see it's another model, pulling the raw data from somewhere and doing some transformation, whatever this transformation is. We can go to the second type of file we see here, which are the yml files. One click is enough. When we open this, we see some editable things here. And what it's actually doing is putting some configuration in our project. So if we want a model to be translated to a table, we can just put this here, and we can put some description as well. A description is nothing but documentation we are adding to our project. We can put the description also for some columns. We can define some tests on our columns. So the column customer_id here, for example, is going to be tested to not have null values. And this is happening automatically just by adding this one line of configuration. So as I understand it then, the models that you built that you can see in the drawing at the bottom, each of those is kind of data sitting somewhere in the views or in the table, and the SQL is telling me how to produce each of those. Exactly. So if I'm working in the staging_stripe_payments, it's sort of at the top row there. If I make changes there, then I know I better be careful because that might mess up my fact_orders table or my dim_customers table. That's right. But this also gives you another power: not only can you build fact_orders, but you can have another model here, let's call it fact_customers or whatever, and pull data from the same upstream model into both. There is no limit to building models on top of others. Is dbt actually processing the data? Do I have to sort of bring it out of my database and run it in dbt? Or where is the data actually getting handled? Sure. So dbt is not handling any data transformation itself. The compute power comes from your data warehouse or database.
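As a sketch, the yml configuration being described might look like the following. The file name, model name, and column are illustrative, and the exact keys can vary slightly by dbt version (`tests` is the long-standing form):

```yaml
# models/staging/schema.yml (illustrative sketch)
version: 2

models:
  - name: staging_jaffle_shop_orders
    description: "Orders from the jaffle_shop source, lightly cleaned."
    config:
      materialized: table   # translate this model to a table, not a view
    columns:
      - name: customer_id
        description: "ID of the customer who placed the order."
        tests:
          - not_null        # one line of config adds an automatic test
```

The description lines feed the generated documentation, and the `not_null` entry is the "one line of configuration" that gets checked automatically when the tests run.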
What dbt is doing is compiling some code, which is sent to your database, and your database or data warehouse takes care of it. What you can do in dbt is, for example, see that code compiled, and you can also have some preview. But then—I just selected this. So if you select the whole SQL, you will see what dbt is sending to your data warehouse. What we can see here, for example, this is very interesting because we just wrote—we are referencing this model, and dbt is translating that to something that our data warehouse can read and retrieve data. Yeah. This amazingly shows the modularity of dbt with these reference functions and also sources, you can also define sources. And that's very good for the reuse of things and not building, repeating things basically. And also the other advantage that you have with dbt, what Fadi showed there, the description and the tests in the yml file, that's actually the most amazing thing that I think because it's right in the code, like right there, so you don't have to go back and forth between several tools. So you write it right in place and you always know you have the up-to-date description of your things and the tests. Yeah, totally agree with you, Barbara, about the modularity. So what you're actually doing here is instead of having this two-hundred-line SQL code, you are just breaking it down to baby steps, to very small views or models, and you can pull data from them. Now, let's say, if you are at line one hundred and from there, you want to go somewhere else building another model, you can pick up on that specific model you want data from. So you don't have to repeat yourself because of this modularity. And in addition, it's very easy for somebody to understand if they come after you. So speaking of documentation, what Barbara just said and tests, let me just show you how to add these into your data model. 
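Roughly, that compilation step looks like this. The database and schema names below are made up for illustration, since the real ones come from your connection profile:

```sql
-- What you write in the model:
select * from {{ ref('staging_jaffle_shop_orders') }}

-- What dbt compiles and sends to the warehouse (names illustrative):
select * from analytics.dbt_dev.staging_jaffle_shop_orders
```

Because the physical location is resolved at compile time, the same model code can run against a development schema for one person and the production schema for the deployment job without being edited.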
So if I want to add some description for this column, let's say customer_id, all I need to do is type description, and I can then start typing whatever I want. So this is a unique ID for the customer table. Now what you might notice here, it's not unique, so I can also add a unique test very easily. Now when I run this model, this column is going to be checked on its uniqueness. And if somebody comes afterwards, they are going to be able to read that and understand it. Now where to read that is, here, dbt generates very nice documentation, which is not only based on the documentation you wrote, but also dependent on some data. Well, dbt pulls some data from your data warehouse and puts it in that documentation. So let's have a look here, for example, for the customers table. These data, the data type, for example, I didn't write them down, so they just appeared here. They are coming from the data warehouse, and you can see also the SQL and some metadata on top, which are the details. But most importantly and interesting is this lineage here. So let's have a look here. So, Fadi, this is actually in the documentation now, is it, that you're looking at? It's kind of an online documentation? Yep. You always have it. You can also generate this documentation each time you run a project. So if you change something, it's also going to be changed in the documentation. So the documentation, you don't have to worry about it because it's always up to date. Lovely. Yeah. Okay, so I think dbt is—one of the things that I really loved about dbt, the fact that so much of this kind of where does the data go, what does it look like, what is it doing. I always found that really hard when working on a project, but one of the other things that I personally had a bit of a struggle with is as I work, how do you keep track of the changes that you're making as they go? But perhaps just before I ask that, I did get a question here. 
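The extra test being added here is one more line under the column in the yml. A sketch of the resulting column block, with illustrative names:

```yaml
# Illustrative column block with both tests attached
columns:
  - name: customer_id
    description: "Unique ID for the customer table."
    tests:
      - unique     # fails if any value appears more than once
      - not_null   # fails if any value is missing
```

On the command-line side, `dbt test` runs these checks and `dbt docs generate` rebuilds the documentation site, including the lineage graph; dbt Cloud exposes the same steps in the IDE.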
Vicky, have we had—maybe not in the chat that I can see, but have we had questions that we haven't yet addressed that you can see? Well, this is where myself as a nontechnical person might let myself down, and you may well have already answered it. But if I let you know what it is, you can always give a response if that's okay, Paul. So apologies if you've already covered this off. The question is, if we have a table structure already defined as per our ERD diagram schema, then can we use that already defined table to populate data from dbt? It is normally giving errors if table already exists in database and not created by dbt first. That's an interesting one. There's quite a lot in that, so I do apologize if you did already answer it. Yeah, so as I understand it, this is coming from somebody that's trying to use dbt on an existing database and they've already got an existing data structure and there are errors coming up where dbt isn't really tracking that structure. Hopefully I've understood that correctly. I don't know if Barbara or Fadi if you have an answer to that at all. Just to help out, one of the things that I think you do have to do is make use of that Jinja that Fadi was talking about. So if I've got a query that says go to table X, I've got table X as my source or view X as my source and I want to pull data from there, the documentation piece of it will only really work if I set that up as a source and make that known to dbt and I use that source reference then in my querying. So I don't know if that makes sense, but Fadi, you showed an example of that. Am I on the right track with that? Well, the way I understood the question is you have your dbt up and running and it's working on a database, on some data warehouse, and you want to use it on a different warehouse. Right? Is that what the question is saying? I think it's on an existing infrastructure. 
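The source setup Paul refers to is a small piece of yml that tells dbt about tables it did not create itself. All names below are hypothetical:

```yaml
# models/staging/sources.yml (illustrative sketch)
version: 2

sources:
  - name: jaffle_shop        # logical name used inside models
    database: raw            # where the existing tables actually live
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers
```

A model would then select from `{{ source('jaffle_shop', 'orders') }}` instead of hard-coding the table name, which is what makes the pre-existing table appear in the lineage graph and the generated documentation.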
On existing infrastructure, I'm not entirely sure if I'm correct in saying this, but usually dbt takes care of the DDL, so the creation of tables. I actually haven't encountered that issue where the table already exists, but you can work around that. There are these pre-hooks, where you can run your own DDL before a model's query, so that you create the table yourself. So there are ways to work around what dbt does automatically. Yeah. I'm going to interrupt you. I do apologize. So I think we've had some clarification. So it's: the DDL is already done by ourselves, and we want dbt to populate the table. Right. I think this is a lovely question that we probably won't be able to do justice to in the time constraints of the webinar, but I would love to have a conversation after the webinar with the person asking it. We'll give you a reach-out address and please reach out and we'll take a look at it with you and see what's going on. I think the key to it is you do have to be using those hooks that Jinja provides you. There is a little bit of yml that needs to be done to provide those upfront, so the initial sources. It may be a very simple thing to do to just make a couple of tweaks to get that all set up. Maybe we can pick that one up individually after the webinar. I'd love to do that. Vicky, was there anything else that looks like we've missed or failed to cover? No, you're on form today, Paul. Wow. All right. Fadi, how do we keep on track when we're doing a piece of work? We've got all these changes going on, and I don't want to get lost. There's some fantastic stuff in dbt that we haven't looked at yet. Yeah, sure. We can go there. Back to our dbt. So the way to manage the version control and the code sharing is pretty simple in dbt. dbt divides the job you're doing into development and deployment, which you can see here on the left-hand side. You're always in development until you're done, and then you can merge your code into deployment.
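One way the pre-hook workaround Barbara mentions can look inside a model file; the table and model names here are hypothetical:

```sql
-- Illustrative sketch: run your own DDL before dbt builds this model
{{ config(
    materialized='table',
    pre_hook="drop table if exists analytics.reporting.orders_legacy"
) }}

select * from {{ ref('staging_jaffle_shop_orders') }}
```

The statement in `pre_hook` is sent to the warehouse before the model's own SQL, so you can reconcile hand-written DDL with what dbt is about to create.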
Let's go ahead, for example, and see what would happen if I save the changes I did here. When I save these changes, they will appear here in red. And what you can see here, I am on my personal branch, so I'm not affecting the main branch at all. There's this main branch where the deployment job is running. After developing and doing all this nice stuff to our model and the data structure, we can commit and merge it to the main branch, which I'm not going to do at the moment because I did almost no changes. So you don't do this on a five-minute basis. You do it when you're done. And everybody else pulling data from the main branch and starting to develop is going to see the same changes you did. So you have this single source of code, let's call it, and nobody is interrupting you while developing. You each have your own named branches. That's so amazing. You know, one of the things—it's one of those things when I saw that, I kind of kicked myself, and I thought, why have I never thought of SQL as code? It's such an obvious idea to do, isn't it? For years I was messing about writing a piece of code, saving it as a file v1, file v2, file v3, and then at some point you can't remember what the heck you did in v2 anymore. You've got to have a note by the side of you saying v1 I did this and v2 I did that and v3 I did that, and then you sort of think that's version control, man, there are tools to do that. You know, and then you start writing these long names, right? And yeah, I was just thinking about it and then you get into Snowflake and you can't save those files so easily because you've got worksheets, so now I'm blocking out huge chunks of code, just commenting them out and saying well, that was version one and now I'll copy the whole thing into version two and I've got five hundred lines of code going on. Filename creep, there you go. I've seen that. Absolutely. And describing everything. Why didn't I think of that? It's so obvious. Yeah.
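Under the hood, the develop-then-merge flow Fadi demonstrates is ordinary Git. A self-contained sketch in a throwaway repository; the branch and file names are made up:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git config user.email demo@example.com
git config user.name demo

echo "select 1 as v" > fact_orders.sql          # the "working" model on main
git add . && git commit -q -m "initial model"

git checkout -q -b feature/refine-orders        # personal development branch
echo "select 2 as v" > fact_orders.sql          # develop without touching main
git commit -q -am "refine orders model"

git checkout -q main                            # back on the deployment branch
git merge -q feature/refine-orders              # promote the finished work
cat fact_orders.sql
```

dbt Cloud's "commit and merge" buttons are a thin interface over exactly these commands, which is why the same project can also be worked on from a plain terminal with dbt Core.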
Version control sounds so easy, but you can get yourself twisted up with that too, and dbt kind of makes that idiot-simple as well. You get a little green button that says set up a new branch, so: I'm developing something new. Leave what works alone. Don't break that. Build something new. And if you've got changes, it lets you know that you've got changes, and you can just carry on and stick it back in there and keep going. And I'm just going, wow. It's so simple, but it makes such a lot of sense. Yeah. And the great thing is also that every developer is working on one project. We all refer to the same Git repo, the same code base, but in the warehouse, everybody has their own environment and most likely their own schema, so you actually don't break anything. That's also good. That's nice too. Right, and that structure is available. I mean, you could always set that up yourself, but dbt suggests it and has that infrastructure ready for you to use. That's right. And the other thing that I really love about it is that I didn't really have to learn anything new to work with dbt. Everything I was doing in dbt I was already doing; it just kind of helped me get organized, and I thought that was fantastic. It's got virtually no learning curve to it. But Fadi, we haven't got time to talk about everything that dbt can do, but maybe there are some other things that you'd just like to add and mention. Like I said, we won't have time in this webinar to show everything off, but are there other functionalities that you think are worth bringing up before we move off the tool? Yeah, sure. Well, to me, the power of dbt comes from the combination of the CTE-style SQL and the modularity you're writing there, the yml files, and Jinja, the Pythonic templating language. SQL is actually a declarative language: it's just used to query something; there's no procedural logic in it.
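As a hedged sketch of the per-developer setup Barbara describes: in dbt Core this lives in `profiles.yml`, while dbt Cloud manages the equivalent per-user development credentials in its UI. All the names below are hypothetical:

```yaml
# profiles.yml (dbt Core): everyone shares one project and one Git repo,
# but each developer points their dev target at their own schema.
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account        # hypothetical account identifier
      user: barbara              # each developer's own warehouse user
      role: transformer
      database: analytics
      warehouse: dev_wh
      schema: dbt_barbara        # personal dev schema, so nobody breaks main
      threads: 4
      # authentication details (password or key pair) omitted here
```

Because only the profile differs per person, the model code itself stays identical across the team.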
So when you start needing loops, iterations, or some more complex structure, you can't do that in SQL, because at the end of the day it's just querying something using the query engine. So, Fadi, when you talk about SQL being a declarative language, you know, I heard about that recently, but to me, as a kind of amateur who got into this, what that really means is you don't have to tell the computer how to process the data; you just tell it what you want to see at the end of it all and leave it up to the database machinery to do the processing, right? That's exactly what it is, yeah. Okay, but dbt can do more than that, exactly. By adding Jinja on top, for example, which brings this structure to your code. And you also need the configuration, which is done with the yml files. Now, this combination of these three things opens up a lot of opportunities. You can start doing complex testing, doing slowly changing dimensions, which is already done for you in dbt, doing very complex things. So the sky's the limit. Wow. That's interesting. So, hopefully, we've aroused some interest in this. If you wanted to get started, to dip your toe in the water and just take a look at this for yourself, how would you go about that? What are the options to find out more or to get started with dbt? Barbara, I don't know if you want to talk about this at all? Yeah, I can, because we all started at some point. With dbt, like a lot of these cloud-based tools, it's easy. They usually have free trials. And dbt goes even further: if you are a single developer, you can create an account for free forever. So if you have one person in your team, the decision maker who wants to try it out, you can create a free account. And then maybe if you're working with Snowflake as a data warehouse, either you already have one or you can also set up a free trial and then just test it out for yourself, basically.
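To illustrate the SQL-plus-Jinja combination Fadi describes, here is a small sketch in the spirit of dbt's own tutorials (the column, value, and model names are hypothetical). A Jinja loop generates repetitive SQL that plain declarative SQL would force you to write out by hand:

```sql
-- models/order_payments.sql (hypothetical)
-- dbt renders the Jinja first, producing one pivoted column per
-- payment method, then sends the compiled SQL to the warehouse.
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{{ "," if not loop.last }}
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```

Running `dbt compile` shows the fully rendered SQL, which is what actually executes in the warehouse; adding a payment method is then a one-line change to the list rather than another hand-written column.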
And it's very easy to get started. And, Fadi, I think we've got a slide at the back end of this, haven't we, where we can give a few links to resources that you might be interested in looking at. I'm actually going to ask Vicky to bring up a final poll here, the last one that we have, which invites you to let us know if there's anything you'd be interested in doing as a follow-up. But one thing I will do is try to put it in the chat here. I don't know if that's coming across, Vicky, because the technology we're using here may not share it with everybody. In fact, I don't think it does. But can I ask you to make sure that we share my email address with the participants so that individual question can get picked up? I think that'd be really good. As well as that, hello@interworks.co.uk, that would be good. Yeah, I can get that in for you now, Paul. Yeah, that'd be lovely. I'd like to make sure we don't drop the ball on that. If there's anything we can do to help, we'd be more than happy to do it. Vicky, one last question for you. Have you seen any questions coming in that we still need to address in the remaining two minutes or so before the hour runs out? We haven't, Paul. I think you've done such a succinct job of sharing everything. I'm really hoping that it's given everybody the opportunity to get a little bit more excited about dbt and hopefully learn from some of the things that you've spoken about today. If there is one thing that you're going to take away from this webinar and share with colleagues, let us know in the chat. We'd love to know what piqued your interest and what you're going to take on board following on from this webinar. With that, just to say as well, please do head over to interworks.com/events. We've got a number of webinars coming up on a whole array of BI topics with multiple partners, and we'd love to welcome you onto some of those webinars as well.
And we will be sending the recording of this session out to everybody who registered.

In this InterWorks webinar, Paul Vincent, Barbara Hartmann, and Fadi Al Rayes explored dbt (data build tool) and its role in modern data transformation workflows. They discussed common data engineering challenges including version control, code sharing, technical debt and maintaining clean data structures. The presenters demonstrated dbt Cloud’s key features: modular SQL models, Jinja templating for dynamic logic, built-in documentation generation, automated testing capabilities and Git-based version control. They highlighted how dbt treats SQL as code, enabling collaborative development through branching strategies while maintaining individual development environments. The session emphasized dbt’s ability to compile transformations that execute in the data warehouse, reducing repetitive work and improving code maintainability through reusable models and automated lineage tracking.
