This is the first in a series of short-form blog posts about Terraform and its specific benefits (and drawbacks) for data teams.
What Are You, and What’s Your Data Challenge?
You lead a small data team at a mid-sized company, and your job is to build and maintain robust data pipelines that serve analytics and reporting for internal stakeholders. Your company ingests data from a number of different source systems into one data warehouse: Snowflake. As the Data Engineering team, you “own” Snowflake. Snowflake is yours to govern and control. You and your team have built and maintained integral data infrastructure that powers pivotal reports that key decision makers review regularly to help run the business.
All of a sudden, you start getting questions about cost. Spend is spiking, anomalies are showing up on reports, and they all point back to Snowflake. Confused, you dig into what could be causing the spike, only to find a number of 4XL warehouses that were configured months ago without an auto-suspend parameter. You check the warehouses’ query history and find they haven’t been used since days after they were provisioned. You shut them down. Now you have to report your findings, and it won’t be a fun conversation.
What could have been done differently? What needs to happen now? You thought your team had done a good job of utilizing the age-old “principle of least privilege,” but something must have slipped past you. You need a way to have an audit trail of all changes made and gatekeep infrastructure changes so that you don’t end up in the same situation again.
Introducing Infrastructure as Code (IaC), specifically, Terraform.
The Benefits of Terraforming Data Infrastructure
The above situation is just one example of a predicament that can be solved with Terraform. For the sake of brevity, I will focus my discussion on the benefits of Terraforming Snowflake to counter the above situation, but the benefits extend to any data tool with a stable Terraform provider (dbt Cloud, Monte Carlo, Fivetran, GitHub, Datadog, etc.). Terraform, and IaC in general, was developed to offer a number of benefits to anyone hoping to manage their infrastructure securely. Below are just a few.
- Terraform is declarative. You define what you want your infrastructure to look like in code, and Terraform does the work of figuring out how to deploy it. When you look at your code, you are seeing the desired state of your infrastructure. If the real infrastructure drifts from that desired state, you know something has gone wrong, and you can isolate and resolve it. This also means that, if you have locked down your permissions correctly and manual provisioning is not possible, the code is self-documenting.
- Code can be versioned. You can store your Terraform code in a Git repo and version control the code to allow for rollbacks when necessary and audit trails for merges to the main branch.
- To piggyback off the last point, code stored in Git can be gatekept by PR reviews. If you have locked down permissions on your Snowflake instance, for example, so that no one can manually provision new objects (aside from admins during break-glass scenarios), you can fully control the provisioning process through PR reviews and approvals. This gives SecOps and stakeholders more confidence that users are not treating the applications as “the wild west” and cannot rack up unexpected costs.
- Finally, since Terraform defines the desired state of your data application’s infrastructure, you can easily develop multiple environments (Dev, QA, Prod). Rather than having Prod data exposed in the same Snowflake account where developers are building new applications and doing testing, maybe you want a separate account for each environment so that there’s no risk of users finding backdoors to PII in the Prod environment when testing their applications. With Terraform, you can replicate environments much more easily.
- Disaster recovery becomes significantly simpler when your infrastructure is defined as code. If something catastrophic happens to your Snowflake account—maybe someone accidentally drops a critical database or a configuration gets corrupted—you can rebuild your entire infrastructure from your Terraform code. Rather than scrambling to manually recreate databases, schemas, warehouses, roles, and permissions from memory or incomplete documentation, you just point Terraform at a new environment and run it. Pair this with running Fivetran historical syncs and a well-defined dbt pipeline (assuming this is your architecture) and your data infrastructure is back up and running in minutes or hours instead of days or weeks. This peace of mind alone can be invaluable, especially when you’re responsible for business-critical data pipelines.
- Team collaboration becomes much more structured and transparent with Terraform. Multiple team members can work on different parts of the infrastructure simultaneously without stepping on each other’s toes. You can clearly see who is working on what through open pull requests, and team members can review each other’s changes before they go live. This collaborative approach means knowledge about your infrastructure isn’t siloed in one person’s head—it’s documented in code that everyone can see, understand, and contribute to. When someone goes on vacation or leaves the team, their infrastructure decisions are preserved in the codebase rather than lost.
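To make the declarative point concrete, here is a minimal sketch of what a warehouse definition could look like using the Snowflake Terraform provider’s `snowflake_warehouse` resource. The names, sizing, and timeout are illustrative assumptions, not a prescription, but note the guardrails that were missing from the runaway 4XL warehouses in the story above:

```hcl
# Illustrative warehouse definition using the Snowflake Terraform provider.
# Resource name, warehouse name, and sizing are hypothetical.
resource "snowflake_warehouse" "reporting" {
  name           = "REPORTING_WH"
  warehouse_size = "XSMALL"

  # The guardrails the 4XL warehouses lacked: suspend after 60 seconds
  # of inactivity, and resume automatically when a query arrives.
  auto_suspend = 60
  auto_resume  = true

  initially_suspended = true
  comment             = "Managed by Terraform. Do not modify in the UI."
}
```

Because this is the declared desired state, a warehouse created without `auto_suspend` would surface as drift the next time someone runs `terraform plan`, rather than hiding until the bill arrives.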
The Drawbacks of Terraforming Data Infrastructure
The main drawback to Terraforming your data infrastructure is going to be that your team needs to learn to write and understand Terraform. It’s not a huge learning curve for those who already have programming experience (especially in declarative languages such as SQL), but it can still be a lift to get a team up to speed.
Another major drawback could be migration costs. Unfortunately, migrating infrastructure resources that were not previously managed by Terraform into Terraform can be a bit cumbersome. Terraform manages state in such a way that, for previously unmanaged resources to become managed, you need to “import” them into your Terraform state file. This means you would need to systematically identify and import all existing infrastructure (say, in your Snowflake account) if you want everything to be managed in code. To get around this, you could implement controls that let you deploy all future infrastructure in Terraform and incrementally migrate legacy infrastructure over time.
If ownership of the company’s data tools is split across teams, introducing Terraform as a means of managing infrastructure may cause more friction than it alleviates. Infrastructure management with Terraform or another IaC tool may still be worthwhile, but it may require a more fragmented deployment-ownership model that adds workflow hoops to jump through (PR reviews across multiple teams?) and makes the value proposition much less attractive, as it could slow development.
State file management introduces its own set of headaches that you need to be prepared for. Terraform tracks your infrastructure in a state file, and if that file gets corrupted, out of sync, or accidentally deleted, you’re in for a bad time. You’ll need to set up remote state storage (usually in S3 or a similar service) and implement state locking to prevent multiple people from applying changes simultaneously. If someone makes a manual change in the Snowflake UI outside of Terraform, you now have “drift” between your code and reality, and resolving that drift can be tricky. The state file becomes a critical piece of infrastructure itself that needs to be backed up, secured, and managed carefully.
Additionally, you may run into situations where Terraform’s Snowflake provider lags behind Snowflake’s feature releases. When Snowflake announces a shiny new feature that you want to use, there’s no guarantee the Terraform provider will support it immediately. You might have to wait weeks or months for the provider to be updated, or in some cases, certain features might not be available in Terraform at all. This can be frustrating when you’re trying to adopt new capabilities quickly, and it might force you to use manual provisioning or workarounds that defeat the purpose of having everything in code.
And finally, to top off the drawbacks, introducing Terraform as a structured and controlled infrastructure management option requires adherence to security and governance practices that limit what users of the data tools can build on their own. A control structure is put into place that slows development and might introduce bottlenecks depending on service structure. Sometimes this is undesirable. Deadlines need to be met, and bottlenecking teams on admin approval just to stand up a new database and start ingesting new data may not be a cost a business is willing to take on.
The Verdict on Terraform
So, should you Terraform your data infrastructure? I’ll give the classic consultant’s answer: It depends. As with everything, there are always trade-offs and it will be up to you, your team and your stakeholders to decide what is the best course of action for your organization.
Terraform offers control, auditability, versioning and governance when it comes to managing your data infrastructure. It also brings a slew of challenges that many data teams are not staffed or prioritized to take on.
As always, we are here to help consult and guide you should you need any help with architectural or implementation considerations. Please reach out to us with your questions.
