Creating MD5 Hash Keys in Snowflake via Matillion

Data

Creating MD5 Hash Keys in Snowflake via Matillion

In my last blog, I discussed how to do an Upsert in Snowflake using Matillion. Using the same workflow, I will discuss how we got the key value using the MD5 hash functionality available in Snowflake and why we need to do it. The why is easy, so let’s start there. The MD5 gives us a key value that we can use to reference back from the Fact tables in our data warehouse to the Dimension tables.

If you’re wondering why we need a surrogate key (SK) at all, you’re probably not alone. We use a surrogate key for join performance. Using a surrogate key is faster when joining two tables than using multiple columns in your join statement. This makes writing your queries a little faster while improving the performance of your query, and who doesn’t want that?

Here is a quick overview of what the workflow will look like when we finish:

Our first step is to make sure we understand which column, or combination of columns, gives us a unique value. In this case, it is the Airport Code column.

Now we simply need to bring in a Calculator tool onto the canvas:

Finally, we write the code below to create our key. A quick note on the COALESCE function in the code: We use this so that, in the event of a null value in the unique field, you will still get a unique key value.

Just as a check, you will want to do a row count on your key value to make sure that you truly have a unique key. After performing this check, you can now refer to my last blog to create your Upsert. Once all of that is completed, you simply need to create the table and run the Upsert. Before creating your table, simply pull out a Rename tool in your Transformation canvas where you have created your key and rename your columns. Then, select the Metadata tab and check the Text Mode box.

This will allow you to copy the Metadata of the table.

You can now paste this information into your Create Table tool in the Orchestration. While this may not help much with a small table like this, it can be a real time-saver with larger tables.

More About the Author

Scott Perry

Data Engineer
Books for Your Business: The Goal: A Process of Ongoing Improvement While you might expect that this series would concentrate on technology books (after all, I am a tech consultant), this week we are ...
Creating MD5 Hash Keys in Snowflake via Matillion In my last blog, I discussed how to do an Upsert in Snowflake using Matillion. Using the same workflow, I will discuss how we got the ...

See more from this author →

Subscribe to our newsletter

  • I understand that InterWorks will use the data provided for the purpose of communication and the administration my request. InterWorks will never disclose or sell any personal data except where required to do so by law. Finally, I understand that future communications related topics and events may be sent from InterWorks, but I can opt-out at any time.
  • This field is for validation purposes and should be left unchanged.

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072