A couple years ago, as I was trying to teach myself programming, I managed to wipe out my computer. It was a silly mistake—all I was trying to do was learn how to work in command line and set a directory for R. Through some degree of frustration, I deleted the folders that kept ending up in the wrong place and, in my fit of overzealousness, managed to delete all other folders as well.
Since that time, I’ve always been concerned (perhaps healthily) about deleting something I didn’t mean to. Lately, I’ve been using Snowflake, and as I recently conveyed this story, a colleague reminded me that there’s no need to fear because through their data architecture, Snowflake has calmed the fears of anyone who’s had the same moment of panic I did.
Immutable Data Storage with Snowflake
How did they do this? Let’s break down how Snowflake stores data: in Snowflake, all data is stored in micro-partitions, which are contiguous units of storage that are immutable.
Immutability means that once a micro-partition is written, it will never be changed. Vocabulary.com gives a fun definition that says, “There are many things in life that are immutable; these unchangeable things include death, taxes, and the laws of physics.” To that list, you can now also add your data in Snowflake.
What does this mean practically? Yesterday, I wrote a query to pull back a list of distinct Product IDs that we have in our inventory:
CREATE OR REPLACE TABLE PRODUCTLIST
AS SELECT DISTINCT PRODUCTID FROM PRODUCTDATA;
I come back to work this morning and find out that our PRODUCTDATA dataset has changed as we’ve discontinued several products overnight. If I run the same query again, I get a new table with fewer rows. But what happened to those old rows? Well, they’re no longer in our table, but Snowflake has preserved the state of those rows in their original micro-partitions.
Time Travel Functionality in Snowflake
The beauty of this immutability is if my manager comes back this afternoon and asks what products we’ve discontinued, I can take advantage of Snowflake’s time-travel functionality to pull back the micro-partitions of my PRODUCTDATA table as it existed yesterday.
That query would look as follows:
SELECT PRODUCTID FROM PRODUCTDATA at(timestamp => ‘Fri, 01 Mar 2019 03:30:00 -0500’::timestamp)
A couple of notes about this query: The end of the timestamp (-0500) is the offset from UTC time. Here, I used at because I wanted the table as it was at that time exactly, but I could also use before. In addition to using timestamps, Snowflake also allows offset and identifier for statements (such as a Query ID) to locate which point you want to go back to. With this new query, I have the result above that shows me what was in the table yesterday and my PRODUCTLIST table that returns the results of product data as of today. From here, I can compare the two results to see which Product IDs were discontinued.
Another result of time travel and this immutability is that even if I run the query DROP TABLE PRODUCTLIST, my full inventory isn’t gone. I can run UNDROP TABLE PRODUCTLIST and everything is back again. I love that there’s now an answer when folks ask (slightly panicked), “How do I undrop that?!”
Limits on Snowflake Time Travel
There are limitations on time-travel capabilities, which are detailed below (these were taken from Snowflake’s website, which has more details on Time Travel).
The standard retention period is one day (24 hours) and is automatically enabled for all Snowflake accounts. For Snowflake Standard Edition, the retention period can be changed to 0 at the account and object levels (for databases, schemas and tables).
For Snowflake Enterprise Edition, the retention period can be changed as follows:
- For temporary or transient objects, set to 0 (or back to the default of one day).
- For permanent objects, set to any value from 0 up to 90 days.
Excited to dive in to Snowflake yet? You have nothing to lose! And even if you do, you can just get it back as long as you’re within the above time limits and your account is set up for it.