We’re currently in the midst of an exciting series of virtual Master Classes with Snowflake. Some good questions were posed during the first session, “Why Snowflake?” They were so good, in fact, that we decided to highlight some of them here, so even if you weren’t able to attend the event, you can still reap the benefits of these insights.
Nested JSON
Question: I am assuming you would be creating some columnstore in the native store. How does nested JSON work in that case?
Snowflake includes functionality to load and transform JSON files as part of its support for semi-structured data sources. Below are several articles to get you started, including an example of how to laterally flatten elements from a nested JSON array:
- Snowflake | Support for JSON
- Snowflake | How to Lateral Flatten Nested JSON
- InterWorks | Loading and Querying JSON Data
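To make the idea concrete, here is a small Python sketch of what a lateral flatten does: the parent record is repeated once for each element of a nested array, alongside that element’s position and value. It is an illustration only, not Snowflake’s implementation. The sample order document, the helper names `get_path` and `lateral_flatten`, and the `orders` table with VARIANT column `v` in the SQL comment are all hypothetical.

```python
import json
from typing import Any, Iterator

# Roughly equivalent Snowflake SQL, assuming a hypothetical table ORDERS
# with a VARIANT column V holding one document per row:
#   SELECT o.v:order_id::NUMBER, o.v:customer.name::STRING,
#          f.index, f.value:sku::STRING, f.value:qty::NUMBER
#   FROM orders o, LATERAL FLATTEN(input => o.v:items) f;

# Hypothetical sample document: one order with a nested array of line items.
RAW = """
{
  "order_id": 1001,
  "customer": {"name": "Acme", "region": "EMEA"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 5}
  ]
}
"""


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as 'customer.name' into nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def lateral_flatten(doc: dict, array_path: str, parent_paths: list[str]) -> Iterator[dict]:
    """Yield one row per element of the array at array_path.

    Each row repeats the selected parent fields and adds the element's
    position (index) and the element itself (value).
    """
    parent = {path: get_path(doc, path) for path in parent_paths}
    for index, value in enumerate(get_path(doc, array_path)):
        yield {**parent, "index": index, "value": value}


rows = list(lateral_flatten(json.loads(RAW), "items", ["order_id", "customer.name"]))
for row in rows:
    print(row["order_id"], row["customer.name"], row["index"],
          row["value"]["sku"], row["value"]["qty"])
```

Running it prints one line per line item (`1001 Acme 0 A-1 2` and `1001 Acme 1 B-7 5`), the same row-per-element shape a flatten query returns.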
Cache Replication
Question: In scale-out, since the data is stored in proprietary format, how does the cache replicate to another cluster so that other DWH can access the data?
Snowflake caches information in three separate layers, which it draws on dynamically when a request for the same information is made. The results cache is held in the cloud services layer rather than in any single virtual warehouse, which removes the need to replicate cached results between the servers of a virtual warehouse cluster. You can find more information on the subject below, and since some of the caching is lost when a virtual warehouse is suspended, I’ve also included an article on virtual warehouse patterns. Have a look at the consumption warehouse pattern, which is scripted to maximise cache usage.
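A toy model may help show why a shared results cache sidesteps replication. In the Python sketch below, the cache lives outside any single warehouse and is keyed on the exact query text, and a cached result is only reused while the tables it read are unchanged. The class and function names, the warehouse names and the table-version numbers are hypothetical, and Snowflake’s real results cache applies further conditions (such as a retention window) that are not modelled here.

```python
import hashlib
from typing import Callable, Optional


class ResultCache:
    """Toy results cache shared by every warehouse (not Snowflake's implementation)."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[dict[str, int], list]] = {}

    @staticmethod
    def _key(sql: str) -> str:
        # In this toy, the key is the exact query text, so any textual
        # difference is a cache miss.
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def get(self, sql: str, table_versions: dict[str, int]) -> Optional[list]:
        entry = self._entries.get(self._key(sql))
        # Reuse only if the underlying tables have not changed since caching.
        if entry is not None and entry[0] == table_versions:
            return entry[1]
        return None

    def put(self, sql: str, table_versions: dict[str, int], rows: list) -> None:
        self._entries[self._key(sql)] = (dict(table_versions), rows)


def run_query(cache: ResultCache, warehouse: str, sql: str,
              table_versions: dict[str, int],
              execute: Callable[[str], list]) -> tuple[list, str]:
    """Return (rows, source), where source names who produced the rows."""
    cached = cache.get(sql, table_versions)
    if cached is not None:
        return cached, "result cache"
    rows = execute(sql)  # compute happens on the named warehouse
    cache.put(sql, table_versions, rows)
    return rows, warehouse


executions: list[str] = []


def fake_execute(sql: str) -> list:
    """Stand-in for real query execution; records each call."""
    executions.append(sql)
    return [("EMEA", 42)]


cache = ResultCache()
sql = "SELECT region, COUNT(*) FROM sales GROUP BY region"
_, first = run_query(cache, "WH_A", sql, {"SALES": 7}, fake_execute)
_, second = run_query(cache, "WH_B", sql, {"SALES": 7}, fake_execute)
_, third = run_query(cache, "WH_B", sql, {"SALES": 8}, fake_execute)
print(first, "|", second, "|", third)
```

The second call is answered for WH_B without any compute running, even though WH_A produced the result, because neither warehouse owns the cache; the third call re-executes because the table version has changed.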
Data Sharing
Question: In what format do you share the data? Is that a proprietary format, or can it be any other format like CSV, Parquet and so on?
Snowflake’s data sharing is a groundbreaking feature that allows one Snowflake customer to share their data with anyone in their business circle, be they clients, vendors, suppliers, etc. The data is not exported to a file format such as CSV or Parquet; shared objects are made available to the consumer directly, without copying the data. This data sharing can be established in one of two ways, depending on whether the party you are sharing with has a Snowflake account of their own or will access the data through yours. Here is some more reading on the subject:
Question: Can this replication happen for specific objects, or will it be there for the entire storage?
Cross-regional data replication enables you to replicate some, or all, of a given database between regions and cloud service providers. For example, a Snowflake instance on AWS in Singapore could establish replication to another instance on Azure in Europe. Here is some more information on the subject:
Data Masking
Question: How can we mask data in Snowflake?
There are a couple of ways of looking at this: either from a platform security context or from a usage sensitivity context.
- Platform Security – Snowflake natively encrypts data using strong AES-256 encryption and communicates over TLS 1.2, ensuring that all traffic to and from the platform is secure. More information on the security options available to the platform can be found here.
- Data Masking – If you want to scramble data prior to submission to Snowflake, you could apply a cipher, or you could use a solution from Snowflake’s partner ecosystem such as Dataguise. If you wanted to apply masking just to selected users, we would probably recommend bringing the data unmasked into Snowflake and then using a secure view to scramble the relevant fields.
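The secure-view approach boils down to returning either the raw or the scrambled value depending on who is asking. The Python sketch below models that rule in plain code. The `PII_READER` role, the `customers` rows and the email-masking format are hypothetical, and in Snowflake the same logic would live in a view definition like the one in the leading comment rather than in client code.

```python
from typing import Iterable, Iterator

# A comparable secure view in Snowflake might look like this
# (table, column and role names are hypothetical):
#   CREATE SECURE VIEW customers_v AS
#   SELECT id,
#          CASE WHEN CURRENT_ROLE() = 'PII_READER' THEN email
#               ELSE REGEXP_REPLACE(email, '^[^@]+', '****') END AS email
#   FROM customers;

UNMASKED_ROLES = {"PII_READER"}  # hypothetical role allowed to see raw values


def mask_email(email: str) -> str:
    """Hide the local part but keep the domain: 'jane@example.com' -> '****@example.com'."""
    local, sep, domain = email.partition("@")
    # Leave values with no local part untouched, like the regex above.
    return "****" + sep + domain if local else email


def customers_view(rows: Iterable[dict], current_role: str) -> Iterator[dict]:
    """Yield each row, masking the email column unless the role is allowed."""
    for row in rows:
        if current_role in UNMASKED_ROLES:
            yield dict(row)
        else:
            yield {**row, "email": mask_email(row["email"])}


customers = [{"id": 1, "email": "jane@example.com"}]
print(list(customers_view(customers, "ANALYST")))
print(list(customers_view(customers, "PII_READER")))
```

Keeping the rule in one view means the table is loaded once, unmasked, and each role simply sees a different projection of it.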
Scalability and Pricing
Question: How does Snowflake account for the increase in warehouse size dynamically? How is the pricing affected according to scale up and scale out?
Snowflake compute is called a virtual warehouse. Virtual warehouses come in a range of t-shirt sizes, each one doubling the compute available for query operations. A virtual warehouse can be resized manually to a different t-shirt size at any time (scaling up), or you could use a multi-cluster warehouse with a scaling policy to automatically add clusters as concurrent demand increases (scaling out). These articles explain how scaling can be conducted and how it affects costs:
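As a rough worked example of how scaling up and scaling out show up in the bill, the Python sketch below converts warehouse size, running time and cluster count into credits. It assumes the standard warehouse rates of 1 credit per hour for X-Small, doubling with each size, per-second billing with a 60-second minimum, and that each running cluster of a multi-cluster warehouse is billed at its size’s rate. The function name is hypothetical, the minimum is applied once per run for simplicity, and the price per credit, which depends on your edition and region, is left out.

```python
# Standard warehouse sizes; each size doubles the credits per hour of the one before.
SIZES = ["X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}

MINIMUM_BILLED_SECONDS = 60  # assumed minimum charge each time a warehouse starts


def credits_for_run(size: str, seconds: float, clusters: int = 1) -> float:
    """Credits for one continuous run of a warehouse (simplified model).

    Scaling up changes `size`; scaling out changes `clusters`.
    """
    billed_seconds = max(seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


# Scaling up: a Medium job taking 30 minutes vs. a Large one finishing in 15.
print(credits_for_run("Medium", 30 * 60))  # 2.0
print(credits_for_run("Large", 15 * 60))   # 2.0
# Scaling out: a Small warehouse running 3 clusters for an hour.
print(credits_for_run("Small", 60 * 60, clusters=3))  # 6.0
```

The first two lines show why sizing up can cost the same if the work genuinely finishes twice as fast, whereas scaling out multiplies the hourly rate by the number of clusters running.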
Looking for More Resources?
Check out our entire Zero to Snowflake blog series for everything from getting started with Snowflake to deep dives into more advanced features and functionalities. And don’t miss out on the rest of the Master Class series. With several more options available, you’re guaranteed to find the right session for your data needs. We hope to see you in the coming weeks!