Master Class Q&A: Why Snowflake?

We’re currently in the midst of an exciting series of virtual Master Classes with Snowflake. Some good questions were posed during the first session, Why Snowflake? They were so good, in fact, that we decided to highlight some of them here, so even if you weren’t able to attend the event, you can still reap the benefits of these insights.

Nested JSON

Question: I am assuming you would be creating some columnstore in the native store. How does nested JSON work in that case?

Snowflake includes functionality to load and transform JSON files as part of its commitment to semi-structured data sources. Below is a list of several articles to get you started, including an example of how to laterally flatten elements from a nested JSON array:
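As a rough illustration of what lateral flattening does, here is a minimal sketch in plain Python rather than Snowflake SQL. It turns a nested document into one flat row per innermost array element, repeating the parent fields on each row. The sample document and its field names (`customer`, `orders`, `items`, `sku`, `qty`) are hypothetical and chosen only for this example.

```python
import json

# Hypothetical sample document; the field names are illustrative only.
raw = """
{"customer": "Acme",
 "orders": [
   {"id": 1, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
   {"id": 2, "items": [{"sku": "C3", "qty": 5}]}
 ]}
"""


def flatten_items(doc):
    """Yield one flat row per nested item, repeating the parent fields."""
    for order in doc.get("orders", []):
        for item in order.get("items", []):
            yield {
                "customer": doc["customer"],
                "order_id": order["id"],
                "sku": item["sku"],
                "qty": item["qty"],
            }


rows = list(flatten_items(json.loads(raw)))
for row in rows:
    print(row)
```

Each nested array level becomes one loop here. The flattened rows can then be filtered, joined and aggregated like any other tabular data.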

Cache Replication

Question: In scale-out, since the data is stored in proprietary format, how does the cache replicate to another cluster so that other DWH can access the data?

Snowflake caches information in three separate layers, which it will dynamically utilise when a request for the same information is made. The results cache, held in the services layer, removes the need to replicate cached objects between servers in a virtual warehouse cluster. You can find more information on the subject below, and since some of the caching is destroyed when a virtual warehouse is suspended, I’ve also included an article on virtual warehouse patterns. Have a look at the consumption warehouse pattern, which is scripted to maximise cache usage.
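To make the idea concrete, here is a toy model in Python. It is not how Snowflake is implemented; it only mirrors the two behaviours described above: a results cache shared through a services layer, and a warehouse-local cache that is lost on suspension. All class, method and table names are hypothetical.

```python
class ServicesLayer:
    """Toy shared layer: owns remote storage access and the results cache."""

    def __init__(self, storage):
        self.storage = storage      # table name -> rows (stand-in for remote storage)
        self.result_cache = {}      # query text -> result, shared by all warehouses
        self.storage_reads = 0

    def read_table(self, table):
        self.storage_reads += 1
        return list(self.storage[table])


class Warehouse:
    """Toy virtual warehouse with its own local data cache."""

    def __init__(self, services):
        self.services = services
        self.local_cache = {}       # table name -> rows, private to this warehouse

    def run(self, query_text, table, compute):
        shared = self.services.result_cache
        if query_text in shared:
            # Same query already answered somewhere: no replication needed.
            return shared[query_text], "result cache"
        if table in self.local_cache:
            rows, source = self.local_cache[table], "local cache"
        else:
            rows, source = self.services.read_table(table), "remote storage"
            self.local_cache[table] = rows
        result = compute(rows)
        shared[query_text] = result
        return result, source

    def suspend(self):
        # Suspending drops only the warehouse-local cache.
        self.local_cache.clear()


services = ServicesLayer({"sales": [10, 20, 30]})
wh_a, wh_b = Warehouse(services), Warehouse(services)

first = wh_a.run("SELECT SUM(amount) FROM sales", "sales", sum)
second = wh_b.run("SELECT SUM(amount) FROM sales", "sales", sum)
third = wh_a.run("SELECT MAX(amount) FROM sales", "sales", max)
wh_a.suspend()
fourth = wh_a.run("SELECT MIN(amount) FROM sales", "sales", min)
```

In this sketch the second warehouse answers the repeated query from the shared results cache without ever reading the table, while the first warehouse has to go back to remote storage after being suspended.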

Data Sharing

Question: In what format do you share the data? Is that a proprietary format, or can it be any other format like CSV, Parquet and so on?

Snowflake’s data sharing is a groundbreaking functionality that allows one Snowflake client to share their data with anyone in their business circle, be they clients, vendors, suppliers, etc. This data sharing can be established in one of two ways, depending on whether the party you are sharing with has their own instance of Snowflake or will use yours. Here is some more reading on the subject:

Question: Can this replication happen for specific objects, or will it be there for the entire storage?

Cross-regional data replication enables you to replicate some, or all, of a given database between regions and cloud service providers. For example, a Snowflake instance on AWS in Singapore could establish replication to another instance on Azure in Europe. Here is some more information on the subject:

Data Masking

Question: How can we mask data in Snowflake?

There are a couple of ways of looking at this: either from a platform security context or from a usage sensitivity context.

  1. Platform Security – Snowflake natively encrypts traffic to and from the platform using AES-256 strong encryption and communicates over TLS 1.2 to ensure all traffic it generates is secure. More information on the security options available to the platform can be found here.
  2. Data Masking – If you want to scramble data prior to submission to Snowflake, you could apply a cipher, or you could use a solution from Snowflake’s partner ecosystem such as Dataguise. If you want to apply masking just to selected users, we would probably recommend bringing the data unmasked into Snowflake and then using a secure view to scramble the relevant fields.
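The per-user masking idea in the second point can be sketched as follows. This is plain Python standing in for the logic a secure view might encode, not Snowflake syntax: the data is stored unmasked, and the decision to scramble a field is made at read time based on the caller's role. The role names, the `email` field and the masking rule are all assumptions made for illustration.

```python
# Hypothetical role names; a real deployment would use its own roles.
UNMASKED_ROLES = {"HR_ADMIN"}


def mask_email(email):
    """Hide the local part of an address but keep the domain visible."""
    local, sep, domain = email.partition("@")
    return "*" * len(local) + sep + domain


def view_row(row, current_role):
    """Return a copy of the row as the given role would see it."""
    visible = dict(row)  # never modify the stored, unmasked row
    if current_role not in UNMASKED_ROLES:
        visible["email"] = mask_email(row["email"])
    return visible


row = {"name": "Jo", "email": "jo@example.com"}
print(view_row(row, "HR_ADMIN"))
print(view_row(row, "ANALYST"))
```

Because the stored row is never altered, the same data can serve both privileged and restricted users, which is the main reason for masking at the view rather than before loading.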

Scalability and Pricing

Question: How does Snowflake account for the increase in warehouse size dynamically? How is the pricing affected according to scale up and scale out?

Snowflake compute is called a virtual warehouse. Virtual warehouses come in a variety of t-shirt sizes, with each step up in size doubling the compute available for query operations. A virtual warehouse can be resized manually to a different t-shirt size at any time, or you could use a scaling policy to automate increases in the cluster size (for parallel processing). These articles on the subject explain how scaling can be conducted and how this applies to costings:
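The relationship between size, cluster count and cost can be sketched with a little arithmetic. The snippet below assumes that the smallest size consumes 1 credit per hour, that each size up doubles that rate, and that every running cluster in a scaled-out warehouse is charged at the same rate. The size labels and the base rate are assumptions for this example, and minimum billing periods are ignored, so check current Snowflake pricing before relying on the numbers.

```python
# Assumed size ladder and base rate; verify against current Snowflake pricing.
SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"]
BASE_CREDITS_PER_HOUR = 1  # assumed rate for the smallest size


def credits_per_hour(size, clusters=1):
    """Scale up doubles the rate per size step; scale out multiplies by clusters."""
    step = SIZES.index(size)
    return BASE_CREDITS_PER_HOUR * (2 ** step) * clusters


def credits_used(size, seconds, clusters=1):
    """Credits consumed for a given run time, ignoring minimum billing periods."""
    return credits_per_hour(size, clusters) * seconds / 3600


print(credits_per_hour("MEDIUM"))               # scale up: two steps above XSMALL
print(credits_per_hour("MEDIUM", clusters=3))   # scale out: three clusters running
print(credits_used("LARGE", seconds=1800))      # half an hour on one LARGE cluster
```

Under these assumptions, scaling up trades a higher hourly rate for faster individual queries, while scaling out adds cost only while the extra clusters are actually running.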

Looking for More Resources?

Check out our entire Zero to Snowflake blog series for everything from getting started with Snowflake to deep dives into more advanced features and functionalities. And don’t miss out on the rest of the Master Class series. With several more options available, you’re guaranteed to find the right session for your data needs. We hope to see you in the coming weeks!

More About the Author

Paul Middlewick

Data Engineer
