This series takes you from zero to hero with the latest and greatest cloud data warehousing platform, Snowflake.
In the entire history of the world, sharing information has never been easier. This becomes obvious just by looking at social media platforms. Their technological innovations have enabled billions of users worldwide to share Instagram stories, TikTok videos, World War 3 memes, baby/yoga/kitten videos and maddening Facebook comments in ways that people dared not think possible only a few years ago.
Cloud data platforms are catching up, and Snowflake is way ahead of the curve. The use case for sharing data within an organization has always been clear. But the next phase is revolutionizing how we share data with those on the outside. One of Snowflake’s most unique features is Secure Data Sharing, which allows users to seamlessly and securely share data with external organizations. The friction and cost of sharing information with outsiders has dropped dramatically. Even so, the chilling reality is that in 2020 multi-billion-dollar companies are still transferring data to other companies using technologies that were developed 40 years ago.
Sharing Sensitive Data Externally
Let’s put this to the test. In your organization, if you need to pass data to a team in another company in a secure way, how would you do it? Secure FTP? Ugh! An expensive proprietary tool? Blegh! Or—worst of all—an email (hopefully you would at least password-protect it)? Don’t you dare! If any of these are the case, you really need to consider Snowflake Secure Data Sharing.
Moreover, some of you work for a data vendor. Your company is literally in the business of selling valuable data to customers. For you, the next section is mandatory reading because Secure Data Sharing may be the solution you have been waiting for… or the one your competitor already has!
How Does Secure Data Sharing Work?
With Secure Data Sharing, the participants involved are a data provider (you) and one or more consumers (outside organizations). Snowflake enables sharing of read-only database tables, views and user-defined functions (UDFs) from providers to consumers using Secure Shares. As an end objective, a data consumer could see a read-only version of the database using their own account. All this can be set up in under 15 minutes, which we will see later.
A Secure Share is a named object in Snowflake that contains all the information required to share a database, including all the privilege grants, consumer accounts and objects being shared (tables, views, UDFs), etc. A Secure Share is not the database itself but a metadata wrapper that will point to the data stored with the provider’s account.
The beauty of Secure Data Sharing is that absolutely no data gets transferred from the provider to the consumers. Remember what I said about the Secure Share not being the database itself but a metadata wrapper that points to the data stored by the provider? This is made possible by the unique way that Snowflake has architected its data platform to separate storage from compute while managing data at the metadata layer.
This means the consumer does not incur any storage costs while reading from the shared database and, instead, only incurs compute costs for querying the data. However, this doesn’t prevent the consumer from copying or extracting the data from the read-only database. For example, creating another table within Snowflake that pulled from the read-only table would then incur storage costs.
Two Types of Consumers
As stated before, the participants in Secure Data Sharing are a data provider and one or more consumers. A provider creates a Secure Share to export, and the consumer imports the Secure Share to consume. However, there are two different types of consumers: reader accounts and full consumer accounts. The difference between these two types of consumer accounts affects who pays for the compute and storage resources.
Full consumer accounts are existing Snowflake customers. If the consumer is already a Snowflake customer, the data can be shared directly to the consumer’s existing account. In this case, the consumer would pay for all compute resources incurred by querying the shared databases.
Reader accounts are for consumers who are not Snowflake customers. If the consumer is not a Snowflake customer, the provider can create Reader Accounts that the consumer can use. In the case of Reader Accounts, all costs would be borne by the provider who is the Snowflake customer, although those costs could be tracked and invoiced back to the consumer:
Setting up a Secure Share and Reader Account
Let’s begin with an example using my personal database (BDU), which has a schema containing flight data (FAA). There are eight tables in the schema:
Click the Shares tab at the top of the user console, making sure that you toggle the view to Outbound since you are a provider creating a Secure Share that will go to an outside account. Then hit the Create button to open the menu to create a Secure Share. Please note that you need ACCOUNTADMIN privileges, or a custom role that has been granted these specific privileges, to create a Secure Share. SYSADMIN privileges will not be enough:
In the menu, insert the Secure Share Name (SHAREDEMO), and click the button to Select Tables & Secure Views to select the eight tables. Please note that you can select or deselect tables or views as needed. Next, hit the Create button to complete the process:
Now you see that Secure Share has been created and the data can be previewed. Please note that you need an available Virtual Warehouse that is usable by your current role in order to run this query. Next, click the button to Add Consumers:
As you can see, there are two Account Types: Reader and Full. Full means that the intended consumer is already a Snowflake customer and has an existing account. When Full consumer accounts incur compute or storage costs, they pay for it directly to Snowflake since they are already a customer. When Reader accounts incur those costs, the data provider pays for them. In this case, the intended consumer is not a Snowflake customer, so click the Create a Reader account link:
To create a Reader Account, insert the name (READERACCOUNT) and a comment (optional), along with credentials for the ACCOUNTADMIN user (READER1). Then click Create Account:
Going back to a worksheet and executing the command SHOW MANAGED ACCOUNTS will show all details needed:
Going back to the previous Add Consumers menu, you can now see that the new Reader Account has been added as a Consumer of the Secure Share. Clicking the blue link with the account locator (XB28199) will also take you the login for the Reader Account:
Enter the credentials for the ACCOUNTADMIN user:
Once you are in your Reader Account, you will need to do some setup. First, ensure that your role is ACCOUNTADMIN (it will default to SYSADMIN). Then click the Warehouses tab and create a warehouse to use (DEMO_WH). You will need compute resources to run queries, and as a Reader Account, your data provider will incur the costs of those compute resources:
Click the Shares account, making sure that you are toggled to Inbound to find the Secure Share that has been shared with you. We see the SHAREDEMO has been shared to us by the INTERWORKS account we were using, which is the data provider in this case. Click Create Database From Secure Share:
In the menu, name your database accordingly and grant access to the desired roles. You do not have to keep the database the same name as what was given by the provider. You can grant access to the database to multiple roles. Click Create Database:
Navigate to Worksheets and enjoy! The shared database (SHARE_FAA) now has read-only tables you can freely query using the SYSADMIN role. As the consumer, you should be able to read from it using a reporting tool like Tableau or an ETL tool like Matillion or Alteryx. If you’re a Full Consumer who is already a Snowflake customer, you can query this data along with your existing data using your own credits for compute, while not paying any storage costs for the data being shared:
Frustrated that the database tables are read-only? No problem. As a consumer, you can create another database and tables by selecting from those shared tables. In this case, a table that has 148 million records only took 27 seconds to move over while using a medium-sized warehouse. Now you have a database you can read from and write to as needed. Unfortunately, you cannot use Snowflake’s awesome clone feature on a shared database.
Please note that the computing costs to execute this query and the storage costs to store the 5.7GB are incurred by the data provider since the Reader Account is not a Snowflake customer. However, the provider can easily invoice the consumers according to any business agreements set between them:
There you have it! Best of luck testing out Snowflake Secure Data Sharing. I’ve also included some frequently asked questions about this process below that may prove helpful as you’re working through it. However, if you have any further questions or some not addressed here, please do not hesitate to reach out to myself or the InterWorks team! We’d love to help.
Frequently Asked Questions
What is difference between sharing data with existing Snowflake customers versus non-Snowflake customers?
If you establish a share with an existing Snowflake customer, they can see that shared database in their existing Snowflake account. In this case, they would bear the costs of any storage or compute incurred from querying or copying the shared database. For a non-Snowflake customer, the provider would create Reader Accounts and pay for all the costs incurred by those accounts. As the provider, you would be able to track account usage and bill the consumer if that was part of the set business agreement.
How fast would the data update for consumers?
Instantaneously! Keep in mind that the data doesn’t “update” in the traditional sense because the data that the provider sees and the data the consumer sees via the shared database is the same data. The Secure Share is a metadata wrapper that points to the correct data that is still sitting with the provider. So any changes to the provider’s dataset would be reflected instantly from the consumer’s viewpoint.
Once the consumers can see the data in their account, what can they do with it?
They can do whatever they would be able to do with any other read-only database. They can take the data and query it with other data in their Snowflake account. They can read from the data using a reporting tool like Tableau. They can extract the data using an ETL tool and land it in their own database. And finally, they can copy that data into another database, so they would have write access as well to manipulate the data according to their needs.
Can I allow my users to see only selected tables or views?
Yes! Snowflake has a role-based security framework, so the Account Administrator can limit which roles are accessible to certain users. Roles are simply a collection of permissions granted on objects. Therefore, what a user can see really depends on what permissions have been granted to the user’s role. As the data provider, you will want to maintain access as Account Administrator of the Reader Accounts. This will also allow you to monitor account usage.
What if my consumers go overboard with how much they are using the Reader Account?
As the data provider who creates Reader Accounts for non-Snowflake customers and is responsible for any incurred costs, it is imperative that you can control usage. You can easily do this by setting up Resource Monitors, which impose limits on the number of credits that virtual warehouses use within a specified interval or date range. When these limits are reached or are approaching, the Resource Monitor can trigger alerts to suspension of the virtual warehouse altogether.