Snowflake recently started supporting a whole range of new private connectivity functionality when interacting with Azure, and as a security-conscious architect I really couldn’t be happier.
We’ve already seen private connectivity support between Azure and Snowflake in other ways, such as the following:
- Configure Azure Private Link Connectivity with Snowflake
- Configure Azure Private Endpoints for Snowflake Internal Stages
- Snowflake Network Rules: Restrict Access to Specific Private Endpoints
However, these all relate to authentication and none of them specifically targets native data ingress/egress between Snowflake and Azure storage. This is where the new private connectivity support for connecting to storage containers and queues comes in.
Gone are the days of needing to whitelist all of the subnets for your Snowflake region. Gone are the days of blindly trusting that the traffic is secure simply because Azure has stated that traffic between Azure-hosted objects will always travel via the Azure backbone. That’s not to say that these two approaches aren’t valuable, of course, and if your security requirements aren’t strict enough to demand Snowflake’s Business Critical edition then the links above are certainly sufficient for most traffic. But in public health, banking or other highly secure environments, the approaches above aren’t always enough to make security specialists do the proverbial jump for joy.
So how do we make use of this fantastic new functionality? How do we ensure that any traffic between our Azure-hosted Snowflake account and our Azure storage containers/queues is secured using private connectivity? And what does this mean for network-level whitelisting? Keep reading if you’d like to know!
Requirements
Before anything is configured, the following requirements must be met:
- Your Snowflake account must be using the Business Critical edition or higher.
- You must have ACCOUNTADMIN access in Snowflake.
- You must have sufficient access to the Azure storage account to be able to approve new private endpoints.
Please note that at this time, this functionality is in PUBLIC PREVIEW and Snowflake’s official recommendation is that any PUBLIC PREVIEW features should not be relied on as Production-grade objects.
Also, at time of writing, each Snowflake account is limited to only five outbound private endpoints by default. This number can be increased by contacting Snowflake Support.
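If you’re not sure how many outbound private endpoints already exist in your account, you can list them with the same system function we’ll use later in this post to verify endpoint status:

-- List any outbound private endpoints that already exist in the account
select system$get_privatelink_endpoints_info();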
Configuration
If you meet the requirements above, then you’re good to go. The rough steps we will take are as follows:
- Find and note down the resource ID for the Azure storage account
- Configure private connectivity for any storage integrations
  - Provision a private endpoint for traffic between Snowflake and any blobs within the Azure storage account
  - Approve the new private endpoint in the Azure storage account
  - Update any existing storage integrations to leverage the privatelink endpoint
- Configure private connectivity for any notification integrations
  - Provision a private endpoint for traffic between Snowflake and any queues within the Azure storage account
  - Approve the new private endpoint in the Azure storage account
  - Recreate any existing notification integrations to leverage the privatelink endpoint
  - Recreate any objects that leverage the notification integrations that have been recreated in the previous step
- OPTIONAL: Remove any subnet-level whitelisting you previously applied that allowed Snowflake to access your storage account without private connectivity
1. Find and Note Down the Resource ID for the Azure Storage Account
This step is easy as there are multiple ways to track down the resource ID for a storage account in Azure. My personal favourite is to navigate to the storage account itself within the Azure Portal and select “JSON View” in the upper right corner:
At the top of the “JSON View” will be an easily copied resource ID:
For our example, we’ll use the following storage account and resource ID:
Storage account: mystorageaccount
Resource ID: /subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount
2. Configure Private Connectivity for any Storage Integrations
Whilst it is possible to create an external stage that connects to an Azure storage container without a storage integration, I would highly advise against it. In my opinion, storage integration objects should always be used when configuring external stages, as this avoids the need to note down any authentication information. In fact, many Azure administrators have started disabling these kinds of credential-based authentication methods altogether, preferring the approach taken by storage integrations.
If desired, you can learn more about storage integrations between Snowflake and Azure here: Configuring Storage Integrations Between Snowflake and Azure Storage
To configure private connectivity for any storage integrations, we complete the following steps:
- Provision a private endpoint for traffic between Snowflake and any blobs within the Azure storage account
- Approve the new private endpoint in the Azure storage account
- Update any existing storage integrations to leverage the privatelink endpoint
2a. Provision a private endpoint for traffic between Snowflake and any blobs within the Azure storage account
To complete the first step, we must execute the following command using the ACCOUNTADMIN role in Snowflake:
select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
    '<Resource ID for storage account>', -- This is the resource ID for the storage account
    '<Storage account name>.blob.core.windows.net', -- This is the host name for blob storage within the storage account
    'blob' -- This is the subresource for blob storage within the storage account
  )
;
From step 1, we have already retrieved the following values:
Storage account: mystorageaccount
Resource ID: /subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount
These values can be entered directly into the SQL statement to provision the private endpoint:
select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
    '/subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount', -- This is the resource ID for the storage account
    'mystorageaccount.blob.core.windows.net', -- This is the host name for blob storage within the storage account
    'blob' -- This is the subresource for blob storage within the storage account
  )
;
Executing the code in Snowflake will give a response similar to the following:
2b. Approve the new private endpoint in the Azure storage account
After completing the previous step, Snowflake will have provisioned a private endpoint within its own infrastructure, and this will be pending approval within the target storage account. Within the Azure Portal, navigate to the “Networking” pane and select the “Private endpoint connections” tab. You will now see a pending entry for the new private endpoint, which you can approve:
When approving the connection you are given the option of populating a description. I would highly recommend populating this value with information that indicates which Snowflake account the connection is coming from, and what subresource it is accessing. For example, you could enter the following description:
To confirm that everything is working, execute the following command in Snowflake using the ACCOUNTADMIN role and review the “STATUS” field of the output:
select
    parse_json("VALUE"):"host"::string as "HOST"
  , parse_json("VALUE"):"status"::string as "STATUS"
  , parse_json("VALUE"):"subresource"::string as "SUBRESOURCE"
  , parse_json("VALUE"):"endpoint_state"::string as "ENDPOINT_STATE"
  , parse_json("VALUE"):"provider_resource_id"::string as "PROVIDER_RESOURCE_ID"
  , parse_json("VALUE"):"snowflake_resource_id"::string as "SNOWFLAKE_RESOURCE_ID"
from table(flatten(
    input => parse_json(system$get_privatelink_endpoints_info())
  ))
;
This will return a table similar to the following:
We can see that the status is “Approved” so our endpoint should be good to go!
2c. Update any existing storage integrations to leverage the privatelink endpoint
This final step is straightforward for storage integrations: simply enable the “use_privatelink_endpoint” option on the storage integration.
If you have not followed the previous steps to provision and approve the private endpoint(s) between your Snowflake account and the blob subresource(s) of the storage account(s) that you have listed under the “storage_allowed_locations” option, then enabling the “use_privatelink_endpoint” option for your storage integration will break it, since it will not be able to access the underlying storage containers without a private endpoint.
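If you’d like a quick sanity check before flipping the switch, the verification query from step 2b can be filtered down to confirm that an approved endpoint exists for the blob subresource. A minimal sketch:

-- Count approved private endpoints for the blob subresource before enabling "use_privatelink_endpoint"
select count(*) as "APPROVED_BLOB_ENDPOINTS"
from table(flatten(
    input => parse_json(system$get_privatelink_endpoints_info())
  ))
where parse_json("VALUE"):"subresource"::string = 'blob'
  and parse_json("VALUE"):"status"::string = 'Approved'
;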
To create a new storage integration, execute a command in Snowflake similar to the following using the ACCOUNTADMIN role:
create storage integration if not exists "MY_STORAGE_INTEGRATION"
  type = EXTERNAL_STAGE
  storage_provider = 'AZURE'
  enabled = TRUE
  azure_tenant_id = 'a123bc45-67de-89f0-1234-ab5678cd9ef0'
  storage_allowed_locations = (
      'azure://mystorageaccount.blob.core.windows.net/my-container-1/'
    , 'azure://mystorageaccount.blob.core.windows.net/my-container-2/'
  )
  use_privatelink_endpoint = TRUE
  comment = 'Storage integration that leverages a private endpoint to connect to containers within the storage account mystorageaccount.'
;
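Note that if this is a brand-new storage integration rather than an update to an existing one, you will still need to complete the usual consent and access-granting steps against the storage account, as covered in the storage integration article linked above. The relevant properties can be retrieved from the integration itself:

-- Review the AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME properties of the integration
desc storage integration "MY_STORAGE_INTEGRATION";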
To modify an existing storage integration, execute a command in Snowflake similar to the following using the ACCOUNTADMIN role:
alter storage integration "MY_STORAGE_INTEGRATION"
  set use_privatelink_endpoint = TRUE
;
Remember to apply any desired Role-Based Access Control steps here to allow roles other than ACCOUNTADMIN to leverage the storage integration. For example, you may wish to grant USAGE on the storage integration to the role SYSADMIN.
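As a minimal sketch using the hypothetical names from this post (the stage name and container path are placeholders of my own), the grant and a stage that leverages the integration could look like this:

-- Allow SYSADMIN to reference the storage integration when creating stages
grant usage on integration "MY_STORAGE_INTEGRATION" to role SYSADMIN;

-- Hypothetical external stage that connects via the integration, and therefore via the private endpoint
create stage if not exists "MY_STAGE"
  url = 'azure://mystorageaccount.blob.core.windows.net/my-container-1/'
  storage_integration = "MY_STORAGE_INTEGRATION"
;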
3. Configure Private Connectivity for any Notification Integrations
In contrast to storage integrations, which are optional (though still strongly advised), a notification integration is strictly necessary if you wish to leverage Snowflake’s native functionality for automated data ingestion from Azure storage. This includes performing the following actions when new data lands in the underlying storage account:
- Automatically ingesting new data into a table using Snowpipe
- Automatically updating directory tables on stages
- Automatically updating external tables
If desired, you can learn more about notification integrations and automated ingestion between Snowflake and Azure here: Automated Ingestion from Azure Storage into Snowflake via Snowpipe
To configure private connectivity for any notification integrations, we complete the following steps:
- Provision a private endpoint for traffic between Snowflake and any queues within the Azure storage account
- Approve the new private endpoint in the Azure storage account
- Recreate any existing notification integrations to leverage the privatelink endpoint
- Recreate any objects that leverage the notification integrations that have been recreated in the previous step
3a. Provision a private endpoint for traffic between Snowflake and any queues within the Azure storage account
To complete the first step, we must execute the following command using the ACCOUNTADMIN role in Snowflake:
select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
    '<Resource ID for storage account>', -- This is the resource ID for the storage account
    '<Storage account name>.queue.core.windows.net', -- This is the host name for queues within the storage account
    'queue' -- This is the subresource for queues within the storage account
  )
;
From step 1, we have already retrieved the following values:
Storage account: mystorageaccount
Resource ID: /subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount
These values can be entered directly into the SQL statement to provision the private endpoint:
select SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
    '/subscriptions/a123bc45-67de-89f0-1234-ab5678cd9ef0/resourceGroups/rg-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount', -- This is the resource ID for the storage account
    'mystorageaccount.queue.core.windows.net', -- This is the host name for queues within the storage account
    'queue' -- This is the subresource for queues within the storage account
  )
;
Executing the code in Snowflake will give a response similar to the following:
3b. Approve the new private endpoint in the Azure storage account
After completing the previous step, Snowflake will have provisioned a private endpoint within its own infrastructure, and this will be pending approval within the target storage account. Within the Azure Portal, navigate to the “Networking” pane and select the “Private endpoint connections” tab. You will now see a pending entry for the new private endpoint, which you can approve.
It’s possible you will also see the connection you previously configured for the storage integration.
When approving the connection you are given the option of populating a description. I would highly recommend populating this value with information that indicates which Snowflake account the connection is coming from, and what subresource it is accessing. For example, you could enter the following description:
To confirm that everything is working, execute the following command in Snowflake using the ACCOUNTADMIN role and review the “STATUS” field of the output:
select
    parse_json("VALUE"):"host"::string as "HOST"
  , parse_json("VALUE"):"status"::string as "STATUS"
  , parse_json("VALUE"):"subresource"::string as "SUBRESOURCE"
  , parse_json("VALUE"):"endpoint_state"::string as "ENDPOINT_STATE"
  , parse_json("VALUE"):"provider_resource_id"::string as "PROVIDER_RESOURCE_ID"
  , parse_json("VALUE"):"snowflake_resource_id"::string as "SNOWFLAKE_RESOURCE_ID"
from table(flatten(
    input => parse_json(system$get_privatelink_endpoints_info())
  ))
;
This will return a table similar to the following, which also includes the previously-configured endpoint for the example storage integration:
We can see that the status is “Approved” so our endpoint should be good to go!
3c. Recreate any existing notification integrations to leverage the privatelink endpoint
Unlike for storage integrations, this step can be more complex for notification integrations. Again, the intent is to enable the “use_privatelink_endpoint” option for the notification integration. However, with existing notification integrations it is not sufficient to run an ALTER command (at time of writing). Instead, we must recreate existing notification integrations, which can break downstream objects.
Since this involves recreating existing notification integrations instead of just altering them, there will be several downstream impacts. Be sure you are confident with managing the downstream repercussions before recreating your notification integration.
Our testing has also shown that in some situations, the underlying queue in Azure itself needs to be deleted and recreated. This seems to be necessary when a previous notification integration existed that leveraged the same queue.
To mitigate this risk, I would recommend taking the following steps:
- Execute a “show grants on integration …” command to determine any Role-Based Access Control grants that will need to be added again after the notification integration is recreated
- Determine any pipes, directory tables, external tables, etc. that leverage the notification integration, as these may need to be recreated. Unfortunately, at time of writing these do not appear to be included in the output of the object dependencies view, so your approach to determine these objects will be more complex. One option is to run SHOW commands for all pipes, external tables and directory tables in your account and then inspect the output or DDL of each to determine the notification integration (see the sketch after this list).
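Here is a minimal sketch of both checks, assuming the notification integration is named MY_NOTIFICATION_INTEGRATION. For pipes specifically, the integration column of the SHOW PIPES output can be checked directly rather than parsing DDL:

-- Capture the existing grants so they can be re-applied after the integration is recreated
show grants on integration "MY_NOTIFICATION_INTEGRATION";

-- Identify any pipes that reference the notification integration
show pipes in account;
select "database_name", "schema_name", "name", "definition"
from table(result_scan(last_query_id()))
where "integration" = 'MY_NOTIFICATION_INTEGRATION'
;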
If you have not followed the previous steps to provision and approve the private endpoint(s) between your Snowflake account and the queue subresource(s) of the storage account(s) referenced in the “azure_storage_queue_primary_uri” option, then enabling the “use_privatelink_endpoint” option for your notification integration will break it, since it will not be able to access the underlying queue without a private endpoint.
To recreate a notification integration so that it leverages private connectivity, execute a command in Snowflake similar to the following using the ACCOUNTADMIN role:
create or replace notification integration "MY_NOTIFICATION_INTEGRATION"
  type = QUEUE
  notification_provider = 'AZURE_STORAGE_QUEUE'
  enabled = TRUE
  azure_tenant_id = 'a123bc45-67de-89f0-1234-ab5678cd9ef0'
  azure_storage_queue_primary_uri = 'https://mystorageaccount.queue.core.windows.net/my-queue'
  use_privatelink_endpoint = TRUE
  comment = 'Notification integration that leverages a private endpoint to connect to the queue "my-queue" within the storage account mystorageaccount.'
;
Now that the notification integration has been created, be sure to restore any downstream RBAC and objects!
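As a rough illustration using entirely hypothetical object names, restoring a Snowpipe-based ingestion pipeline might look something like the following:

-- Re-grant access to the recreated notification integration
grant usage on integration "MY_NOTIFICATION_INTEGRATION" to role SYSADMIN;

-- Recreate a pipe that leveraged the old notification integration
create or replace pipe "MY_DATABASE"."MY_SCHEMA"."MY_PIPE"
  auto_ingest = TRUE
  integration = 'MY_NOTIFICATION_INTEGRATION'
as
  copy into "MY_DATABASE"."MY_SCHEMA"."MY_TABLE"
  from @"MY_DATABASE"."MY_SCHEMA"."MY_STAGE"
  file_format = (type = 'JSON')
;

-- Optionally backfill any files that landed whilst the pipe was being recreated
alter pipe "MY_DATABASE"."MY_SCHEMA"."MY_PIPE" refresh;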
4. OPTIONAL: Remove any Subnet-Level Whitelisting you Previously Applied that Allowed Snowflake to Access your Storage Account Without Private Connectivity
If you have previously configured your storage account to block public network access whilst still allowing Snowflake to connect to it, then you likely followed Snowflake’s steps to whitelist all of the subnets for your Snowflake region. Now that you have configured private link connectivity between Snowflake and your storage account, you no longer need this!
Within the Azure Portal, navigate to the “Networking” pane and view the “Firewalls and virtual networks” tab. You may see an entry here for the “deployment-infra-rg-vnet,” allowing the subnets for your Snowflake region. To remove the whitelisting, select the three-dot menu on the far right and select “Remove.”
Summary
So there we have it. We have deployed a demonstrably secure mechanism that allows Snowflake to interact with both containers and queues within our storage account, all through private connectivity.
If you’ve just finished this and you’re interested in taking further steps to secure your Snowflake account, be sure to check out these two articles: