Organizations are constantly collecting and analyzing vast amounts of sensitive information. Protecting this data has become more important than ever. Snowflake’s masking policies are a powerful tool in keeping sensitive information safe while still allowing easy access for those with the proper credentials. This blog will introduce you to the main concepts of masking policies in Snowflake and provide resources for further learning.
Masking policies are considered a column-level security feature in Snowflake. Masking involves transforming or obfuscating specific fields to prevent unauthorized users from seeing sensitive values while still allowing access to other, less sensitive fields. This allows data engineers to balance ease of access with security considerations.
An example for using column-level masking is for a table that includes social security number (SSN) or credit card number (CCN). We may still want to allow analysts to see the accompanying names, cities and other useful fields, while hiding those sensitive fields, like SSN and CCN.
There are several masking policies that can accomplish this, including dynamic data masking, external tokenization, and tag-based masking.
Dynamic Data Masking (DDM)
Dynamic data is the primary masking policy I have seen most users leveraging. It offers a fine-grained mechanism for concealing sensitive information from unauthorized users while allowing them access to non-sensitive data.
Data Definition
DDM is implemented using SQL queries. Users define masking policies that specify which columns to mask, the type of masking to apply and the conditions under which the masking should occur.
Run at Query Run Time
Masking policies are applied at query runtime. The policy is applied to the column at every location where it occurs, like tables and views. Depending on the masking policy conditions, the SQL execution context and role hierarchy, Snowflake query returns plain-text value, a partially masked value or a fully masked value.
Types of Masking
Snowflake offers various masking techniques:
- Full masking: Replaces the entire column value with a default mask, such as “Xs” or “***.”
- Partial masking: Reveals only a portion of the data, such as displaying only the last four digits of a credit card number.
- Pseudonymized masking: Substitutes the original value with a masked value, for example by leveraging the SHA2 function to create a new value. This allows for obfuscation while keeping the same analytical value, as two identical inputs would result in identical masked outputs.
Role-Based Access Control (RBAC)
DDM seamlessly integrates with Snowflake’s RBAC system, allowing you to assign masking policies to specific roles. This ensures that users see different levels of data obfuscation based on their roles and privileges, enhancing data security.
Dynamic data masking in Snowflake provides a flexible, role-based approach to safeguarding sensitive information without compromising data accessibility for authorized users.
Example
Here is an example of a standard masking policy leveraging RBAC to mask the data in different ways:
create or replace masking policy MY_EMAIL_MASKING_POLICY as (email_address string) returns string -> case -- Full access: -- Access to read the email address -- Applies when the querying user has the "EMAIL_ACCESS" role in their session -- Example output: my_email_address@my_domain.com when is_role_in_session('EMAIL_ACCESS') then email_address -- Partial masking: -- Access to see the email domain but not the individual -- Applies when the querying user has the "DOMAIN" role in their session -- Example output: *********@my_domain.com when is_role_in_session('DOMAIN_ACCESS') then regexp_replace(email_address,'.+\@','*********@') -- Pseudonymised masking: -- Cannot see the value whilst maintaining analytical value -- Applies when the querying user has the "ANALYST" role in their session -- Example output: 1dbd59f661d68b90724f21084396b865497173e4d2714f4d91cf05fa5fc5e18d when is_role_in_session('ANALYST') then sha2(email_address) -- Full masking -- Cannot see anything -- Example output: ********* else '*********' end ;
This policy can then be assigned to a field in a table directly:
alter table MY_TABLE alter column EMAIL_ADDRESS set masking policy MY_EMAIL_MASKING_POLICY ;
Tag-Based Masking Policies
By combining masking policies with object tagging in Snowflake, tag-based masking policies offer a more scalable approach to data protection. This is accomplished by classifying objects using tags and applying masking rules based on data classification tags.
Data Classification
Before implementing tag-based masking policies, data must be classified. This can be accomplished by classifying columns and tagging them as you create tables. I would recommend at least adding tags based on sensitivity of data in columns if you plan on using the tag-based approach. For the most usability, it helps to tag objects at every level, including columns, tables and schemas. Snowflake’s object tagging documentation is a helpful resource.
Policy Assignment
Once data is classified, administrators can assign masking policies to specific tags. This means that any column or table with a matching tag will automatically have the associated masking policy applied.
Consistency and Efficiency
Tag-based masking help with consistency and efficiency by allowing you to manage policies programmatically with tags. Instead of manually defining masking rules for each column, you know every column with a certain tag has the same masking policy.
Adaptability
As your data evolves, a tag-based approach allows your masking policies to adapt with it. New columns or tables with the same sensitivity classification automatically have the associated masking policy applied. This reduces administrative overhead by streamlining data protection efforts, making it easier to maintain a secure environment.
Example
Here is an example of assigning a masking policy by assigning a tag:
alter tag MY_EMAIL_TAG set policy MY_EMAIL_MASKING_POLICY;
This tag can then be assigned to a field in a table to apply the masking policy:
alter table MY_TABLE alter column EMAIL_ADDRESS set tag MY_EMAIL_TAG = 'My tag value' ;
In the above example it does not matter what the value of the tag is, as we have not attempted to leverage it in our example masking policy. If desired, we could choose to leverage the SYSTEM$GET_TAG_ON_CURRENT_COLUMN and/or SYSTEM$GET_TAG_ON_CURRENT_TABLE functions within our masking policy to change the behaviour based on the tag value. For example, the following masking policy will only mask email addresses if the tag value is “MASKED”:
create or replace masking policy MY_TAG_BASED_EMAIL_MASKING_POLICY as (email_address string) returns string -> case -- Full access: -- Access to read the email address -- Applies when the tag value is not 'MASKED' -- Example output: my_email_address@my_domain.com when SYSTEM$GET_TAG_ON_CURRENT_COLUMN('MY_EMAIL_TAG') != 'MASKED' then email_address -- Full access: -- Access to read the email address -- Applies when the querying user has the "EMAIL_ACCESS" role in their session -- Example output: my_email_address@my_domain.com when is_role_in_session('EMAIL_ACCESS') then email_address -- Partial masking: -- Access to see the email domain but not the individual -- Applies when the querying user has the "DOMAIN" role in their session -- Example output: *********@my_domain.com when is_role_in_session('DOMAIN_ACCESS') then regexp_replace(email_address,'.+\@','*********@') -- Pseudonymised masking: -- Cannot see the value whilst maintaining analytical value -- Applies when the querying user has the "ANALYST" role in their session -- Example output: 1dbd59f661d68b90724f21084396b865497173e4d2714f4d91cf05fa5fc5e18d when is_role_in_session('ANALYST') then sha2(email_address) -- Full masking -- Cannot see anything -- Example output: ********* else '*********' end ;
External Tokenization
External tokenization takes data security to the next level by replacing sensitive information with tokens generated and managed by external functions. Tokens are undecipherable values that replace values in columns with sensitive data. This is the most secure option for obfuscating sensitive data as the real values are not available in any table or view.
Pre-Load Tokenization
External Tokenization allows accounts to tokenize data before loading it into Snowflake. By using a tokenization provider, the data is transformed and pre-loaded into Snowflake. The benefit is that without having masking policy in place, users cannot see the real data value.
De-Tokenization at Runtime
The data is de-tokenized at query runtime so that the appropriate can view original values. This is performed by the external functions set in the masking policy. Like other types of masking, the queries may return plain-text value, a partially masked value or a fully masked value. This depends on the masking policy conditions, the SQL execution context and role hierarchy.
Data Governance and Access Management
Masking policies like external tokenization support contextual data access by role or custom entitlements. It supports both centralized and decentralized administration models. For example, dedicated security or privacy officer can decide which columns to protect instead of the object owner. It also allows you to apply one policy to thousands of columns across databases and schemas. And all of this can easily be changed by updating the masking policy, instead of having to update columns individually.
Enhanced Security and Compliance
External tokenization helps organizations comply with strict data protection rules and regulations, as it minimizes the risk of data breaches while preserving the functionality of the data for authorized users.
Example
Here is an example of a masking policy that leverages an external function to detokenize data that is secured with external tokenization:
create or replace masking policy MY_TOKENIZATION_MASKING_POLICY as (input_value string) returns string -> case -- Full access: -- Execute the external function "DETOKENISE_VALUE" -- which detokenizes the data. -- Applies when the querying user has the "SECURE_ACCESS" role in their session when is_role_in_session('SECURE_ACCESS') then DETOKENISE_VALUE(input_value) -- No access -- Return the original input value, -- which is still tokenized else input_value end ;
Encryption
Encryption allows users to safeguard their sensitive values by using a passphrase to encrypt/decrypt values. ENCRYPT is used during data ingestion to load the encrypted into Snowflake. DECRYPT can then be used as a function within a masking policy to decrypt values using the varchar passcode.
There are also ENCRYPT_RAW and DECRYPT_RAW variations of ENCRYPT and DECRYPT that use binary keys and initialized vectors instead of varchar passphrases.
Below is an example of a masking policy that leverages DECRYPT:
create or replace masking policy MY_DECRYPTION_MASKING_POLICY as (input_value string) returns string -> case -- Full access: -- Execute the DECRYPT function -- which decrypts the data. -- Applies when the querying user has the "SECURE_ACCESS" role in their session when is_role_in_session('SECURE_ACCESS') then DECRYPT(input_value, $passphrase) -- No access -- Return the original input value, -- which is still encrypted else input_value end ;
Additional Resources
I hope this has been a helpful introduction to Snowflake’s masking policies. For deeper dives into each of the topics we covered, I have attached links to some helpful resources:
- Introduction to Column-Level Security
- Tag-Based Masking Policy Documentation
- Dynamic Data Masking Policy Documentation
- External Tokenization Documentation
- How to Create a Masking Policy (Documentation)
- How to Alter a Masking Policy (Documentation)
- How to Drop a Masking Policy (Documentation)
- How to Show Masking Policies (Documentation)
- How to Describe a Masking Policy (Documentation)