Snowflake Masking Policies Demystified

Data

Snowflake Masking Policies Demystified

Organizations are constantly collecting and analyzing vast amounts of sensitive information. Protecting this data has become more important than ever. Snowflake’s masking policies are a powerful tool in keeping sensitive information safe while still allowing easy access for those with the proper credentials. This blog will introduce you to the main concepts of masking policies in Snowflake and provide resources for further learning.

Masking policies are considered a column-level security feature in Snowflake. Masking involves transforming or obfuscating specific fields to prevent unauthorized users from seeing sensitive values while still allowing access to other, less sensitive fields. This allows data engineers to balance ease of access with security considerations.

An example for using column-level masking is for a table that includes social security number (SSN) or credit card number (CCN). We may still want to allow analysts to see the accompanying names, cities and other useful fields, while hiding those sensitive fields, like SSN and CCN.

There are several masking policies that can accomplish this, including dynamic data masking, external tokenization, and tag-based masking.

Dynamic Data Masking (DDM)

Dynamic data is the primary masking policy I have seen most users leveraging. It offers a fine-grained mechanism for concealing sensitive information from unauthorized users while allowing them access to non-sensitive data.

Data Definition

DDM is implemented using SQL queries. Users define masking policies that specify which columns to mask, the type of masking to apply and the conditions under which the masking should occur.

Run at Query Run Time

Masking policies are applied at query runtime. The policy is applied to the column at every location where it occurs, like tables and views. Depending on the masking policy conditions, the SQL execution context and role hierarchy, Snowflake query returns plain-text value, a partially masked value or a fully masked value.

Types of Masking

Snowflake offers various masking techniques:

  • Full masking: Replaces the entire column value with a default mask, such as “Xs” or “***.”
  • Partial masking: Reveals only a portion of the data, such as displaying only the last four digits of a credit card number.
  • Pseudonymized masking: Substitutes the original value with a masked value, for example by leveraging the SHA2 function to create a new value. This allows for obfuscation while keeping the same analytical value, as two identical inputs would result in identical masked outputs.

Role-Based Access Control (RBAC)

DDM seamlessly integrates with Snowflake’s RBAC system, allowing you to assign masking policies to specific roles. This ensures that users see different levels of data obfuscation based on their roles and privileges, enhancing data security.

Dynamic data masking in Snowflake provides a flexible, role-based approach to safeguarding sensitive information without compromising data accessibility for authorized users.

Example

Here is an example of a standard masking policy leveraging RBAC to mask the data in different ways:

create or replace masking policy MY_EMAIL_MASKING_POLICY
as (email_address string) returns string
->
  case

    -- Full access:
    -- Access to read the email address
    -- Applies when the querying user has the "EMAIL_ACCESS" role in their session
    -- Example output: my_email_address@my_domain.com
    when is_role_in_session('EMAIL_ACCESS') 
      then email_address
    
    -- Partial masking:
    -- Access to see the email domain but not the individual
    -- Applies when the querying user has the "DOMAIN" role in their session
    -- Example output: *********@my_domain.com
    when is_role_in_session('DOMAIN_ACCESS') 
      then regexp_replace(email_address,'.+\@','*********@')

    -- Pseudonymised masking:
    -- Cannot see the value whilst maintaining analytical value
    -- Applies when the querying user has the "ANALYST" role in their session
    -- Example output: 1dbd59f661d68b90724f21084396b865497173e4d2714f4d91cf05fa5fc5e18d
    when is_role_in_session('ANALYST')
      then sha2(email_address)

    -- Full masking
    -- Cannot see anything
    -- Example output: *********
      else '*********'
  end
;

This policy can then be assigned to a field in a table directly:

alter table MY_TABLE
  alter column EMAIL_ADDRESS
    set masking policy MY_EMAIL_MASKING_POLICY
;

Tag-Based Masking Policies

By combining masking policies with object tagging in Snowflake, tag-based masking policies offer a more scalable approach to data protection. This is accomplished by classifying objects using tags and applying masking rules based on data classification tags.

Data Classification

Before implementing tag-based masking policies, data must be classified. This can be accomplished by classifying columns and tagging them as you create tables. I would recommend at least adding tags based on sensitivity of data in columns if you plan on using the tag-based approach. For the most usability, it helps to tag objects at every level, including columns, tables and schemas. Snowflake’s object tagging documentation is a helpful resource.

Policy Assignment

Once data is classified, administrators can assign masking policies to specific tags. This means that any column or table with a matching tag will automatically have the associated masking policy applied.

Consistency and Efficiency

Tag-based masking help with consistency and efficiency by allowing you to manage policies programmatically with tags. Instead of manually defining masking rules for each column, you know every column with a certain tag has the same masking policy.

Adaptability

As your data evolves, a tag-based approach allows your masking policies to adapt with it. New columns or tables with the same sensitivity classification automatically have the associated masking policy applied. This reduces administrative overhead by streamlining data protection efforts, making it easier to maintain a secure environment.

Example

Here is an example of assigning a masking policy by assigning a tag:

alter tag MY_EMAIL_TAG set policy MY_EMAIL_MASKING_POLICY;

This tag can then be assigned to a field in a table to apply the masking policy:

alter table MY_TABLE
  alter column EMAIL_ADDRESS
    set tag MY_EMAIL_TAG = 'My tag value'
;

In the above example it does not matter what the value of the tag is, as we have not attempted to leverage it in our example masking policy. If desired, we could choose to leverage the SYSTEM$GET_TAG_ON_CURRENT_COLUMN and/or SYSTEM$GET_TAG_ON_CURRENT_TABLE functions within our masking policy to change the behaviour based on the tag value. For example, the following masking policy will only mask email addresses if the tag value is “MASKED”:

create or replace masking policy MY_TAG_BASED_EMAIL_MASKING_POLICY
as (email_address string) returns string
->
  case
    
    -- Full access:
    -- Access to read the email address
    -- Applies when the tag value is not 'MASKED'
    -- Example output: my_email_address@my_domain.com  
    when SYSTEM$GET_TAG_ON_CURRENT_COLUMN('MY_EMAIL_TAG') != 'MASKED'
      then email_address

    -- Full access:
    -- Access to read the email address
    -- Applies when the querying user has the "EMAIL_ACCESS" role in their session
    -- Example output: my_email_address@my_domain.com
    when is_role_in_session('EMAIL_ACCESS') 
      then email_address
    
    -- Partial masking:
    -- Access to see the email domain but not the individual
    -- Applies when the querying user has the "DOMAIN" role in their session
    -- Example output: *********@my_domain.com
    when is_role_in_session('DOMAIN_ACCESS') 
      then regexp_replace(email_address,'.+\@','*********@')

    -- Pseudonymised masking:
    -- Cannot see the value whilst maintaining analytical value
    -- Applies when the querying user has the "ANALYST" role in their session
    -- Example output: 1dbd59f661d68b90724f21084396b865497173e4d2714f4d91cf05fa5fc5e18d
    when is_role_in_session('ANALYST')
      then sha2(email_address)

    -- Full masking
    -- Cannot see anything
    -- Example output: *********
      else '*********'
  end
;

External Tokenization

External tokenization takes data security to the next level by replacing sensitive information with tokens generated and managed by external functions. Tokens are undecipherable values that replace values in columns with sensitive data. This is the most secure option for obfuscating sensitive data as the real values are not available in any table or view.

Pre-Load Tokenization

External Tokenization allows accounts to tokenize data before loading it into Snowflake. By using a tokenization provider, the data is transformed and pre-loaded into Snowflake. The benefit is that without having masking policy in place, users cannot see the real data value.

De-Tokenization at Runtime

The data is de-tokenized at query runtime so that the appropriate can view original values. This is performed by the external functions set in the masking policy. Like other types of masking, the queries may return plain-text value, a partially masked value or a fully masked value. This depends on the masking policy conditions, the SQL execution context and role hierarchy.

Data Governance and Access Management

Masking policies like external tokenization support contextual data access by role or custom entitlements. It supports both centralized and decentralized administration models. For example, dedicated security or privacy officer can decide which columns to protect instead of the object owner. It also allows you to apply one policy to thousands of columns across databases and schemas. And all of this can easily be changed by updating the masking policy, instead of having to update columns individually.

Enhanced Security and Compliance

External tokenization helps organizations comply with strict data protection rules and regulations, as it minimizes the risk of data breaches while preserving the functionality of the data for authorized users.

Example

Here is an example of a masking policy that leverages an external function to detokenize data that is secured with external tokenization:

create or replace masking policy MY_TOKENIZATION_MASKING_POLICY
as (input_value string) returns string
->
  case
    
    -- Full access:
    -- Execute the external function "DETOKENISE_VALUE"
    -- which detokenizes the data.
    -- Applies when the querying user has the "SECURE_ACCESS" role in their session
    when is_role_in_session('SECURE_ACCESS') 
      then DETOKENISE_VALUE(input_value)
    
    -- No access
    -- Return the original input value,
    -- which is still tokenized
      else input_value
  end
;

Encryption

Encryption allows users to safeguard their sensitive values by using a passphrase to encrypt/decrypt values. ENCRYPT is used during data ingestion to load the encrypted into Snowflake. DECRYPT can then be used as a function within a masking policy to decrypt values using the varchar passcode.

There are also ENCRYPT_RAW and DECRYPT_RAW variations of ENCRYPT and DECRYPT that use binary keys and initialized vectors instead of varchar passphrases.

Below is an example of a masking policy that leverages DECRYPT:

create or replace masking policy MY_DECRYPTION_MASKING_POLICY
as (input_value string) returns string
->
  case
    
    -- Full access:
    -- Execute the DECRYPT function
    -- which decrypts the data.
    -- Applies when the querying user has the "SECURE_ACCESS" role in their session
    when is_role_in_session('SECURE_ACCESS') 
      then DECRYPT(input_value, $passphrase)
    
    -- No access
    -- Return the original input value,
    -- which is still encrypted
      else input_value
  end
;

Additional Resources

I hope this has been a helpful introduction to Snowflake’s masking policies. For deeper dives into each of the topics we covered, I have attached links to some helpful resources:

More About the Author

Austin Schwinn

Data Engineer
Snowflake Masking Policies Demystified Organizations are constantly collecting and analyzing vast amounts of sensitive information. Protecting this data has become more ...
Installing dbt with Mac M1 Chips Have you encountered errors trying to install older versions of dbt on your fancy, new MacBook? This could be related to compatibility ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!