Snowflake External Access: Trigger Tableau Datasource Refreshes

Data

Snowflake External Access: Trigger Tableau Datasource Refreshes

Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure tunnels for Snowflake to communicate with the outside world, and can therefore communicate with external APIs to complete all manner of tasks. For example – external access could be used for any of the following, among many others:

This specific article focuses on how to communicate with the REST API  for a Tableau Server/Cloud deployment to refresh extracts in a published datasource/workbook, however this series will include the other examples above as well. With each new article, the entry above will be converted to a link.

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to..
  2. An external access integration that allows Snowflake objects to leverage the network rule
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password, or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint.
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration.

This is demonstrated in the following diagram:

Snowflake - External Access - Message to API.png

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake
  • Egress – Traffic that is leaving Snowflake and travelling outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;
This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different to network policies, which are account-level objects. This is easy to get confused, which is why it has been mentioned here.

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with Tableau Server/Cloud API:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__TABLEAU_API"
  type = HOST_PORT
  mode = EGRESS
  value_list = ('?????') -- Populate with host of the target Tableau Server/Cloud instance, such as tableau.company.com for a server or 10ax.online.tableau.com for Tableau Cloud
  comment = 'Allows traffic to be sent to the API for the Tableau Server deployment'
;

How To Set up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

How To Attach Optional Security Components – Secrets for a Tableau API

Depending on your required target, you may have a need for additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples and there are other possibilities for more complex use cases.

In the case of a Tableau API, a peronal access token (PAT) is the simplest way to authenticate. To support this, two secrets are required in Snowflake, one to store the name of the peronal access token and one to store the value. This is required as PAT values take a form that is not accepted by a standard password-type secret in Snowflake, and thus a pair of generic-string-type secrets are used instead.

My personal recommendation is to create your external access integration before creating any additional security components, then apply them using alter commands.

To start, create the external access integration, using the network rule that was defined in the specific example earlier in this article. As the integration is still a work in progress, set “enabled” to FALSE.

create external access integration "EAI__TABLEAU_API"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__TABLEAU_API")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the API for the Tableau Server deployment'
;

Required Permissions in Tableau Server/Cloud

Before creating the personal access token (PAT) as detailed in the next step, a service principal user should be created in Tableau Server/Cloud with the appropriate access to any target published workbooks/datasources. Specifically, for any user to be able to refresh a data extract in a published workbook/datasource in Tableau Server/Cloud, that user must have one of the following levels of access:

  • Ownership of the published workbook/datasource itself.
  • Project Leader access for the project that contains the published workbook/datasource.
  • Site Administrator access, which grants full control over the entire Tableau Server/Cloud site.

How To Create a Personal Access Token (PAT) in Tableau Server/Cloud

Creating a PAT in Tableau Server/Cloud can be achieved in a handful of clicks:

  1. After logging in to your Tableau Server/Cloud site, select your profile icon in the upper right corner and select “My Account Settings.”
  2. Scroll down to the “Personal Access Tokens” area, enter a name for your new token and select “Create new token:”image.png
  3. A pop-up will appear allowing you to copy the new access token to your clipboard. This is only available once, so be sure to leverage the value straight away, either by inserting it directly into the destination by creating the secret in Snowflake, or by noting it down temporarily in another secure location whilst you work:
    image.png
  4. When complete, your token name should then be listed in the “Personal Access Tokens” area. This can be revoked if required:image.png

How to Configure Secrets To Contain a Generic Strings

Now that the personal access token (PAT) has been created, we can place the credentials inside secrets in Snowflake.

-- Secret for token name
create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_NAME"
  type = GENERIC_STRING
  secret_string = 'my-pat-token'
  comment = 'Name of the Personal Access Token (PAT) for the Tableau Server API'
;

-- Secret for token value
create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_VALUE"
  type = GENERIC_STRING
  secret_string = '?????' -- Enter the value of the token here
  comment = 'Personal Access Token (PAT) for the Tableau Server API'
;

Attach the Optional Security Components to the External Access Integration

Now that the secrets have been configured, attaching them to the external access integration is achieved with an alter statement. This is also a good opportunity to enable the integration.

alter external access integration "EAI__TABLEAU_API"
  set allowed_authentication_secrets = (
        "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_NAME"
      , "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_VALUE"
    )
    , enabled = TRUE
;

How To Trigger a Resource Extract Refresh in Tableau Server/Cloud

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can automatically trigger an extract refresh for a published workbook/datasource in Tableau Server/Cloud.

Creating a Mapping Table for Resources and Their Backend IDs

Whilst this step is not necessary and you could develop the process without it, leveraging a mapping table can make the process much simpler. The idea here is to create a table that maps combinations of resource type (workbook or datasource), site ID, project name and resource name to backend resource IDs. That way, if you have resources that you wish to refresh under different scenarios, you can have a single stored procedure that determines the target resource based on a set of inputs. The benefit of the mapping table specifically is that the stored procedure can then convert the inputs into a backend resource ID, removing the effort required for the end user to retrieve the resource ID each time.

Here is some example code for creating such a mapping table:

Populating the Mapping Table

Populating the mapping table is relatively simple once you know the process, but can be confusing at first. The main component we need is the resource ID, which is stored in Tableau’s backend and not exposed in any of the friendly parts of the user interface. Instead, the resource must be obtained from the backend.

The backend ID for a resource can change for a number of reasons, so it is vital to keep this table updated.

The following options are available for identifying the backend ID for a resource:

  1. Leverage the GraphiQL interface to directly retrieve the resource based on an ID that is visible from the URL in the UI.
    • This is the option I would recommend for smaller/simpler scenarios where it is not difficult to maintain the table.
  2. Leverage the REST API, either directly or through a library such as the tableauserverclient for Python, to execute a series of queries to identify the resource.
    • I would not recommend this option as you may struggle with duplicates multiple resources exist with the same name in different projects, especially if the projects are children that also have the same name but sit inside different parent projects.
  3. Query the backend postgreSQL repository directly to identify the IDs.
    • Not an option for Tableau Cloud as the postgreSQL repository is not exposed.
    • I would not recommend this option as it is more complex than step option (1).
  4. Develop an ELT process to replicate the backend postgreSQL repository into Snowflake regularly, then remove the mapping table and modify the process to leverage the actual backend tables.
    • Not an option for Tableau Cloud as the postgreSQL repository is not exposed.
    • This is the option I would recommend for larger Tableau Server scenarios as you can ensure you are always using the most recent ID for each resource. It requires more development time for initial setup, but will then automatically update as needed without further manual intervention.
  5. Develop an ELT process to regularly query the GraphiQL backend regularly.
    • This is similar to option (4) but can work on both Tableau Server and Tableau Cloud.
    • This is the option I would recommend for larger Tableau Cloud scenarios.

To keep things simple for this article, only the first option is covered. Fortunately, it is a quick and simple one to use.

Enabling Metadata Services for Tableau Server

Don’t worry, you do not need to have purchased Tableau’s Data Management add-on for this to work, as that add-on enables the frontend support for the metadata services. The backend is already there behind the scenes.

If you are using Tableau Cloud then this step can be skipped as the metadata services will already be enabled.

For Tableau Server, you will need one of your administrators to access the backend of the Tableau Server and execute this command to enable the services:

tsm maintenance metadata-services enable

This will require a restart of your Tableau Server.

Retrieve the Backend ID for a Published Workbook or Datasource Using the GraphiQL Interface

To retrieve the backend ID for a published workbook/datasource, start by navigating to it in the Tableau Server/Cloud user interface and looking at the URL.

Depending on if it is a workbook or a datasource, it should be in one of the following formats:

https://<server_url>/#/site/<site_id>/workbooks/<vizportal_url_id>/views
https://<server_url>/#/site/<site_id>/datasources/<vizportal_url_id>/askData

For example, the URL could be the one of the following:

https://tableau.company.com/#/site/MySite/workbooks/12345/views
https://tableau.company.com/#/site/MySite/datasources/6789/askData

The important detail we are looking for here is that “vizportal_url_id” near the end of the URL. This is the unique identifier for the resource in Tableau’s frontend.

The next step is to use this to retrieve the backend ID  by leveraging the metadata services through the GraphiQL interface. This interface can be accessed using the following URL:

https://<instance_url>/metadata/graphiql/

For example, this URL could be the following:

https://tableau.company.com/metadata/graphiql/

Once you are within this area, the following query can be used to retrieve the backend ID, referred to as the “luid,” for a workbook:

{
  workbooks(filter:{vizportalUrlId: "<viz_portal_url_id>"}) {
    luid
    vizportalUrlId
    name
    projectName
  }
}

Similarly, the following query can be used to retrieve the backend ID, referred to as the “luid”, for a datasource:

{
  publishedDatasources(filter:{vizportalUrlId: "<viz_portal_url_id>"}) {
    luid
    vizportalUrlId
    name
    projectName
  }
}

The following screenshot demonstrates an example of this for a workbook:

image.png

From this output, the “luid” is the backend ID that is required when populating mapping table. This is the unique ID for the resource that is leveraged by the Tableau Server/Cloud API.

Creating a Stored Procedure To Trigger a Resource Refresh

Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.

The full procedure definition is as follows:

-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
      RESOURCE_TYPE varchar
    , RESOURCE_NAME varchar
    , PROJECT_NAME varchar
    , SITE_ID varchar
    , CONNECTION_STRING varchar default '?????' -- Populate with host of the target Tableau Server/Cloud instance, such as tableau.company.com for a server or 10ax.online.tableau.com for Tableau Cloud
  )
  copy grants
  returns varchar
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__TABLEAU_API")
  secrets = (
        'pat_name' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_NAME"
      , 'pat_value' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__TABLEAU_API__ACCESS_TOKEN_VALUE"
    )
  packages = ('snowflake-snowpark-python', 'tableauserverclient')
as
$$

## Import module for inbound Snowflake session
from snowflake.snowpark import Session as snowparkSession
from snowflake.snowpark.functions import col, lit

## Imports
import _snowflake
import tableauserverclient as TSC

## Define main function
## Put at top of SProc to avoid Snowflake linting bug
## if converted to a Snowpark for Python worksheet
def main(
      snowpark_session: snowparkSession
    , resource_type: str
    , resource_name: str
    , project_name: str
    , site_id: str
    , connection_string: str
  ):

  ### Validate inputs
  if resource_type is None \
    or len(resource_type) == 0 \
    or resource_type not in ["workbook", "datasource"] \
    or resource_name is None \
    or len(resource_name) == 0 \
    or project_name is None \
    or len(project_name) == 0 \
    or site_id is None \
    or len(site_id) == 0 \
      :
    return "Missing inputs. Must provide all of resource_type, resource_name, project_name, site_id. Also, resource_type must be either 'workbook' or 'datasource'"
  
  ### Retrieve the resource id
  resource_id = retrieve_resource_id(
      snowpark_session = snowpark_session
    , resource_type = resource_type
    , resource_name = resource_name
    , project_name = project_name
    , site_id = site_id
  )
  
  if resource_id is None:
    return "Resource not in mapping table"

  ### Trigger the refresh of the resource
  try:
    trigger_resource_refresh(
        resource_type = resource_type
      , resource_id = resource_id
      , site_id = site_id
      , connection_string = connection_string
    )
    return "Resource refresh complete"
  except Exception as e:
    return f"Error triggering resource refresh:\n\n{str(e)}"

### Define the function to retrieve the resource ID
def retrieve_resource_id(
      snowpark_session: snowparkSession
    , resource_type: str
    , resource_name: str
    , project_name: str
    , site_id: str
  ):

  ### Query the RESOURCES table, filtered by other inputs
  sf_df_resource_id = snowpark_session.table('"EXTERNAL_ACCESS"."MAPPING_TABLES"."API__TABLEAU__RESOURCES"') \
    .filter(
        (col('"RESOURCE_TYPE"') == lit(resource_type))
      & (col('"RESOURCE_NAME"') == lit(resource_name))
      & (col('"PROJECT_NAME"') == lit(project_name))
      & (col('"SITE_ID"') == lit(site_id))
    ) \
    .select(col('"RESOURCE_ID"')) \
    .collect()

  ### If a value is found, return it.
  ### Otherwise, return None
  if sf_df_resource_id is not None \
    and len(sf_df_resource_id) > 0 \
      :
    resource_id = sf_df_resource_id[0]["RESOURCE_ID"]
  else:
    resource_id = None

  return resource_id

### Define the function to trigger the refresh of a resource
def trigger_resource_refresh(
      resource_type: str
    , resource_id: str
    , site_id: str
    , connection_string: str
  ):
  
  ### Retrieve authentication object
  access_token_name = _snowflake.get_generic_secret_string("pat_name")
  access_token_value = _snowflake.get_generic_secret_string("pat_value")
  tableau_auth = TSC.PersonalAccessTokenAuth(
      access_token_name
    , access_token_value
    , site_id = site_id
  )
  
  ### Retrieve Tableau server object
  tableau_server_url = f"https://{connection_string}"
  tableau_server = TSC.Server(tableau_server_url, use_server_version=True)

  ### Execute the refresh statement
  with tableau_server.auth.sign_in(tableau_auth):
    if resource_type == "workbook":
      # Get the workbook by its Id to make sure it exists
      resource = tableau_server.workbooks.get_by_id(resource_id)

      # trigger the refresh, you'll get a job id back which can be used to poll for when the refresh is done
      job_id = tableau_server.workbooks.refresh(resource)
    else:
      # Get the datasource by its Id to make sure it exists
      resource = tableau_server.datasources.get_by_id(resource_id)

      # trigger the refresh, you'll get a job id back which can be used to poll for when the refresh is done
      job_id = tableau_server.datasources.refresh(resource)
    
    # `wait_for_job` will throw if the job isn't executed successfully
    job = tableau_server.jobs.wait_for_job(job_id)

  return

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
    RESOURCE_TYPE => 'datasource'
  , RESOURCE_NAME => 'My datasource'
  , PROJECT_NAME => 'My Project'
  , SITE_ID => 'MySite'
)
;

-- Another test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__TABLEAU__REFRESH_RESOURCE"(
    RESOURCE_TYPE => 'workbook'
  , RESOURCE_NAME => 'My workbook with embedded datasource(s)'
  , PROJECT_NAME => 'My Project'
  , SITE_ID => 'MySite'
)
;

The following lines are important here:

  • Lines 14-18 enable the stored procedure to leverage the external access integration and accompanying secrets that facilitate authentication to Tableau.
  • Line 19 ensures that the “tableauserverclient” Python package is available, which is then imported on line 29 and leveraged throughout the procedure. Fortunately, this is simple to ingest as it is already included in Anaconda’s Snowflake channel, which Snowflake leverages when executing Python.
  • Lines 120-126 leverage the secrets to generate an authentication object for “tableauserverclient.” That authentication object is then leveraged on line 133 to authenticate to the Tableau Server/Cloud.
  • Lines 81-109 define a function that leverages Snowpark for Python to query the mapping table and retrieve the backend resource ID. This is leveraged on lines 57-63.
  • Lines 112-150 define a function triggers the resource refresh
  • Lines 156-161 can be used to test the process for a datasource
  • Lines 165-170 can be used to test the process for a workbook

This stored procedure provides the main structure to achieve our goals, but only uses some basic error handling. Depending on your purposes, it may be worth expanding this when deploying it to your own environment.

Testing the Stored Procedure

A successful test of the stored procedure should yield a simple “Resource refresh complete” response, which has been configured within the stored procedure:

image.png

Once executed, you should see the last refresh timestamp update for the extract in Tableau Server/Cloud:

image.png

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to programatically trigger datasource extract refreshes in Tableau. This stored procedure can now be made available to other users using Snowflake’s Role-Based Access Control (RBAC) and could be useful in multiple scenarios, including:

  • Refreshing extracts at the end of scheduled pipelines
  • Refreshing extracts as the result of a Snowflake Alert
  • Refreshing extracts as part of a process that executes when new data is detected in a stream

KeepWatch by InterWorks

Whether you need support for one platform or many, our technical experts have you covered.

More About the Author

Chris Hastie

Data Lead
Querying Stock Data with an API Query Profile in Matillion ETL Matillion ETL is a GUI-based orchestration and transformation tool for data movement that has many functionalities, including querying ...
Snowflake External Access: Retrieve Data from an API Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!