Recently, Snowflake released external access integrations into general availability. These allow Snowflake admins to create secure tunnels through which Snowflake can communicate with the outside world, and can therefore be leveraged to communicate with external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:
- Send alerts to Teams webhooks so that the alerts can be presented in a Teams channel
- Send alerts to Slack webhooks so that the alerts can be presented in a Slack channel
- Communicate with a Matillion ETL API to trigger jobs (this post)
- Communicate with the Matillion DPC API to trigger pipelines
- Communicate with a Tableau Server/Cloud API to trigger datasource extract refreshes
- Retrieve data from an API, for example the Snowflake Status API
- Communicate with any other external API for any other reason; the opportunities are limitless!
This specific article focuses on how to communicate with a Matillion ETL API to trigger jobs; however, this series will include the other examples above, as well. With each new article, the entry above will be converted to a link.
How Does External Access in Snowflake Work?
Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:
- A network rule that controls where traffic is allowed to travel to
- An external access integration that allows Snowflake objects to leverage the network rule
- An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint
- A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration
This is demonstrated in the following diagram:
How to Configure an Outbound Network Rule
In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:
- Ingress – Traffic from outside of Snowflake that is coming into Snowflake
- Egress – Traffic that is leaving Snowflake and travelling outside
This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.
Example Network Rule for External Access
Here is a simple example for creating an outbound/egress network rule:
create network rule "MY_OUTBOUND_NETWORK_RULE" type = HOST_PORT -- Always use this value for external access integrations mode = EGRESS -- Always use this value for external access integrations value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc comment = 'Allows traffic to be sent to the host(s) in the value_list' ;
This template can be used to configure the network rule for any external access.
It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different to network policies, which are account-level objects. The two are easy to confuse, which is why the distinction is called out here.
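If you ever need to double-check which is which, both object types can be listed directly. A quick sketch, assuming the database and schema used in the specific example below:

```sql
-- Network rules are schema-level objects and are listed within a schema
show network rules in schema "EXTERNAL_ACCESS"."NETWORK_RULES";

-- Network policies are account-level objects
show network policies;
```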
Specific Network Rule Example
The following network rule could be leveraged to enable outbound/egress traffic with a Matillion ETL API:
create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__MATILLION_ETL_API" type = HOST_PORT mode = EGRESS value_list = ( 'matillion.company.com' -- Populate with host of the target Matillion instance, such as matillion.company.com ) comment = 'Allows traffic to be sent to the API for the Matillion ETL deployment' ;
How to Set Up an External Access Integration
In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).
Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:
create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION" allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE") enabled = FALSE comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>' ;
How to Attach Optional Security Components: Secrets for a Matillion ETL API
Depending on your target, you may need additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples, and there are other possibilities for more complex use cases.
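For illustration only, the following sketch shows what an OAuth-style secret could look like. The security integration name and token values are placeholders, and this object is not needed for the Matillion ETL example in this article:

```sql
-- Hypothetical OAuth example; relies on a separate security integration
-- that defines how to authenticate with the API's OAuth server
create secret "EXTERNAL_ACCESS"."SECRETS"."OAUTH__EXAMPLE_API"
  type = OAUTH2
  api_authentication = "EXAMPLE_SECURITY_INTEGRATION" -- Placeholder security integration
  oauth_refresh_token = '<REDACTED>' -- Placeholder refresh token
  oauth_refresh_token_expiry_time = '2049-01-01' -- Placeholder expiry timestamp
  comment = 'Example OAuth secret for an API that expects OAuth authentication'
;
```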
In the case of a Matillion ETL API, a secret is required to store the username and password of the Matillion service principal that will be leveraged.
My personal recommendation is to create your external access integration before creating any additional security components, then apply them using `alter` commands.
To start, create the external access integration, using the network rule that was defined in the specific example earlier in this article. As the integration is still a work in progress, set `enabled` to `FALSE`.
create external access integration "EAI__MATILLION_ETL_API" allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__MATILLION_ETL_API") enabled = FALSE comment = 'External access integration to support traffic sent to the API for Matillion ETL' ;
How to Create a Matillion ETL Service Principal for API Requests
Creating a service principal in Matillion ETL is as simple as creating a new user and granting it the relevant access.
- First, log into Matillion ETL as an administrator, open the “Admin” tab and select “User Configuration:”
- Select the `+` icon to create a new user and grant it API access. The following example demonstrates a user called “my-api-user” that has been granted API access. It has also been granted full admin access to ensure that the full API functionality can be leveraged:
How to Configure a Secret to Contain a Password
Now that the service principal has been created, we can place the credentials inside a secret in Snowflake.
```sql
create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API"
  type = PASSWORD
  username = 'my-api-user'
  password = '<REDACTED>' -- Enter the password here
  comment = 'Credentials for the Matillion ETL API user'
;
```
Attach the Optional Security Components to the External Access Integration
Now that the secret has been configured, attaching it to the external access integration is achieved with an `alter` statement. This is also a good opportunity to enable the integration.
```sql
alter external access integration "EAI__MATILLION_ETL_API" set
    allowed_authentication_secrets = ("EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API")
  , enabled = TRUE
;
```
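Before moving on, the configuration can be verified directly in Snowflake. A minimal check, assuming the names used in this article:

```sql
-- Confirm that the integration exists and is enabled
show external access integrations like 'EAI__MATILLION_ETL_API';

-- Review the attached network rules and authentication secrets
describe external access integration "EAI__MATILLION_ETL_API";
```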
How to Trigger Matillion ETL Jobs in a Stored Procedure
Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can trigger a Matillion ETL job. In fact, we now have the tools we need to query any endpoint in the Matillion ETL API.
The specific endpoint we are interested in for this example is the following:
```
/v1/group/name/{groupName}/project/name/{projectName}/version/name/{versionName}/job/name/{jobName}/run?environmentName={environmentName}
```
This endpoint can be leveraged to trigger a job execution; however, it requires four important inputs, along with an optional fifth:
- The name of the project group
- The name of the project within the project group
- The name of the version within the project
- The name of the job within the project version
- (Optional) The name of the environment in which the job should be executed. If none is provided, the default is used.
It is also possible to include values for any scalar or grid variables within your API request.
Locating the Input Variables
Locating the input variables for a job in Matillion ETL is straightforward. If you do not know them already, open the job that you wish to trigger in Matillion ETL and look at the top of the screen where they will all be provided:
Providing Values for Scalar and Grid Variables
The API endpoint for triggering a job in Matillion ETL is a POST endpoint capable of accepting a payload. Here is an example from Matillion’s own documentation:
{ "scalarVariables": { "{variable-name-1}": "{value-1}" , "{variable-name-2}": "{value-2}" } , "gridVariables": { "{grid-variable-name}": [ [ "{column-value-1}" , "{column-value-2}" , "{column-value-3}" ] ] } }
Creating a Stored Procedure to Trigger a Matillion ETL Job
Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.
This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.
The full procedure definition is as follows:
```sql
-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME varchar
    , PROJECT_NAME varchar
    , JOB_NAME varchar
    , VERSION_NAME varchar default 'default'
    , ENVIRONMENT_NAME varchar default NULL
    , SCALAR_VARIABLES object default NULL
    , GRID_VARIABLES object default NULL
    , CONNECTION_STRING varchar default '?????' -- Populate with host of the target Matillion instance, such as matillion.company.com
  )
  copy grants
  returns variant
  language python
  runtime_version = '3.10'
  handler = 'main'
  external_access_integrations = ("EAI__MATILLION_ETL_API")
  secrets = ('cred' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API")
  packages = ('snowflake-snowpark-python', 'requests', 'furl')
as
$$

## Import module for inbound Snowflake session
from snowflake.snowpark import Session as snowparkSession

## Imports
import _snowflake
import requests
from requests.auth import HTTPBasicAuth
from furl import furl

## Define main function
## Put at top of SProc to avoid Snowflake linting bug
## if converted to a Snowpark for Python worksheet
def main(
      snowpark_session: snowparkSession
    , project_group_name: str
    , project_name: str
    , job_name: str
    , version_name: str
    , environment_name: str
    , scalar_variables: dict
    , grid_variables: dict
    , connection_string: str
  ):

    ### Validate inputs
    if project_group_name is None \
        or len(project_group_name) == 0 \
        or project_name is None \
        or len(project_name) == 0 \
        or job_name is None \
        or len(job_name) == 0 \
        or version_name is None \
        or len(version_name) == 0 \
        :
        return "Missing inputs. Must provide all of project_group_name, project_name, job_name, version_name"

    ### Retrieve authentication token
    username_password_object = _snowflake.get_username_password("cred")
    basic_auth_token = HTTPBasicAuth(username_password_object.username, username_password_object.password)

    ### Generate URL to trigger job
    api_endpoint_url_unsafe = f"https://{connection_string}/rest/v1/group/name/{project_group_name}/project/name/{project_name}/version/name/{version_name}/job/name/{job_name}/run"

    ### Make the URL safe (replacing spaces with %20 etc.)
    api_endpoint_url = furl(api_endpoint_url_unsafe)

    ### Populate the parameters for the API call
    api_parameters = {}
    if environment_name is not None \
        and len(environment_name) > 0 \
        :
        api_parameters["environmentName"] = environment_name

    ### Populate the payload for the API call
    api_payload = {}
    if scalar_variables is not None \
        and len(scalar_variables) > 0 \
        :
        api_payload["scalarVariables"] = scalar_variables
    if grid_variables is not None \
        and len(grid_variables) > 0 \
        :
        api_payload["gridVariables"] = grid_variables

    ### Populate headers for the API call
    headers = {
        "Content-Type": "application/json"
    }

    ### Execute API call to trigger job
    response = requests.request("POST", api_endpoint_url.url, headers=headers, params=api_parameters, json=api_payload, auth=basic_auth_token)

    ### Return the parsed response, falling back to the raw text if it is not valid JSON
    try:
        return response.json()
    except Exception:
        return response.text

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME => 'Example Group'
    , PROJECT_NAME => 'Example Project'
    , JOB_NAME => 'Example Job'
  )
;
```
The following elements of the definition are important here:
- The `external_access_integrations` and `secrets` parameters in the procedure header enable the stored procedure to leverage the external access integration and accompanying secret that facilitate authentication to the Matillion ETL API.
- The `_snowflake.get_username_password("cred")` call leverages the secret to retrieve the stored credentials, which are converted into an authentication token using the `HTTPBasicAuth` class imported from `requests.auth`. That token is then leveraged for authorization as part of the request.
- The `api_endpoint_url_unsafe` string leverages the input variables to establish the full API endpoint that the request is sent to. This is then made URL-safe using the imported `furl` function. This is an important step, as names often include whitespace, which is not URL-safe.
- The `api_parameters` dictionary establishes the parameters that are used in the API endpoint, consisting of the environment name if it was provided as an input variable.
- The `api_payload` dictionary establishes the payload that is sent to the API, consisting of any scalar or grid variable values that may have been provided as input variables (see the example call after this list).
- The `requests.request("POST", ...)` call sends the POST request itself to the Matillion ETL API, combining all of the elements established so far.
- The final `try`/`except` block returns the response from the request, with some minor error handling.
- The test query at the end of the script can be used to test the process.
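As mentioned in the list above, the optional arguments can be used to pass scalar and grid variable values through to the job. The following call is a sketch; the variable names are hypothetical and would need to match variables defined in the target job:

```sql
call "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME => 'Example Group'
    , PROJECT_NAME => 'Example Project'
    , JOB_NAME => 'Example Job'
    , ENVIRONMENT_NAME => 'Example Environment' -- Hypothetical environment name
    , SCALAR_VARIABLES => object_construct('my-scalar-variable', 'my-value') -- Hypothetical scalar variable
    , GRID_VARIABLES => object_construct('my-grid-variable', array_construct(array_construct('col-value-1', 'col-value-2'))) -- Hypothetical grid variable
  )
;
```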
Testing the Stored Procedure
A successful test of the stored procedure should yield a response that includes a message called `msg` and an `id`, which is the ID for the execution that has just been triggered:
Once triggered, you should be able to monitor the execution in the task history area of either the Matillion ETL environment or the Matillion Hub. You can also track the status by sending requests to the relevant API endpoint.
Example task history in Matillion ETL:
Example task history in Matillion Hub:
Wrap Up
Hopefully the information in this article provides you with all of the tools you need to establish your own connections from Snowflake to the Matillion ETL API. This stored procedure can now be made available to other users through Snowflake’s Role-Based Access Control (RBAC) and could be useful in multiple scenarios, including:
- Triggering a job as part of a scheduled task (see the sketch after this list)
- Triggering a job as part of a wider set of stored procedures
- Triggering a job when a stream detects new data loaded by Snowpipe
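As a brief illustration of the first scenario, the stored procedure could be called from a scheduled task. A minimal sketch, assuming the job details from the earlier test query; note that tasks are created in a suspended state and must be resumed before they run:

```sql
-- Hypothetical serverless task that triggers the Matillion ETL job every morning
create task "EXTERNAL_ACCESS"."PROCEDURES"."TASK__TRIGGER_EXAMPLE_JOB"
  schedule = 'USING CRON 0 6 * * * UTC'
as
  call "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME => 'Example Group'
    , PROJECT_NAME => 'Example Project'
    , JOB_NAME => 'Example Job'
  )
;

-- Resume the task so that it begins running on its schedule
alter task "EXTERNAL_ACCESS"."PROCEDURES"."TASK__TRIGGER_EXAMPLE_JOB" resume;
```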