Snowflake External Access: Trigger Matillion ETL Jobs

Recently, Snowflake released external access integrations into general availability. These allow Snowflake admins to create secure tunnels through which Snowflake can communicate with the outside world, and can therefore be used to call external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:

This specific article focuses on how to communicate with a Matillion ETL API to trigger jobs; however, this series will include the other examples above, as well. With each new article, the entry above will be converted to a link.

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to
  2. An external access integration that allows Snowflake objects to leverage the network rule
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration

This is demonstrated in the following diagram:

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic that originates outside of Snowflake and comes into Snowflake
  • Egress – Traffic that leaves Snowflake and travels outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different from network policies, which are account-level objects. The two are easy to confuse, which is why the distinction is mentioned here.

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with a Matillion ETL API:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__MATILLION_ETL_API"
  type = HOST_PORT
  mode = EGRESS
  value_list = (
      'matillion.company.com' -- Populate with host of the target Matillion instance, such as matillion.company.com
  )
  comment = 'Allows traffic to be sent to the API for the Matillion ETL deployment'
;

How to Set Up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

How to Attach Optional Security Components: Secrets for a Matillion ETL API

Depending on your required target, you may have a need for additional security components to authenticate your requests with the destination. For example, the destination may expect authentication via a username/password combination that has been converted into a bearer token, or it may leverage OAuth and expect a combination of client ID and secret ID. Of course, these are just two examples and there are other possibilities for more complex use cases.

In the case of a Matillion ETL API, a secret is required to store the username and password of the Matillion service principal that will be leveraged.
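For context, HTTP basic authentication, which the stored procedure later in this article applies via the HTTPBasicAuth helper from the requests package, works by base64-encoding the username and password into an Authorization header. Here is a minimal standard-library sketch of that encoding, using dummy credentials for illustration only:

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    # A Basic auth token is simply "username:password" base64-encoded,
    # sent in the Authorization header with the "Basic " prefix
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

# Dummy credentials for illustration only
print(basic_auth_header("my-api-user", "my-password"))
```

The secret created below keeps the real username and password out of any code; the stored procedure retrieves them at runtime and requests performs this encoding on its behalf.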

My personal recommendation is to create your external access integration before creating any additional security components, then apply them using alter commands.

To start, create the external access integration, using the network rule that was defined in the specific example earlier in this article. As the integration is still a work in progress, set “enabled” to FALSE.

create external access integration "EAI__MATILLION_ETL_API"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__MATILLION_ETL_API")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the API for Matillion ETL'
;

How to Create a Matillion ETL Service Principal for API Requests

Creating a service principal in Matillion ETL is as simple as creating a new user and granting it the relevant access.

  1. First, log into Matillion ETL as an administrator, open the “Admin” tab and select “User Configuration:”
  2. Select the + icon to create a new user and grant it API access. The following example demonstrates a user called “my-api-user” that has been granted API access. It has also been granted full admin access to ensure that the full API functionality can be leveraged:

How to Configure a Secret to Contain a Password

Now that the service principal has been created, we can place the credentials inside a secret in Snowflake.

create secret "EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API"
  type = PASSWORD
  username = 'my-api-user'
  password = '<REDACTED>' -- Enter the password here
  comment = 'Credentials for the Matillion ETL API user'
;

Attach the Optional Security Components to the External Access Integration

Now that the secret has been configured, attaching it to the external access integration is achieved with an alter statement. This is also a good opportunity to enable the integration.

alter external access integration "EAI__MATILLION_ETL_API"
  set allowed_authentication_secrets = ("EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API")
    , enabled = TRUE
;

How to Trigger Matillion ETL Jobs in a Stored Procedure

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a stored procedure in Snowflake that can trigger a Matillion ETL job. In fact, we now have the tools we need to query any endpoint in the Matillion ETL API.

The specific endpoint we are interested in for this example is the following:

/rest/v1/group/name/{groupName}/project/name/{projectName}/version/name/{versionName}/job/name/{jobName}/run?environmentName={environmentName}

This endpoint can be leveraged to trigger a job execution; however, it requires four important inputs, along with an optional fifth input:

  1. The name of the project group
  2. The name of the project within the project group
  3. The name of the version within the project
  4. The name of the job within the project version
  5. (Optional) The name of the environment in which the job should be executed. If none is provided, the default is used.

It is also possible to include values for any scalar or grid variables within your API request.
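To make the endpoint concrete, here is a short sketch of how the URL could be assembled from those inputs. The stored procedure later in this article performs the equivalent step with the furl package; this sketch instead uses only the standard library's urllib.parse.quote, with a hypothetical host and hypothetical group/project/job names:

```python
from urllib.parse import quote

def build_run_url(host, group, project, version, job):
    # Each name is URL-encoded individually, since project and job
    # names frequently contain spaces and other unsafe characters
    parts = [quote(p, safe="") for p in (group, project, version, job)]
    return (
        f"https://{host}/rest/v1"
        f"/group/name/{parts[0]}/project/name/{parts[1]}"
        f"/version/name/{parts[2]}/job/name/{parts[3]}/run"
    )

# Hypothetical values for illustration only
print(build_run_url(
    "matillion.company.com", "Example Group",
    "Example Project", "default", "Example Job",
))
```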

Locating the Input Variables

Locating the input variables for a job in Matillion ETL is straightforward. If you do not know them already, open the job that you wish to trigger in Matillion ETL and look at the top of the screen where they will all be provided:

Providing Values for Scalar and Grid Variables

The API endpoint for triggering a job in Matillion ETL is a POST endpoint capable of accepting a payload. Here is an example from Matillion’s own documentation:

{
  "scalarVariables": {
      "{variable-name-1}": "{value-1}"
    , "{variable-name-2}": "{value-2}"
  }
  , "gridVariables": {
        "{grid-variable-name}": [
            [
                "{column-value-1}"
              , "{column-value-2}"
              , "{column-value-3}"
            ]
        ]
    }
}
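The stored procedure later in this article builds this payload conditionally, only including the keys for which values were actually provided. That logic can be sketched in isolation as follows, with hypothetical variable names:

```python
def build_payload(scalar_variables=None, grid_variables=None):
    # Only include the keys that were provided, so that jobs
    # without variables receive an empty payload
    payload = {}
    if scalar_variables:
        payload["scalarVariables"] = scalar_variables
    if grid_variables:
        payload["gridVariables"] = grid_variables
    return payload

# Hypothetical variable names for illustration only
print(build_payload(scalar_variables={"run_date": "2024-01-01"}))
```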

Creating a Stored Procedure to Trigger a Matillion ETL Job

Finally, we come to the main meat of the article. First, I will provide the full code for the stored procedure. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following article is recommended: A Definitive Guide to Python Stored Procedures in the Snowflake UI. This is part of our Snowflake with Python series.

The full procedure definition is as follows:

-- Create stored procedure
create or replace procedure "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
      PROJECT_GROUP_NAME varchar
    , PROJECT_NAME varchar
    , JOB_NAME varchar
    , VERSION_NAME varchar default 'default'
    , ENVIRONMENT_NAME varchar default NULL
    , SCALAR_VARIABLES object default NULL
    , GRID_VARIABLES object default NULL
    , CONNECTION_STRING varchar default '?????' -- Populate with host of the target Matillion instance, such as matillion.company.com
  )
  copy grants
  returns variant
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__MATILLION_ETL_API")
  secrets = ('cred' = "EXTERNAL_ACCESS"."SECRETS"."CREDS__MATILLION_ETL_API")
  packages = ('snowflake-snowpark-python','requests', 'furl')
as
$$

## Import module for inbound Snowflake session
from snowflake.snowpark import Session as snowparkSession

## Imports
import _snowflake
import requests
from requests.auth import HTTPBasicAuth
from furl import furl

## Define main function
## Put at top of SProc to avoid Snowflake linting bug
## if converted to a Snowpark for Python worksheet
def main(
      snowpark_session: snowparkSession
    , project_group_name: str
    , project_name: str
    , job_name: str
    , version_name: str
    , environment_name: str
    , scalar_variables: dict
    , grid_variables: dict
    , connection_string: str
  ):

  ### Validate inputs
  if project_group_name is None \
    or len(project_group_name) == 0 \
    or project_name is None \
    or len(project_name) == 0 \
    or job_name is None \
    or len(job_name) == 0 \
    or version_name is None \
    or len(version_name) == 0 \
      :
    return "Missing inputs. Must provide all of project_group_name, project_name, job_name, version_name"
    
  ### Retrieve authentication token
  username_password_object = _snowflake.get_username_password("cred")
  basic_auth_token = HTTPBasicAuth(username_password_object.username, username_password_object.password)

  ### Generate URL to trigger job
  api_endpoint_url_unsafe = f"https://{connection_string}/rest/v1/group/name/{project_group_name}/project/name/{project_name}/version/name/{version_name}/job/name/{job_name}/run"

  ### Make the URL safe (replacing spaces with %20 etc)
  api_endpoint_url = furl(api_endpoint_url_unsafe)

  ### Populate the parameters for the API call
  api_parameters = {}
  if environment_name is not None \
    and len(environment_name) > 0 \
      :
    api_parameters["environmentName"] = environment_name

  ### Populate the payload for the API call
  api_payload = {}
  if scalar_variables is not None \
    and len(scalar_variables) > 0 \
      :
    api_payload["scalarVariables"] = scalar_variables
  if grid_variables is not None \
    and len(grid_variables) > 0 \
      :
    api_payload["gridVariables"] = grid_variables

  ### Populate headers for the API call
  headers = {
      "Content-Type": "application/json"
  }

  ### Execute API call to trigger job
  response = requests.request("POST", api_endpoint_url, headers=headers, params=api_parameters, json=api_payload, auth=basic_auth_token)

  ### Return the parsed response, falling back to
  ### the raw text if the body is not valid JSON
  try:
    return response.json()
  except Exception:
    return response.text

$$
;

-- Test query
call "EXTERNAL_ACCESS"."PROCEDURES"."API__MATILLION_ETL__TRIGGER_JOB"(
    PROJECT_GROUP_NAME => 'Example Group'
  , PROJECT_NAME => 'Example Project'
  , JOB_NAME => 'Example Job'
)
;

The following elements are important here:

  • The external_access_integrations and secrets clauses in the procedure definition enable the stored procedure to leverage the external access integration and the accompanying secret that facilitate authentication to the Matillion ETL API.
  • The _snowflake.get_username_password("cred") call retrieves the secret's contents, which are then converted into an authentication token with the HTTPBasicAuth function imported from the requests package. That token is passed as the auth argument of the request.
  • The api_endpoint_url_unsafe string leverages the input variables to establish the full API endpoint that the request is sent to. This is then made URL-safe with the furl function. This is an important step, as names often include whitespace, which is not URL-safe.
  • The api_parameters dictionary establishes the query parameters for the API call, consisting of the environment name if it was provided as an input variable.
  • The api_payload dictionary establishes the payload that is sent to the API, consisting of any scalar or grid variable values that may have been provided as input variables.
  • The requests.request("POST", ...) call sends the request itself to the Matillion ETL API, combining all of the elements established so far.
  • The final try/except block returns the response from the request, with some minor error handling.
  • The test query at the end of the script can be used to test the process.

Testing the Stored Procedure

A successful test of the stored procedure should yield a response that includes a message called msg and an id, which would be the ID for the execution that has just been triggered:

Once triggered, you should be able to monitor the execution in the task history area of either the Matillion ETL environment or the Matillion Hub. You can also track the status by sending requests to the relevant API endpoint.

Example task history in Matillion ETL:

Example task history in Matillion Hub:

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to establish your own connections from Snowflake to the Matillion ETL API. This stored procedure can now be made available to other users using Snowflake’s Role-Based Access Control (RBAC) and could be useful in multiple scenarios, including:

  • Triggering a job as part of a scheduled task
  • Triggering a job as part of a wider set of stored procedures
  • Triggering a job when a stream detects new data from a Snowpipe

More About the Author

Chris Hastie

Data Lead