Snowflake External Access: Retrieve Data from an API

Data

Snowflake External Access: Retrieve Data from an API

//

Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure tunnels for Snowflake to communicate with the outside world, and can therefore communicate with external APIs to complete all manner of tasks. For example, external access could be used for any of the following, among many others:

This specific article focuses on how to retrieve the latest status summary from the Snowflake Status API as a table.

How Does External Access in Snowflake Work?

Before we dive into the specific example, let’s cover the architecture involved for external access in Snowflake. There are four main components:

  1. A network rule that controls where traffic is allowed to travel to.
  2. An external access integration that allows Snowflake objects to leverage the network rule.
  3. An optional security component to facilitate authentication. This usually leverages a Snowflake Secret object that can either securely store information such as a username and password, or make use of a related Security Integration object that facilitates OAuth authentication with an API endpoint.
  4. A Snowflake object such as a stored procedure or a user defined function (UDF) that leverages the external access integration.

This is demonstrated in the following diagram:

Snowflake - External Access - Message to API.png

How to Configure an Outbound Network Rule

In short, network rules are database objects that store information on a type of network traffic for Snowflake. This traffic can be one of two categories:

  • Ingress – Traffic from outside of Snowflake that is coming in to Snowflake
  • Egress – Traffic that is leaving Snowflake and travelling outside

This article specifically focuses on the egress category, and how we can leverage this to enable Snowflake to communicate with endpoints outside of Snowflake’s network. Note that in Snowflake’s terminology, the category (ingress/egress) is referred to as the “mode,” and this terminology will be used for the rest of this article.

Example Network Rule for External Access

Here is a simple example for creating an outbound/egress network rule:

create network rule "MY_OUTBOUND_NETWORK_RULE"
  type = HOST_PORT -- Always use this value for external access integrations
  mode = EGRESS -- Always use this value for external access integrations
  value_list = ('?????') -- Populate with host(s) of the target endpoint(s), such as matillion.customer.com, host.online.tableau.com, customer.webhook.office.com, eu1.api.matillion.com, etc
  comment = 'Allows traffic to be sent to the host(s) in the value_list'
;

This template can be used to configure the network rule for any external access.

It is important to note that Snowflake stores network rules as database objects, meaning they sit inside schemas in databases. This is different to network policies, which are account-level objects. This is easy to get confused, which is why it has been mentioned here.

Specific Network Rule Example

The following network rule could be leveraged to enable outbound/egress traffic with the Snowflake Status API:

create network rule "EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__SNOWFLAKE_STATUS_API"
  type = HOST_PORT
  mode = EGRESS
  value_list = ('status.snowflake.com') -- The host for the Snowflake Status API
  comment = 'Allows traffic to be sent to the Snowflake Status API'
;

How to Set Up an External Access Integration

In short, external access integrations are account-level objects that act as the bridge between egress network rules and any other Snowflake objects (stored procedures/UDFs).

Here is a simple example for creating an external access integration that leverages an outbound/egress network rule:

create external access integration "MY_EXTERNAL_ACCESS_INTEGRATION"
  allowed_network_rules = ("MY_DATABASE"."MY_SCHEMA"."MY_OUTBOUND_NETWORK_RULE")
  enabled = FALSE
  comment = 'External access integration to support traffic sent to the <endpoint(s) enabled by the network rule(s)>'
;

Specific External Access Integration Example

In many cases, authentication is needed to interact with an API. This is demonstrated in many of our other external access articles — for example, leveraging an OAuth integration when connecting to Matillion DPC, or leveraging standard username and password authentication when connecting to Matillion ETL.

In this specific example however, no authentication is needed. The Snowflake Status API is open. We can therefore create the external access integration directly, with nothing else to configure:

create external access integration "EAI__SNOWFLAKE_STATUS_API"
  allowed_network_rules = ("EXTERNAL_ACCESS"."NETWORK_RULES"."EGRESS__SNOWFLAKE_STATUS_API")
  enabled = TRUE
  comment = 'External access integration to support traffic sent to the Snowflake Status API'
;

How to Retrieve the Current Status Summary from the Snowflake Status API

Now that we have discussed and configured the underlying architecture behind our process, we can move on to our true goal, which is to create a User-Defined Table Function (UDTF) in Snowflake that can query the Snowflake Status API and return the current status summary.

A Quick Overview of the Snowflake Status API

Since we are only using this API for an example, we will not spend much time discussing the API itself. The most important information is that we will be querying this endpoint.

This endpoint returns the current status for each core component in Snowflake, and is similar to viewing the user-friendly equivalent page. If you like, open that endpoint now to see an example of the output. If Snowflake’s status is currently healthy, the output will be fairly dull. If Snowflake is encountering issues then the output gets more interesting.

Our purpose is to return the current status of every module within every cloud provider region for Snowflake, which is all contained in the “components” key.

Here is an example output, where we have only included the response for the region “AWS – US West (Oregon)”:

{
  "page": {
    "id": "94s7z8vpy1n8",
    "name": "Snowflake",
    "url": "https://status.snowflake.com",
    "time_zone": "America/Los_Angeles",
    "updated_at": "2024-06-11T22:26:37.674-07:00"
  },
  "components": [
    {
      "id": "l8zmwsrmltjd",
      "name": "AWS - US West (Oregon)",
      "status": "operational",
      "created_at": "2019-01-02T15:16:31.219-08:00",
      "updated_at": "2019-05-03T13:56:49.012-07:00",
      "position": 1,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": null,
      "page_id": "94s7z8vpy1n8",
      "group": true,
      "only_show_if_degraded": false,
      "components": [
        "pkrzt59vb1cm",
        "b96r20d6p5g7",
        "xjvzhg4llmqt",
        "6jy082tyx85d"
      ]
    },
    {
      "id": "pkrzt59vb1cm",
      "name": "Snowflake Data Warehouse (Database)",
      "status": "operational",
      "created_at": "2019-01-15T09:18:30.171-08:00",
      "updated_at": "2024-05-28T21:46:53.905-07:00",
      "position": 3,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    },
    {
      "id": "b96r20d6p5g7",
      "name": "Snowpipe (Data Ingestion)",
      "status": "operational",
      "created_at": "2019-01-30T00:40:49.592-08:00",
      "updated_at": "2024-02-20T14:16:33.244-08:00",
      "position": 4,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    },
    {
      "id": "xjvzhg4llmqt",
      "name": "Replication",
      "status": "operational",
      "created_at": "2020-04-28T08:52:25.259-07:00",
      "updated_at": "2024-02-20T14:16:33.256-08:00",
      "position": 5,
      "description": null,
      "showcase": false,
      "start_date": null,
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    },
    {
      "id": "6jy082tyx85d",
      "name": "Snowsight",
      "status": "operational",
      "created_at": "2021-08-06T09:12:25.424-07:00",
      "updated_at": "2024-04-30T07:31:05.031-07:00",
      "position": 6,
      "description": null,
      "showcase": false,
      "start_date": "2021-08-06",
      "group_id": "l8zmwsrmltjd",
      "page_id": "94s7z8vpy1n8",
      "group": false,
      "only_show_if_degraded": false
    }
  ]
}

A Quick Note on User-Defined Table Functions (UDTFs) in Snowflake

We are about to create a UDTF to return the current Snowflake status as a table. This article assumes familiarity with UDTFs, however the example itself is relatively simple anyway. If you wish for more information on UDTFs then the following articles are recommended:

Creating a UDTF to Return the Current Snowflake Status

Finally, we come to the main meat of the article. First, I will provide the full code for the UDTF. I will then explain some key areas.

This article does not explain Python stored procedures in general. If you wish to know more about these, the following articles are recommended: An Introduction to Python UDTFs in Snowflake, A Definitive Guide to Creating Python UDTFs Directly within the Snowflake User Interface. These are both part of our Snowflake with Python series.

The full UDTF definition is as follows:

-- Create UDTF
create or replace function "EXTERNAL_ACCESS"."FUNCTIONS"."API__SNOWFLAKE_STATUS__COMPONENTS"()
  copy grants
  returns table(
      "ID" string
    , "CLOUD_PROVIDER_REGION" string
    , "COMPONENT" string
    , "STATUS" string
    , "CREATED_AT" timestamp_ntz
    , "UPDATED_AT" timestamp_ntz
    , "DESCRIPTION" string
    , "POSITION_IN_GROUP" int
    , "DATA_QUERIED_AT" timestamp_ntz
  )
  language python
  runtime_version = 3.10
  handler = 'main'
  external_access_integrations = ("EAI__SNOWFLAKE_STATUS_API")
  packages = ('requests', 'pandas')
as
$$

## Imports
import requests
import pandas as pd

## Define main class
class main:
  
  ### Define process method that acts
  ### on each individual input row.
  ### This method must be provided in all UDTFs
  ### but is not needed for our example as we
  ### are doing nothing at a row level.
  def process(self) :
    pass

  ### Define end_partition method that acts
  ### on full partition after rows are processed
  def end_partition(self) :
    
    ### Retrieve the components from the endpoint
    snowflake_components_json = retrieve_snowflake_summary_as_json()

    ### Process JSON into a Pandas DataFrame
    df_snowflake_components = process_raw_components_data(snowflake_components_json)

    ### Output the result as tuples to match Snowflake UDTF requirements
    return list(df_snowflake_components.itertuples(index=False, name=None))
  
## Define function to send the request to the summary endpoint
def retrieve_snowflake_summary_as_json():

  ### Define the endpoint URL
  endpoint_url = r"https://status.snowflake.com/api/v2/summary.json"

  ### Populate headers for the API call
  headers = {
    "Content-Type": "application/json"
  }

  ### Execute API call
  response = requests.request("GET", endpoint_url, headers=headers)

  ### Only return something if the request is successful
  if response.status_code == 200 :
    return response.json()
  else :
    raise Exception(f"Error. Response code: {str(response.status_code)}")

## Define function to process JSON into a Pandas DataFrame
def process_raw_components_data(snowflake_components_json: dict):

  ### Load components into dataframe
  df_all = pd.json_normalize(snowflake_components_json["components"])
  df_all["data_queried_at"] = snowflake_components_json["page"]["updated_at"]

  ### Convert datetime fields.
  ### 1. Convert string to datetime
  ### 2. Remove timezone localization (output is now without timezone)
  df_all["created_at"] = \
    pd.to_datetime(df_all["created_at"], utc=True) \
      .dt.tz_localize(None)
      
  df_all["updated_at"] = \
    pd.to_datetime(df_all["updated_at"], utc=True) \
      .dt.tz_localize(None)
      
  df_all["data_queried_at"] = \
    pd.to_datetime(df_all["data_queried_at"], utc=True) \
      .dt.tz_localize(None)

  ### Build a formatted dataframe for all cloud provider region groups
  df_groups = df_all[df_all["group"] == True].copy()
  df_groups["component"] = "All"
  df_groups["cloud_provider_region"] = df_groups["name"]
  df_groups["position_in_group"] = 1

  ### Determine group-level output
  df_output_groups = df_groups[["id", "cloud_provider_region", "component", "status", "created_at", "updated_at", "description", "position_in_group", "data_queried_at"]]

  ### Build a formatted dataframe for all components
  df_components = df_all[df_all["group"] == False].copy()
  df_components["component"] = df_components["name"]
  df_components["position_in_group"] = df_components["position"]

  ### Attach the cloud provider region to the component
  df_components_merged = pd.merge(
      left = df_components
    , right = df_groups
    , left_on = ["group_id"]
    , right_on = ["id"]
    , suffixes = ["", "_from_group"]
  )

  ### Determine component-level output
  df_output_components = df_components_merged[["id", "cloud_provider_region", "component", "status", "created_at", "updated_at", "description", "position_in_group", "data_queried_at"]]

  ### Build final output dataframe, combining both outputs
  df_output = pd.concat([df_output_groups, df_output_components], axis=0)
  df_output.sort_values(["cloud_provider_region", "position_in_group"], inplace=True)

  return df_output

$$
;

-- Test query
select *
from table("EXTERNAL_ACCESS"."FUNCTIONS"."API__SNOWFLAKE_STATUS__COMPONENTS"())
;

The following lines are important here:

  • Line 18 enables the UDTF to leverage the external access integration.
  • Line 19 ensures that the relevant Python packages are available, which are then imported on lines 24-25 and leveraged throughout the function. Fortunately, these are all simple to ingest as they are already included in Anaconda’s Snowflake channel, which Snowflake leverages when executing Python.
  • Lines 51-69 define a function that sends a request to the Snowflake Status API and returns the response JSON. This is then leveraged on line 43.
  • Lines 71-123 convert the response JSON from the Snowflake Status API into a Pandas DataFrame, performing some light data engineering along the way. This is leveraged on line 46.
  • Line 49 returns the final result as a list of tuples, as per Snowflake Python UDTF requirements. The output aligns with the columns define on lines 5-13.
  • Lines 129-131 can be used to test the process.

This UDTF provides the main structure to achieve our goals, but only uses some basic error handling. Depending on your purposes, it may be worth expanding this when deploying it to your own environment.

Testing the UDTF

A successful test of the UDTF should yield a table detailing every Snowflake. Also, quick bonus in this screenshot that we can see Snowsight’s new Dark Mode!

image.png

Wrap Up

Hopefully the information in this article provides you with all of the tools you need to build your own UDTFs that retrieve data from external APIs. Naturally, we’ve deliberately chosen a simpler example for this article and have therefore not tackled challenges such as paging, schema changes or authentication, however those are all challenges better faced by a standard Python/APIs article rather than an article about enabling external access from Snowflake.

More About the Author

Chris Hastie

Data Lead
Snowflake External Access: Retrieve Data from an API Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure ...
Snowflake Security and Cost Management: Health Checks As many are already aware, there has recently been a lot of noise about security on the Snowflake platform. On June 2nd, 2024, ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!