This series shows you the various ways you can use Python within Snowflake.
In another post earlier this year, we discussed how to store the results of a Snowflake SHOW/LIST/DESCRIBE metadata command in a table.
Refresher
To refresh our memory, in Snowflake, there are several commands that can be used to retrieve metadata from the Snowflake environment without using a virtual warehouse. The following commands are all examples that fit this approach:
- SHOW – List all objects of a certain type within the account or within a given database/schema
- DESCRIBE – Retrieve metadata information about a specific object
- LIST – List files within a Snowflake stage
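To keep things concrete, a quick way to check whether a given command falls into this category is a small helper like the one below. The function name `is_metadata_command` and the accepted keyword list are my own illustration, not Snowflake functionality:

```python
## Hypothetical helper: check whether a command string starts with one of
## the warehouse-free metadata keywords discussed above.
## The keyword list below is illustrative, not an official Snowflake list.
METADATA_KEYWORDS = ("SHOW", "DESCRIBE", "DESC", "LIST", "LS")

def is_metadata_command(command: str) -> bool:
  ## Strip whitespace and compare the first word, case-insensitively
  stripped = command.strip()
  if not stripped:
    return False
  first_word = stripped.split(maxsplit=1)[0].upper()
  return first_word in METADATA_KEYWORDS
```

For example, `is_metadata_command("SHOW USERS")` returns `True`, while a regular `SELECT` query returns `False`.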
Since these commands do not fit the general format of SELECT ... FROM ..., it is harder to store their results anywhere. You cannot simply wrap them in INSERT INTO ... SELECT ... FROM ...
Fortunately, there is a way around this. To cut to the chase, this is our recommendation if we wish to leverage the SHOW USERS
command and output the results to a table:
```sql
SHOW USERS;

SET query_id = (
  SELECT QUERY_ID
  FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
  WHERE QUERY_TEXT = 'SHOW USERS;'
  ORDER BY START_TIME DESC
  LIMIT 1
);

CREATE TABLE MY_TABLE
AS
SELECT *
FROM TABLE(RESULT_SCAN($query_id));
```
Why a Stored Procedure?
I thought it would be useful to take this functionality and place it inside a stored procedure. Admittedly, this makes the overall process more complex, as you must now understand stored procedures if you wish to modify the approach; however, it makes the process much easier for the end user. You can share the above block of code with them, but some users may struggle with it, as it leverages variables and requires the metadata command to be changed in multiple places (both where it is executed and inside the QUERY_TEXT filter).
From an end user perspective, it is far simpler to use the following code:
```sql
CALL SP_METADATA_COMMAND_TO_TABLE(
    'SHOW DATABASES' -- METADATA_COMMAND varchar
  , 'MY_TEST_TABLE'  -- DESTINATION_TABLE varchar
);

SELECT * FROM MY_TEST_TABLE;
```
As for why we are using Python, I have a few reasons:
- A SQL stored procedure would have a harder time defining and executing functions within itself, especially when paired with looping. SQL is fantastic, but I would not argue for it to be a first-class programming language like JavaScript or Python
- JavaScript could be a suitable alternative here and I have actually built this already using JavaScript. However, my understanding of Snowflake’s architecture is that JavaScript is executed through the service layer and only uses warehouses for executing SQL, whilst Python is executed using Snowpark and thus leverages warehouses for the entire script. I could be wrong with this understanding though, Snowpark is new functionality and I’m still trying to understand how it works more deeply.
- I don’t know Java or Scala, so they’re out of the competition
Important Note
This post does not cover the main principles of building Python stored procedures in Snowflake, as that is covered in my Definitive Guide to Python Stored Procedures in the Snowflake UI.
Creating the Python Stored Procedure
As you can see, the above block of code is a lot simpler for an end user to leverage. As we hinted at above, though, the underlying code to configure the stored procedure is far more complex.
To convert the process into a stored procedure, we have to be prepared for the following process:
- Accept the metadata command and destination table as variables
- Ensure that both required variables have been populated
- Execute the given metadata command as a dataframe object
- Write the results of our dataframe object into a destination table
Accept the Metadata Command and Destination Table as Variables
This part is very simple. When we define any stored procedure, we can also define input variables that can be referred to within the procedure. I recommend using UPPER_SNAKE_CASE naming conventions for your variables as Snowflake seems to struggle with variables in other cases.
```sql
CREATE OR REPLACE PROCEDURE SP_METADATA_COMMAND_TO_TABLE(
    METADATA_COMMAND varchar
  , DESTINATION_TABLE varchar
)
  returns string not null
  language python
  runtime_version = '3.8'
  packages = ('snowflake-snowpark-python')
  handler = 'main'
  execute as caller -- Must execute as caller to support the SHOW command
as
$$
## Define our Python code here
$$
;
```
Ensure That Both Required Variables Have Been Populated
This is the most interesting part of the stored procedure, in my opinion. In short, we follow these steps:
- Create a function that accepts a list of variables and tests whether the value of each variable is populated. If any are not populated, return an error message stating the name(s) of the variable(s) and explaining that they must be populated
- Define the list of variables which must be populated as a list of dictionaries, where each member stores both the variable name and the variable value
- Apply our validation function to our list of variables
- If any of the variables in the list fail the validation function’s tests, halt the stored procedure and return the error message(s)
The code below is more complex than strictly necessary when we are only validating the presence of two variables; however, I have included this more complex approach because it works for any number of variables, and should therefore be more beneficial if you are attempting to modify this blog's code to fit another purpose.
Here is my full Python code to achieve these steps.
```python
#########################################################################
## Error if any variables are not provided

def validate_variables_list(REQUIRED_VARIABLES_LIST: list):
  FUNCTION_RESULT = ''
  FUNCTION_RESULT_FLAG = False

  ## Use list comprehension to filter the list of required variables
  ## down to those with an empty or None value
  INVALID_REQUIRED_VARIABLES_LIST = [
      x["VARIABLE_NAME"]
      for x in REQUIRED_VARIABLES_LIST
      if x["VARIABLE_VALUE"] is None or len(x["VARIABLE_VALUE"]) == 0
    ]

  ## If any failed members are found, combine the error messages
  if len(INVALID_REQUIRED_VARIABLES_LIST) > 0:
    REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES = [
        f"Failed: {VARIABLE_NAME} parameter must be populated"
        for VARIABLE_NAME in INVALID_REQUIRED_VARIABLES_LIST
      ]
    REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES_COMBINED = ',\n'.join(REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES)
    FUNCTION_RESULT = REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES_COMBINED
    FUNCTION_RESULT_FLAG = True

  return [FUNCTION_RESULT, FUNCTION_RESULT_FLAG]

def main(snowpark_session, METADATA_COMMAND: str, DESTINATION_TABLE: str):

  ## Define list of required variables
  REQUIRED_VARIABLES_LIST = [
      {
        "VARIABLE_NAME": 'METADATA_COMMAND'
      , "VARIABLE_VALUE": METADATA_COMMAND
      }
    , {
        "VARIABLE_NAME": 'DESTINATION_TABLE'
      , "VARIABLE_VALUE": DESTINATION_TABLE
      }
    ]

  try:
    ## Execute function to test required variables
    [REQUIRED_VARIABLES_RESULT, REQUIRED_VARIABLES_RESULT_FLAG] = validate_variables_list(REQUIRED_VARIABLES_LIST)

    ## Halt procedure early if the variables test fails
    if REQUIRED_VARIABLES_RESULT_FLAG == True:
      return REQUIRED_VARIABLES_RESULT

  except Exception as err:
    return err
```
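Because the validation function is plain Python with no Snowflake dependency, you can sanity-check it locally before deploying the procedure. Here is a condensed copy of the function alongside a quick demonstration of a failing case; the variable names in the sample input are purely illustrative:

```python
## Condensed copy of the validation function, runnable outside Snowflake
def validate_variables_list(REQUIRED_VARIABLES_LIST: list):
  FUNCTION_RESULT = ''
  FUNCTION_RESULT_FLAG = False
  ## Filter down to variables with an empty or None value
  INVALID_REQUIRED_VARIABLES_LIST = [
      x["VARIABLE_NAME"]
      for x in REQUIRED_VARIABLES_LIST
      if x["VARIABLE_VALUE"] is None or len(x["VARIABLE_VALUE"]) == 0
    ]
  if len(INVALID_REQUIRED_VARIABLES_LIST) > 0:
    messages = [
        f"Failed: {VARIABLE_NAME} parameter must be populated"
        for VARIABLE_NAME in INVALID_REQUIRED_VARIABLES_LIST
      ]
    FUNCTION_RESULT = ',\n'.join(messages)
    FUNCTION_RESULT_FLAG = True
  return [FUNCTION_RESULT, FUNCTION_RESULT_FLAG]

## A populated value passes; an empty value fails with a named error
result, flag = validate_variables_list([
    {"VARIABLE_NAME": 'METADATA_COMMAND', "VARIABLE_VALUE": 'SHOW DATABASES'}
  , {"VARIABLE_NAME": 'DESTINATION_TABLE', "VARIABLE_VALUE": ''}
  ])
print(flag)    # True
print(result)  # Failed: DESTINATION_TABLE parameter must be populated
```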
Execute the Given Metadata Command as a Dataframe Object
This part is much simpler than the JavaScript alternative. It only takes a single line of code to store a query result as a dataframe.
```python
## Read the command into a Snowflake dataframe
METADATA_RESULT_DF = snowpark_session.sql(METADATA_COMMAND)
```
Write the Results of Our Dataframe Object into a Destination Table
Finally, we store the results of our dataframe as a table in Snowflake. Again, this only takes a single line of code, which is much easier than the JavaScript alternative. This version overwrites any existing table of the same name.
```python
## Write the results of the dataframe into a target table
METADATA_RESULT_DF.write.mode("overwrite").save_as_table(DESTINATION_TABLE)
```
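To see the whole read-then-write flow without a Snowflake connection, the two Snowpark calls can be exercised against stubbed objects. Everything below (the stub classes, the `run_metadata_command` function name and its return message) is my own stand-in for illustration, not the real Snowpark API:

```python
## Minimal stand-ins mimicking the two Snowpark calls used above.
## These stubs are for illustration only; they are not the real Snowpark API.
class StubWriter:
  def __init__(self, store, rows):
    self.store, self.rows = store, rows
  def mode(self, save_mode):
    ## Record the mode and return self, so calls can be chained
    self.save_mode = save_mode
    return self
  def save_as_table(self, table_name):
    ## Store the rows under the table name, as save_as_table would
    self.store[table_name] = self.rows

class StubDataFrame:
  def __init__(self, store, rows):
    self.write = StubWriter(store, rows)

class StubSession:
  def __init__(self):
    self.tables = {}
  def sql(self, command):
    ## Pretend every command returns one row echoing the command text
    return StubDataFrame(self.tables, [{"command": command}])

## Hypothetical wrapper showing the same two-line flow as the procedure body
def run_metadata_command(snowpark_session, METADATA_COMMAND, DESTINATION_TABLE):
  METADATA_RESULT_DF = snowpark_session.sql(METADATA_COMMAND)
  METADATA_RESULT_DF.write.mode("overwrite").save_as_table(DESTINATION_TABLE)
  return f"Results stored in table {DESTINATION_TABLE}"

session = StubSession()
print(run_metadata_command(session, "SHOW DATABASES", "MY_TEST_TABLE"))
print(session.tables["MY_TEST_TABLE"])  # [{'command': 'SHOW DATABASES'}]
```

Swapping the stub for a real Snowpark session leaves the two-line flow unchanged, which is exactly why the Python version is so much shorter than its JavaScript counterpart.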
GitHub Repository
I have collated all of these individual blocks into a full script and stored it in the InterWorks Public GitHub Repository for Snowflake Python functionality. You can find it here.
I know many people will prefer not to use a stored procedure and keep with the original approach from the previous blog, and that’s okay. I hope some of you find this approach useful too. Please reach out if you find any interesting uses for this.