This series shows you the various ways you can use Python within Snowflake.
In another post earlier this year, we discussed how to store the results of a Snowflake SHOW/LIST/DESCRIBE metadata command in a table.
Refresher
To refresh our memory, in Snowflake, there are several commands that can be used to retrieve metadata from the Snowflake environment without using a virtual warehouse. The following commands are all examples that fit this approach:
- SHOW – List all objects of a certain type within the account or within a given database/schema
- DESCRIBE – Retrieve metadata information about a specific object
- LIST – List files within a Snowflake stage
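To keep things concrete, a quick way to check whether a given command falls into this category is a small helper like the one below. The function name `is_metadata_command` and the accepted keyword list are my own illustration, not Snowflake functionality:

```python
## Hypothetical helper: check whether a command string starts with one of
## the warehouse-free metadata keywords discussed above.
## The keyword list below is illustrative, not an official Snowflake list.
METADATA_KEYWORDS = ("SHOW", "DESCRIBE", "DESC", "LIST", "LS")

def is_metadata_command(command: str) -> bool:
  ## Strip whitespace and compare the first word, case-insensitively
  stripped = command.strip()
  if not stripped:
    return False
  first_word = stripped.split(maxsplit=1)[0].upper()
  return first_word in METADATA_KEYWORDS
```

For example, `is_metadata_command("SHOW USERS")` returns `True`, while a regular `SELECT` query returns `False`.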
Since these commands do not fit the general format of SELECT ... FROM ..., it is harder to store their results anywhere. You cannot simply wrap them in INSERT INTO ... SELECT ... FROM ...
Fortunately, there is a way around this. To cut to the chase, this is our recommendation if we wish to leverage the SHOW USERS
command and output the results to a table:
```sql
SHOW USERS;

SET query_id = (
  SELECT QUERY_ID
  FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
  WHERE QUERY_TEXT = 'SHOW USERS;'
  ORDER BY START_TIME DESC
  LIMIT 1
);

CREATE TABLE MY_TABLE
AS
SELECT *
FROM TABLE(RESULT_SCAN($query_id));
```
Why a Stored Procedure?
I thought it would be useful to take this functionality and place it inside a stored procedure. Admittedly, this makes the overall process more complex, as you must now understand stored procedures if you wish to modify the approach; however, it makes the process much easier for the end user. You can share the above block of code with them, but some users may struggle with it, as it leverages variables and requires the metadata command to be changed in multiple places (both where it is executed and inside the QUERY_TEXT filter).
From an end user perspective, it is far simpler to use the following code:
```sql
CALL SP_METADATA_COMMAND_TO_TABLE(
    'SHOW DATABASES' -- METADATA_COMMAND varchar
  , 'MY_TEST_TABLE'  -- DESTINATION_TABLE varchar
);

SELECT * FROM MY_TEST_TABLE;
```
As for why we are using Python, I have a few reasons:
- A SQL stored procedure would have a harder time defining and executing functions within itself, especially when paired with looping. SQL is fantastic, but I would not argue for it to be a first-class programming language like JavaScript or Python
- JavaScript could be a suitable alternative here and I have actually built this already using JavaScript. However, my understanding of Snowflake’s architecture is that JavaScript is executed through the service layer and only uses warehouses for executing SQL, whilst Python is executed using Snowpark and thus leverages warehouses for the entire script. I could be wrong with this understanding though, Snowpark is new functionality and I’m still trying to understand how it works more deeply.
- I don’t know Java or Scala, so they’re out of the competition
Important Note
This post does not cover the main principles of building Python stored procedures in Snowflake, as that is covered in my Definitive Guide to Python Stored Procedures in the Snowflake UI.
Creating the Python Stored Procedure
As you can see, the above block of code is a lot simpler for an end user to leverage. As we hinted at above, though, the underlying code to configure the stored procedure is far more complex.
To convert the process into a stored procedure, we have to be prepared for the following process:
- Accept the metadata command and destination table as variables
- Ensure that both required variables have been populated
- Execute the given metadata command as a dataframe object
- Write the results of our dataframe object into a destination table
Accept the Metadata Command and Destination Table as Variables
This part is very simple. When we define any stored procedure, we can also define input variables that can be referred to within the procedure. I recommend using UPPER_SNAKE_CASE naming conventions for your variables as Snowflake seems to struggle with variables in other cases.
```sql
CREATE OR REPLACE PROCEDURE SP_METADATA_COMMAND_TO_TABLE(
    METADATA_COMMAND varchar
  , DESTINATION_TABLE varchar
)
  returns string not null
  language python
  runtime_version = '3.8'
  packages = ('snowflake-snowpark-python')
  handler = 'main'
  execute as caller -- Must execute as caller to support the SHOW command
as
$$
## Define our Python code here
$$
;
```
Ensure That Both Required Variables Have Been Populated
This is the most interesting part of the stored procedure, in my opinion. In short, we follow these steps:
- Create a function that accepts a list of variables and tests whether the value of each variable is populated. If any are not populated, return an error message stating the name(s) of the variable(s) and explaining that they must be populated
- Define the list of variables which must be populated as a list of dictionaries, where each member stores both the variable name and the variable value
- Apply our validation function to our list of variables
- If any of the variables in the list fail the validation function’s tests, halt the stored procedure and return the error message(s)
The code below is more complex than strictly necessary when we are only validating the presence of two variables; however, I have included this more complex approach because it works for any number of variables, and should therefore be more beneficial if you are attempting to modify this blog's code to fit another purpose.
Here is my full Python code to achieve these steps.
```python
#########################################################################
## Error if any variables are not provided

def validate_variables_list(REQUIRED_VARIABLES_LIST: list):
  FUNCTION_RESULT = ''
  FUNCTION_RESULT_FLAG = False

  ## Use list comprehension to filter the list of required variables
  ## down to those with an empty or None value
  INVALID_REQUIRED_VARIABLES_LIST = [
      x["VARIABLE_NAME"]
      for x in REQUIRED_VARIABLES_LIST
      if x["VARIABLE_VALUE"] is None or len(x["VARIABLE_VALUE"]) == 0
    ]

  ## If any failed members are found, combine the error messages
  if len(INVALID_REQUIRED_VARIABLES_LIST) > 0:
    REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES = [
        f"Failed: {VARIABLE_NAME} parameter must be populated"
        for VARIABLE_NAME in INVALID_REQUIRED_VARIABLES_LIST
      ]
    REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES_COMBINED = ',\n'.join(REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES)
    FUNCTION_RESULT = REQUIRED_VARIABLES_VALIDATION_ERROR_MESSAGES_COMBINED
    FUNCTION_RESULT_FLAG = True

  return [FUNCTION_RESULT, FUNCTION_RESULT_FLAG]

def main(snowpark_session, METADATA_COMMAND: str, DESTINATION_TABLE: str):

  ## Define list of required variables
  REQUIRED_VARIABLES_LIST = [
      {
        "VARIABLE_NAME": 'METADATA_COMMAND'
      , "VARIABLE_VALUE": METADATA_COMMAND
      }
    , {
        "VARIABLE_NAME": 'DESTINATION_TABLE'
      , "VARIABLE_VALUE": DESTINATION_TABLE
      }
    ]

  try:
    ## Execute function to test required variables
    [REQUIRED_VARIABLES_RESULT, REQUIRED_VARIABLES_RESULT_FLAG] = validate_variables_list(REQUIRED_VARIABLES_LIST)

    ## Halt procedure early if the variables test fails
    if REQUIRED_VARIABLES_RESULT_FLAG == True:
      return REQUIRED_VARIABLES_RESULT

  except Exception as err:
    return err
```
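Because the validation function is plain Python with no Snowflake dependency, you can sanity-check it locally before deploying the procedure. Here is a condensed copy of the function alongside a quick demonstration of a failing case; the variable names in the sample input are purely illustrative:

```python
## Condensed copy of the validation function, runnable outside Snowflake
def validate_variables_list(REQUIRED_VARIABLES_LIST: list):
  FUNCTION_RESULT = ''
  FUNCTION_RESULT_FLAG = False
  ## Filter down to variables with an empty or None value
  INVALID_REQUIRED_VARIABLES_LIST = [
      x["VARIABLE_NAME"]
      for x in REQUIRED_VARIABLES_LIST
      if x["VARIABLE_VALUE"] is None or len(x["VARIABLE_VALUE"]) == 0
    ]
  if len(INVALID_REQUIRED_VARIABLES_LIST) > 0:
    messages = [
        f"Failed: {VARIABLE_NAME} parameter must be populated"
        for VARIABLE_NAME in INVALID_REQUIRED_VARIABLES_LIST
      ]
    FUNCTION_RESULT = ',\n'.join(messages)
    FUNCTION_RESULT_FLAG = True
  return [FUNCTION_RESULT, FUNCTION_RESULT_FLAG]

## A populated value passes; an empty value fails with a named error
result, flag = validate_variables_list([
    {"VARIABLE_NAME": 'METADATA_COMMAND', "VARIABLE_VALUE": 'SHOW DATABASES'}
  , {"VARIABLE_NAME": 'DESTINATION_TABLE', "VARIABLE_VALUE": ''}
  ])
print(flag)    # True
print(result)  # Failed: DESTINATION_TABLE parameter must be populated
```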
Execute the Given Metadata Command as a Dataframe Object
This part is much simpler than the JavaScript alternative. It only takes a single line of code to store a query result as a dataframe.
```python
## Read the command into a Snowflake dataframe
METADATA_RESULT_DF = snowpark_session.sql(METADATA_COMMAND)
```
Write the Results of Our Dataframe Object into a Destination Table
Finally, we store the results of our dataframe as a table in Snowflake. Again, this only takes a single line of code, which is much easier than the JavaScript alternative. This version overwrites any existing table of the same name.
```python
## Write the results of the dataframe into a target table
METADATA_RESULT_DF.write.mode("overwrite").save_as_table(DESTINATION_TABLE)
```
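To see the whole read-then-write flow without a Snowflake connection, the two Snowpark calls can be exercised against stubbed objects. Everything below (the stub classes, the `run_metadata_command` function name and its return message) is my own stand-in for illustration, not the real Snowpark API:

```python
## Minimal stand-ins mimicking the two Snowpark calls used above.
## These stubs are for illustration only; they are not the real Snowpark API.
class StubWriter:
  def __init__(self, store, rows):
    self.store, self.rows = store, rows
  def mode(self, save_mode):
    ## Record the mode and return self, so calls can be chained
    self.save_mode = save_mode
    return self
  def save_as_table(self, table_name):
    ## Store the rows under the table name, as save_as_table would
    self.store[table_name] = self.rows

class StubDataFrame:
  def __init__(self, store, rows):
    self.write = StubWriter(store, rows)

class StubSession:
  def __init__(self):
    self.tables = {}
  def sql(self, command):
    ## Pretend every command returns one row echoing the command text
    return StubDataFrame(self.tables, [{"command": command}])

## Hypothetical wrapper showing the same two-line flow as the procedure body
def run_metadata_command(snowpark_session, METADATA_COMMAND, DESTINATION_TABLE):
  METADATA_RESULT_DF = snowpark_session.sql(METADATA_COMMAND)
  METADATA_RESULT_DF.write.mode("overwrite").save_as_table(DESTINATION_TABLE)
  return f"Results stored in table {DESTINATION_TABLE}"

session = StubSession()
print(run_metadata_command(session, "SHOW DATABASES", "MY_TEST_TABLE"))
print(session.tables["MY_TEST_TABLE"])  # [{'command': 'SHOW DATABASES'}]
```

Swapping the stub for a real Snowpark session leaves the two-line flow unchanged, which is exactly why the Python version is so much shorter than its JavaScript counterpart.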
GitHub Repository
I have collated all of these individual blocks into a full script and stored it in the InterWorks Public GitHub Repository for Snowflake Python functionality. You can find it here.
I know many people will prefer not to use a stored procedure and keep with the original approach from the previous blog, and that’s okay. I hope some of you find this approach useful too. Please reach out if you find any interesting uses for this.