A Definitive Guide to Snowflake Sessions with Snowpark for Python


This series shows you the various ways you can use Python within Snowflake.

Snowpark for Python is the name of Snowflake’s new Python integration. At the Snowflake Summit in June 2022, Snowpark for Python was officially released into Public Preview, meaning anybody can get started using Python in their own Snowflake environment.

I would understand if you wanted to skip this post and cut directly to Snowflake’s monolithic documentation page. However, I would strongly recommend reading through this post first, as it will give you a strong introduction to setting up a Snowpark environment to connect to Snowflake and provide you with a few shortcuts in the form of code snippets.

This post is intended to walk you through how to set up your local machine to connect to Snowflake using the new Snowpark for Python package, and pairs well with our bespoke Snowpark package on our GitHub, which contains much of the code I will be using in this post. This bespoke package is what qualifies this blog post to be the inaugural post in our new Monthly Monday Deep Dive series, as the package can be used to streamline your own development and is entirely open-source so you can download and modify it as you please!

If you are not interested in connecting to Snowflake and instead wish to simply dive into creating Python UDFs and Stored Procedures from within the Snowflake User Interface, then I would recommend these posts instead:

Before we dive into the technical aspects of this blog post, we will first discuss what the new functionality is and why this post is useful.

What Is the Snowpark for Python Package?

The Snowpark for Python package has been built to enable connectivity to Snowflake in a more advanced manner than the old Snowflake Connector for Python. The old connector allowed you to execute SQL commands in Snowflake and achieve a few other things, but it did not really stretch beyond this. In addition to building SQL statements and executing them on your Snowflake environment, similar to the old connector, the new Snowpark for Python package allows us to achieve far more, including the following:

  • Directly read data from tables, views and SELECT statements as new Snowflake dataframes that can be converted into Pandas dataframes.
  • Directly convert Pandas dataframes into Snowflake dataframes and/or write the data into Snowflake.
  • Create and modify User Defined Functions (UDFs) that leverage Python, using Python scripts and packages that are not stored within Snowflake already.
  • Create and modify Stored Procedures that leverage Python, using Python scripts and packages that are not stored within Snowflake already.
  • Call UDFs and Stored Procedures directly as part of a wider Python script.

Why Is This Post Useful?

The purpose of this post is to achieve the following key objectives:

  1. Explain installing and configuring the required components:
    1. Explain the requirements that must be met to leverage the Snowpark for Python package.
    2. Discuss the Anaconda distribution channel and its benefits over a standard Python installation when testing functionality locally.
    3. Walk through installing the required packages to a standard Python virtual environment on a local machine using pip.
    4. Walk through installing the required packages to a Python Anaconda virtual environment on a local machine using Conda.
    5. Optional aside to walk through configuring Visual Studio Code to leverage a Python Anaconda virtual environment when executing code.
  2. Explain creating and using a Snowflake Snowpark session:
    1. Discuss the core parameters required to create a Snowflake Snowpark session and the different methods of authentication.
    2. Demonstrate creating a Snowflake Snowpark session and using it to execute some simple queries locally.
  3. Introduce InterWorks’ bespoke Snowpark package on our GitHub, which contains a series of Python functions to simplify creating Snowflake Snowpark sessions leveraging any of the following configurations:
    • A locally-stored JSON file containing the required connection parameters.
    • Environment variables containing the required connection parameters.
    • A Streamlit secrets file containing the required connection parameters.

This post is not intended to demonstrate the creation of UDFs and Stored Procedures using Snowpark for Python. However, it does serve as a prerequisite. I will be releasing further posts to tackle these challenges in a similar format to my previous “definitive guides” to achieve the same within the Snowflake UI.

And so ends our preamble as we dive into the meat of the post!

Installing and Configuring Snowpark for Python

Before we can create Snowpark sessions that interact with Snowflake, we must install and configure the package in our environment.

Requirements to Leverage the Snowpark for Python Package

At its core, the Snowpark for Python package is a Python package called snowflake-snowpark-python that is available for install through the standard approach for Python packages:

pip install snowflake-snowpark-python

To leverage this package, you must have Python 3.8 as a minimum.

(At time of writing, Snowflake uses Python 3.8 whenever it executes UDFs and stored procedures. To match this and avoid potential errors, I would recommend using Python 3.8 when using Snowpark for Python locally.)

Whilst this is sufficient for getting started, best practice would always be to have a dedicated Python/Conda virtual environment for your Snowpark scripts rather than mixing different projects and packages in your default Python environment.

Anaconda’s Snowflake Snowpark for Python Channel

As you may already be aware, Snowflake have partnered with Anaconda when developing this new Snowpark for Python functionality. A key benefit to this partnership is Anaconda’s Snowflake Snowpark for Python channel which fully details the third-party packages that are openly available for use in Snowflake Python scripts, including their versions.

To make full use of this benefit, an installation of Anaconda or Miniconda is required on your machine. Anaconda and Miniconda are both Python distributions that can be used to configure Python environments on your local machine. Anaconda is the full distribution that is capable of far more, whilst Miniconda is a minimal version of Anaconda with a much smaller set of functionality. Most importantly, both of these distributions include Conda, the command line tool that allows us to create and modify Python virtual environments, with additional functionality such as the capability to leverage specific package channels.

The above paragraph says a lot, but you are likely still wondering what the key advantage is to leveraging Anaconda instead of standard Python when setting up a Snowpark for Python environment. The answer is that Conda can create/modify your virtual environment using Anaconda’s Snowflake Snowpark for Python channel, ensuring every package you install on your local machine matches the packages natively available to Snowflake. This means we can test our code far more reliably, as locally-executed tests use the same package versions that Snowflake will use, and thus should execute the same locally as they will within Snowflake UDFs/Stored Procedures.

In short, I would strongly recommend setting up any Snowpark environments using Anaconda as this makes it far easier to keep your local environment aligned with Snowflake.

(It is important to note that Anaconda’s Snowflake Snowpark for Python channel may not include all of the packages you require for your script, especially if you have developed your own packages internally. In this case, it is strongly recommended that you leverage the same package versions locally as you deploy to Snowflake when leveraging such packages. This will be discussed further in my follow-up blog posts regarding creating Snowflake UDFs/Stored Procedures using Snowpark for Python.)

Installing the Required Packages for Standard Python Using Pip

Skip this section if you intend to use Anaconda to configure your environment, which I would recommend.

As stated before, the Snowpark for Python package is a Python package called snowflake-snowpark-python that is available for install through the standard approach for Python packages. If desired, here is the specific PyPI project.

First, install Python 3.8.10, which is the latest full release for Python 3.8.

Before we can install the package, it is recommended to set up a virtual environment for our project. We can do so with the following commands in a shell/bash/cmd/PowerShell terminal:
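# Create the virtual environment (the name snowpark_venv is just an example)
python -m venv snowpark_venv

# Activate the environment in Windows PowerShell
.\snowpark_venv\Scripts\Activate.ps1

# Or activate the environment in bash on macOS/Linux
source snowpark_venv/bin/activate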

If using another terminal, I would recommend checking the specific script in the Python venv documentation.

Now that you have created and activated a virtual environment, you can install the Snowpark for Python package:
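pip install snowflake-snowpark-python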

Installing the Required Packages Using Anaconda

First, install Anaconda or Miniconda so that you can execute Conda commands from an Anaconda terminal on your local machine. Executing conda from your Anaconda terminal should display Conda’s usage and help information, confirming that the installation was successful.

Once this is established, a new Python Anaconda virtual environment can be created with the following command:

conda create --name env_name --channel https://repo.anaconda.com/pkgs/channel_name python=3.8

Specifically, we can use the following command to create a Snowpark-suitable virtual environment called snowpark:
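conda create --name snowpark --channel https://repo.anaconda.com/pkgs/snowflake python=3.8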

Once you have created your environment, you can install additional packages by adapting the following example code:
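For example, the following would install the Snowpark package itself into the snowpark environment from the same channel; any other package available on the channel, such as pandas, can be installed in the same way:

conda install --name snowpark --channel https://repo.anaconda.com/pkgs/snowflake snowflake-snowpark-python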

At this stage, I would like to note that most projects will require further packages such as Pandas. Whilst you can go down the route of installing each package individually, you may find it easier to list all of your packages in a requirements.txt file and install them with the following command:
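conda install --name snowpark --channel https://repo.anaconda.com/pkgs/snowflake --file requirements.txt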

Configuring Visual Studio Code to Leverage a Python Anaconda Virtual Environment

This section discusses how to configure VSCode so that you can execute Python commands line-by-line using a Python Anaconda virtual environment. If you are not using VSCode, this section will not be relevant to you, and I would advise finding out how to achieve a similar result in your IDE of choice.

Configuring Conda to work with Python through VSCode requires you to set up a bespoke terminal profile within your VSCode settings. You can do so by following these steps within VSCode:

  1. Open the commands shortcut (CTRL+SHIFT+P) and select Terminal: Select Default Profile
  2. Select the small settings cog to the right of one of the existing PowerShell profiles
  3. In the naming window, enter the name “Windows PowerShell Conda” or any other name if you prefer
  4. Use the same commands shortcut (CTRL+SHIFT+P) and select Preferences: Open Settings (JSON)
  5. Steps 1-3 will have added a new key to your settings.json file called terminal.integrated.profiles.windows, which lists a set of different terminal profiles. The final entry on this list should be your new profile, which we called “Windows PowerShell Conda”. Modify this profile to match the settings described below (an example profile is provided after this list):

    The desired path to `conda-hook.ps1` depends on where you installed Conda. Usually this will be one of the following two options (these are the typical default locations; verify them against your own installation):

    • When Conda was installed to the C drive for all users: C:\ProgramData\Anaconda3\shell\condabin\conda-hook.ps1
    • When Conda was installed for a single user: %USERPROFILE%\Anaconda3\shell\condabin\conda-hook.ps1, where %USERPROFILE% should be expanded out to the full path
  6. Finally, open the commands shortcut (CTRL+SHIFT+P) and select Terminal: Select Default Profile, then select the new profile as the default. For our example, we would select “Windows PowerShell Conda”
  7. If we now review our settings.json file again, we should see the new terminal profile along with the key “terminal.integrated.defaultProfile.windows” that denotes the default terminal profile.
  8. If we open a new Python file, we should also see our list of Anaconda Python environments when selecting an interpreter.
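To illustrate step 5, the profile entry might look like the following sketch, assuming an all-users Anaconda installation (adjust the conda-hook.ps1 path to match your own installation):

"Windows PowerShell Conda": {
    "source": "PowerShell",
    "args": [
        "-ExecutionPolicy", "ByPass",
        "-NoExit",
        "-Command", "& 'C:\\ProgramData\\Anaconda3\\shell\\condabin\\conda-hook.ps1'"
    ]
}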

Creating and Using a Snowflake Snowpark Session

At this point, we have set up a Python (Anaconda) virtual environment on our local machine that has access to the snowflake-snowpark-python package. An argument could be made for this to be the end of the blog post and for the next section to be a separate post; however, I would much prefer to have all of this in one place to form the singular “definitive guide”.

Core Parameters Required for Authentication

To create a Snowflake Snowpark session, we must provide Snowflake with several parameters for authentication. The full list of parameters can be found in Snowflake’s documentation, however I have provided a condensed list below that is grouped for clarity:

  • Critical parameters that must be included
    • account – The Snowflake account to create a session for. This is of the form <account>.<region>.<cloud provider> and matches your Classic UI login URL before the .snowflakecomputing.com suffix
    • user – The name of the user to authenticate to Snowflake with
  • Optional parameters for context in the session, which define the Snowflake object of the same name that will be used by the session. This is similar to a typical worksheet in the Snowflake UI and can be changed after authentication by using Session.sql() to execute SQL commands of the form USE... <object_type> <object_name>
    • role – If left blank, the user’s default will be used until a SQL statement of the form USE... is leveraged
    • warehouse – If left blank, the user’s default will be used until a SQL statement of the form USE... is leveraged
    • database – If left blank, the user’s default will be used until a SQL statement of the form USE... is leveraged
    • schema – If left blank, the user’s default will be used until a SQL statement of the form USE... is leveraged
  • Authentication parameters, of which at least one must be populated
    • password – The password used to authenticate the user
    • private_key – The private key used to authenticate the user. This must be serialized and cannot be stored as plain text, a challenge solved both in an example below and in InterWorks’ bespoke Snowpark package on our GitHub. Notice that we do not ask for the private key passphrase here, as it will already have been leveraged when serializing the private key.
  • When using password authentication instead of private_key, it is also worth considering the MFA implications and potentially providing the optional authenticator parameter as follows:
    • Set authenticator to externalbrowser to authenticate in a web browser with Okta, ADFS, or similar SAML identity providers
    • Set authenticator to username_password_mfa to authenticate with the provided password and cache the MFA response, minimising the requirement for further MFA prompts as the session continues

If configuring Snowpark with the intent of automated script execution or similar, I would strongly recommend configuring key pair authentication for the service user. I have already released a blog post that details how to set up key pair authentication for a user in Snowflake, and in the near future I will be releasing a blog post that achieves a similar result using a Python UDF within Snowflake itself.

If required, you can find more information on authentication in Snowflake’s documentation.

Creating a Snowflake Snowpark Session

Now that we have outlined the elements required to create a Snowflake Snowpark session, let’s connect and execute some example SQL code.

Example Credentials

For the purposes of our examples, we will consider a fictional Snowflake account and user. Assume we have a Snowflake account hosted on AWS US West called DummyCorp. When logging into Snowflake, we visit the following URL:
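For illustration, following the account identifier format described earlier (the exact URL shape varies between accounts and regions):

https://dummycorp.us-west-2.aws.snowflakecomputing.com

The examples below also assume an illustrative user called DUMMY_USER (a hypothetical name) with the password P@55w0rd.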

If you are actually using a password such as P@55w0rd, then I seriously recommend changing it to something more secure and harder to guess.

Our user’s default role is SYSADMIN but they also have access to SECURITYADMIN.

Our user has also been configured with the following private key, provided in plain text:
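-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFLTBXBgkqhkiG9w0BBQ0wSjApBgkq... (a truncated placeholder rather than a real key)
-----END ENCRYPTED PRIVATE KEY-----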

Creating a Snowflake Snowpark Session with a Password

Now that we have our example credentials, we can provide these as an object to the Session.builder.configs() method in a simple Python script:
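A minimal sketch using the illustrative credentials above; the warehouse, database and schema names are placeholders:

from snowflake.snowpark import Session

# Connection parameters built from the example credentials
connection_parameters = {
    "account": "dummycorp.us-west-2.aws",  # illustrative account identifier
    "user": "DUMMY_USER",                  # hypothetical user
    "password": "P@55w0rd",
    "role": "SYSADMIN",
    "warehouse": "WH_DEMO",                # hypothetical warehouse
    "database": "DB_DEMO",                 # hypothetical database
    "schema": "SCH_DEMO"                   # hypothetical schema
}

# Build and create the Snowpark session
snowpark_session = Session.builder.configs(connection_parameters).create()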

Creating a Snowflake Snowpark Session with a Private Key

Leveraging a private key is a little bit more tricky as we must serialize it for transmission to Snowflake. Here is a Python script in which we serialize the private key and pass it to a Snowflake connection parameter:
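A sketch following Snowflake’s documented key pair pattern; the key file path and the environment variable holding the passphrase are assumptions for illustration:

import os
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization
from snowflake.snowpark import Session

# Load the encrypted private key from disk and decrypt it with its passphrase
with open("rsa_key.p8", "rb") as key_file:  # hypothetical path to the key file
    private_key = serialization.load_pem_private_key(
        key_file.read(),
        password=os.environ["PRIVATE_KEY_PASSPHRASE"].encode(),  # hypothetical environment variable
        backend=default_backend()
    )

# Re-serialize the key to unencrypted DER bytes, the form expected for the private_key parameter
private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption()
)

connection_parameters = {
    "account": "dummycorp.us-west-2.aws",
    "user": "DUMMY_USER",
    "private_key": private_key_der,
    "role": "SYSADMIN"
}

snowpark_session = Session.builder.configs(connection_parameters).create()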

This code is definitely lengthier due to all of the work involved in loading and serializing the private key; however, it is far more robust for a service user, as it does not rely on MFA and should be more secure due to the combination of a lengthy private key and a private key passphrase.

Using a Snowflake Snowpark Session to Execute Queries and Test Our Connection

Whether we have leveraged a password or a private key, at this stage we should have created a Snowflake Snowpark Session that connects to our Snowflake environment. In our example, we have called this snowpark_session.

The following example shows just how easy it is to execute SQL commands in Snowflake. Look out for our upcoming Technical Tuesday blog posts to see far more interesting examples in which we will be creating Python UDFs and Stored Procedures using Snowpark for Python:
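# Execute a query and collect the results as a list of Row objects
results = snowpark_session.sql("SELECT CURRENT_USER(), CURRENT_ROLE()").collect()
print(results)

# Change the session context with a USE command, as described earlier
snowpark_session.sql("USE ROLE SECURITYADMIN").collect()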

InterWorks’ Bespoke Snowpark Package

Our bespoke Snowpark package is available on our GitHub and has been built to simplify the process of creating connections to Snowflake using Snowpark for Python. If you visit the link, you can download the snowpark folder in its entirety and test your connection using one of the connection tests.

The following information is also provided in the README of the GitHub repository.

Connection Parameters

The Python scripts in the package are configured to ingest a snowflake_connection_parameters dictionary and use it to establish connections to Snowflake.

The dictionary may be provided as a locally stored .json file, as environment variables, or as part of a Streamlit secrets .toml file.

Connection Parameters in a Local JSON File

If using a locally stored .json file, this file should be created in the root directory of this repository and should match the format below. This file is not synced with git (it is currently listed in `.gitignore` to avoid security breaches).
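As an illustration only (the authoritative template lives in the repository’s README), such a file might look like the following, reusing the example credentials from earlier:

{
    "account": "dummycorp.us-west-2.aws",
    "user": "DUMMY_USER",
    "password": "P@55w0rd",
    "role": "SYSADMIN",
    "warehouse": "WH_DEMO",
    "database": "DB_DEMO",
    "schema": "SCH_DEMO"
}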

Connection Parameters in a Streamlit Secrets File

If using a Streamlit secrets file, this file should be created in a subdirectory called .streamlit within the root directory of this repository and should match the format below. This file is not synced with git (it is currently listed in `.gitignore` to avoid security breaches). Alternatively, if deploying Streamlit remotely, the secrets should be entered in the Streamlit development interface.
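An illustrative sketch of the secrets file (the section name here is an assumption; check the repository’s README for the exact format):

[snowflake_connection_parameters]
account = "dummycorp.us-west-2.aws"
user = "DUMMY_USER"
password = "P@55w0rd"
role = "SYSADMIN"
warehouse = "WH_DEMO"
database = "DB_DEMO"
schema = "SCH_DEMO"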

Connection Parameters as Environment Variables

If using environment variables, they should match the format below.
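Illustratively (the variable names here are assumptions; the repository’s README defines the exact names expected):

SNOWFLAKE_ACCOUNT=dummycorp.us-west-2.aws
SNOWFLAKE_USER=DUMMY_USER
SNOWFLAKE_PASSWORD=P@55w0rd
SNOWFLAKE_ROLE=SYSADMIN
SNOWFLAKE_WAREHOUSE=WH_DEMO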

Testing Your Snowpark Connection

This section contains steps to simplify testing of your Snowpark connection. Naturally, these tests will only be successful if you have configured the corresponding parameters as detailed above.

Testing Your Snowpark Connection with a Parameters JSON File

Execute the following steps in an Anaconda terminal, or directly run the connection_tests/test_snowpark_connection_via_parameters_json.py file step by step through VSCode or similar.
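# Activate the environment created earlier (assuming it is called snowpark)
conda activate snowpark

# Run the connection test script
python connection_tests/test_snowpark_connection_via_parameters_json.py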

Testing Your Snowpark Connection with Streamlit Secrets

Please note that this will not work by default, as we have not included streamlit in the requirements.txt file. You will need to add it to the requirements file and install it into your environment, or you can install it manually with the following command:
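pip install streamlit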

Testing Your Snowpark Connection with Environment Variables

Execute the following steps in an Anaconda terminal, or directly run the connection_tests/test_snowpark_connection_via_environment_variables.py file step by step through VSCode or similar.
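# Activate the environment created earlier (assuming it is called snowpark)
conda activate snowpark

# Run the connection test script
python connection_tests/test_snowpark_connection_via_environment_variables.py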

Summary

And so concludes our blog post about connecting to Snowflake from a local machine using Snowpark for Python. Now that we have a clear guide to creating Snowflake Snowpark sessions in Python, I will be releasing further posts to walk through how to use these Snowflake Snowpark sessions to create and call Python UDFs and Stored Procedures.

I hope you’ve found this post useful and look forward to any comments or constructive criticism on InterWorks’ bespoke Snowpark package on our GitHub.
