This series takes you from zero to hero with the latest and greatest cloud data warehousing platform, Snowflake.
Our previous post in this series explained how to import data via the Snowflake user interface. You can also stage your data and load it into tables with code via SnowSQL, which is what this blog post will demonstrate.
Setting the Stage for SnowSQL
There are many ways to import data into Snowflake, and utilising code allows users to automate the process. To import data into a Snowflake stage using SnowSQL, the following requirements must be met:
- SnowSQL must be installed on the user’s machine
- The data must be in a suitable file format
- A warehouse in Snowflake must exist to perform the import
- A destination database must exist in Snowflake
- A destination stage must exist within the Snowflake database
To load data from a Snowflake stage into a table, the following additional requirements must be met:
- A destination table must exist within the Snowflake database
- A suitable file format must exist within the Snowflake database
SnowSQL is the command line interface tool that allows users to execute Snowflake commands from within a standard terminal. This means users do not need to be within the Snowflake UI to prepare and execute queries. It is also noteworthy that certain commands cannot be executed from within the Snowflake Worksheets environment but can be executed via SnowSQL, with the most noteworthy being the command to import data into a Snowflake stage.
Before we can start using SnowSQL, we must install it. From within the Snowflake UI, the Help button allows users to download supporting tools and drivers:
As our goal is to install the SnowSQL Command Line Interface (CLI) Client, we select this from the list and choose the appropriate operating system:
Once the installer downloads, execute it and run through the installer to install SnowSQL. You may need to enter admin credentials to install the software.
Verifying the Installation of SnowSQL
Once SnowSQL is installed, it can be called from within any standard terminal. For this example, we will be using Windows Powershell; however, we could equivalently use any other terminal depending on the operating system:
Begin by executing the command SnowSQL to verify the installation. This will list the various options which can be used within SnowSQL:
Preparing Data for Import
Before we can import any data into Snowflake, it must first be stored in a supported format. At time of writing, the full list of supported is contained in the table below. An up-to-date list of supported file formats can be found in Snowflake’s documentation:
*Note: The XML preview feature link can be accessed here
For this example, we will load the following structured data which is delimited with vertical bars:
Using SnowSQL to Execute Commands
The first step toward utilising SnowSQL is to establish a connection with your Snowflake environment. Three pieces of information are required to establish a connection:
- The Snowflake account to connect to, including the region
- An existing user within that account
- The corresponding password for that user
As Snowflake is a cloud service, each account is accessed via a URL. For example, InterWorks’ account in the EU is accessed via https://interworks.eu-central-1.snowflakecomputing.com.
This URL is formed by https://<Account>.<Account_Region>.snowflakecomputing.com.
Therefore, we can see that our account is interworks.eu-central-1. Note that we are still using Windows Powershell; however, this can be achieved in any equivalent terminal. We enter account using the –accountname option in SnowSQL, along with my user chris.hastie with the –username option:
We can also use the shorter names for these options, respectively -a and -u:
In both situations, a password prompt will appear. Enter the password for the desired user. Do not be alarmed if your terminal does not display the password—it will still be reading the input.
With the connection established, SnowSQL will output the current user along with the current warehouse, database and schema. We have not yet selected any, which is why they are all left as none selected:
Preparing a Warehouse
We encourage Snowflake users to test their data loads with various warehouse sizes to find their optimal balance between credit cost to performance. As we are loading a very small file in a bespoke instance, we will use an extra-small warehouse called LOAD_WH.
If the warehouse does not exist, it can be created with the CREATE WAREHOUSE command. Note that we have determined the size as extra small, along with setting the AUTO_SUSPEND to 300 and the AUTO_RESUME to TRUE. This means that our warehouse will automatically suspend itself if it has not been used for five minutes, eliminating costs when the warehouse is unused:
Upon successful execution, SnowSQL will output that the warehouse has been created:
Once the warehouse is created, it can be used with the USE WAREHOUSE command:
We can now see our current warehouse in our connection string directly after the username:
Preparing a Database
In a similar fashion to warehouses, users can create databases via the CREATE DATABASE command. For this example, we will create a databased called DEMO_DB:
With this database created, our connection will automatically update to use this database. DEMO_DB is now provided in the connection string, along with the default PUBLIC schema:
For the sake of uniformity, we will also use the CREATE SCHEMA command to create a new schema called DEMO_SCHEMA:
Our connection string has now updated to display that we are connected to the DEMO_SCHEMA schema within our DEMO_DB database:
Preparing a Stage
Stages are Snowflake’s version of storage buckets. Each stage is a container for storing files and can be either an internal stage within Snowflake or an external stage within Amazon S3 or Azure Blob storage.
For this example, we will create a simple internal stage within Snowflake using the CREATE STAGE command. Our stage will be called DEMO_STAGE:
If our data was stored in Amazon S3 or Azure Blob storage, we would require authentication credentials to the storage location along with the URL. These can be entered as part of the CREATE STAGE command, as in the following example:
Finally, we can import our data into our stage using the PUT command. More information for this command and other SnowSQL commands can be found in Snowflake’s documentation. A key piece of information is that the PUT command can only be executed from outside the main Snowflake interface, which is why we are using SnowSQL instead.
It is important to get the file location string correct as this may vary depending on the terminal you are using. Note that the stage is determined using an @ sign:
After performing the import, SnowSQL will output some key information. This includes the name of the file, its size and the status of the import:
At this stage, our file has now been successfully imported into a secure and encrypted stage in Snowflake.
Preparing the Destination Table
Now that our data is secure in a Snowflake stage, we can think about how we want to view it. In order to load data from a stage into a table, we first must create the table and a file format to match our data.
The following steps could still be performed from within the SnowSQL command line interface; however, we will instead perform the remaining steps in Snowflake itself via the Worksheets functionality. We can do this because the remaining commands that we wish to execute can all be performed from within Snowflake, in contrast to the previous PUT command which could not.
To create our table, we first ensure that we are using the correct warehouse, database and schema. We can then use the CREATE TABLE command to create our table as we would in any SQL-compliant data warehouse:
Preparing the File Format
With our destination table created, the final piece of the puzzle before we load our data is to create a file format. As mentioned earlier, our data is stored as a vertical bar delimited CSV file:
Using the CREATE FILE FORMAT command, we can create a file format called CSV_VERTICAL_BAR. We determine that the file type is CSV, which instructs Snowflake to look for a structured text file with delimited fields. Our field delimiter is a vertical bar and skips the first row of data. We do this as the first line of data in our file contains the column names, which we do not want to import into our table in Snowflake:
Loading the Data from the Stage into the Table
Now that we have all of our pieces lined up, we can use the COPY INTO command to load our data from the stage into the destination table. As with all commands, full information can be found within Snowflake’s documentation:
The output informs us that the file loaded successfully, parsing and loading all 12 rows. We can verify this with a standard SELECT command from our table: