Zero to Snowflake: Importing Data with Code via SnowSQL


This series takes you from zero to hero with the latest and greatest cloud data warehousing platform, Snowflake.

Note: This blog received minor updates in October 2023.

Our previous post in this series explained how to import data via the Snowflake user interface. You can also stage your data and load it into tables with code via SnowSQL, which is what this blog post will demonstrate.

Setting the Stage for SnowSQL

There are many ways to import data into Snowflake, and utilising code allows users to automate the process. To import data into a Snowflake stage using SnowSQL, the following requirements must be met:

  1. SnowSQL must be installed on the user’s machine
  2. The data must be in a suitable file format
  3. A warehouse in Snowflake must exist to perform the import
  4. A destination database must exist in Snowflake
  5. A destination stage must exist within the Snowflake database

To load data from a Snowflake stage into a table, the following additional requirements must be met:

  1. A destination table must exist within the Snowflake database
  2. A suitable file format must exist within the Snowflake database

We will achieve all of the above within this article.

Installing SnowSQL

SnowSQL is the command line interface tool which allows users to execute Snowflake commands from within a standard terminal. This means users do not need to be within the Snowflake UI to prepare and execute queries. It is also worth noting that certain commands cannot be executed from within the Snowflake Worksheets environment but can be executed via SnowSQL, the most notable being the command to import data into a Snowflake stage.

Before we can start using SnowSQL, we must install it. You can do so by navigating to the SnowSQL Download page. Once the installer has downloaded, execute it and follow the prompts to complete the installation. You may need to enter admin credentials to install the software.

Verifying the Installation of SnowSQL

Once SnowSQL is installed, it can be called from within any standard terminal. For this example we will be using Windows PowerShell; however, we could equivalently use any other terminal depending on the operating system.

Begin by executing the command snowsql to verify the installation. This will list the various options which can be used within SnowSQL.
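If you prefer a more explicit check, the following commands (assuming the installer has added snowsql to your PATH) print the installed version and the full list of available options:

snowsql --version
snowsql --help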

Preparing Data for Import

Before we can import any data into Snowflake, it must first be stored in a supported format. At the time of writing, the full list of supported file formats is contained in the table below. An up-to-date list can be found in Snowflake’s documentation.

Structured/Semi-Structured  Type                        Notes
Structured                  Delimited (CSV, TSV, etc.)  Any valid single-byte delimiter is supported; default is comma (i.e. CSV).
Semi-Structured             JSON
Semi-Structured             Avro                        Includes automatic detection and processing of Snappy-compressed Avro files.
Semi-Structured             ORC                         Includes automatic detection and processing of Snappy or zlib-compressed ORC files.
Semi-Structured             Parquet                     Includes automatic detection and processing of Snappy-compressed Parquet files.
Semi-Structured             XML                         Supported as a preview feature.

For this example, we will load the following structured data which is delimited with vertical bars.

Region|Category|Sales
North|Hardware|25000
East|Hardware|50000
South|Hardware|50000
West|Hardware|75000
North|Office Supplies|10750
East|Office Supplies|15000
South|Office Supplies|19250
West|Office Supplies|38500
North|Clothing|2000
East|Clothing|3000
South|Clothing|3500
West|Clothing|4750

We store this inside a file called “Sample CSV Data.csv”.

Using SnowSQL to Execute Commands

The first step towards utilising SnowSQL is to establish a connection with your Snowflake environment. Three pieces of information are required to establish a connection:

  1. The Snowflake account to connect to, including the region
  2. An existing user within that account
  3. The corresponding password for that user

As Snowflake is a cloud service, each account is accessed via a URL. This URL is usually of the following format:

https://<Organisation>-<Account-Name>.snowflakecomputing.com

For example:

https://myorganisation-my-account.snowflakecomputing.com

Legacy Snowflake accounts can also be accessed by referencing the account and location directly:

https://<Account>.<Account_Region>.snowflakecomputing.com

For example:

https://my-account.uk-south.azure.snowflakecomputing.com

If you already have a Snowflake account, you should see that one of the above forms matches the URL you use to access the Snowflake UI.

The URL above is actually a wrapper around the Snowflake account identifier. Everything between https:// and .snowflakecomputing.com is the account identifier, as in the following examples:

<Organisation>-<Account-Name>
myorganisation-my-account

<Account>.<Account_Region>
my-account.uk-south.azure

We can leverage this account identifier to authenticate SnowSQL into Snowflake. Note that we are still using Windows PowerShell; however, this can be achieved in any equivalent terminal. We enter the account identifier using the --accountname option in SnowSQL, along with the user my_user using the --username option.

snowsql --accountname myorganisation-my-account --username my_user

We can also use the shorter forms of these options, -a and -u respectively:

snowsql -a myorganisation-my-account -u my_user

In both situations, a password prompt will appear. Enter the password for the desired user. Do not be alarmed if your terminal does not display the password; it will still be reading the input.

With the connection established, SnowSQL will output the current user along with the current warehouse, database and schema. We have not yet selected any, which is why they are all left as none selected.
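As an illustration, the SnowSQL prompt at this point will look something like the following (the exact formatting can vary between SnowSQL versions):

my_user#(no warehouse)@(no database).(no schema)>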

Preparing a Warehouse

We encourage Snowflake users to test their data loads with various warehouse sizes to find the optimum balance between credit cost and performance. As we are loading a very small file for this bespoke example, we will use an extra-small warehouse called “WH_LOAD”.

If the warehouse does not exist, it can be created with the CREATE WAREHOUSE command. Note that we have specified the size as extra small to keep this as low-cost as possible. We could also configure other options, such as AUTO_SUSPEND, if desired.

create warehouse WH_LOAD warehouse_size = 'X-SMALL';

Upon successful execution, SnowSQL will output that the warehouse has been created.
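As a sketch of those optional settings, the same warehouse could instead be created with auto-suspend and auto-resume configured; the values below are illustrative rather than recommendations:

create warehouse WH_LOAD
  warehouse_size = 'X-SMALL'
  auto_suspend = 60
  auto_resume = TRUE
  initially_suspended = TRUE
;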

Often, Snowflake will automatically switch the environment to leverage the new warehouse. If it does not, or if you already have a warehouse that you wish to use without creating it again first, leverage the USE WAREHOUSE command.

use warehouse WH_LOAD;

We can now see our current warehouse in our connection string, directly after the user name.

Preparing a Database

In a similar fashion to warehouses, users can create databases via the CREATE DATABASE command. For this example, we will create a database called DB_DEMO.

create database DB_DEMO;

With this database created, our connection will automatically update to use this database. DB_DEMO is now provided in the connection string, along with the default PUBLIC schema.

For the sake of uniformity, we will also use the CREATE SCHEMA command to create a new schema called DEMO_SCHEMA.

create schema DEMO_SCHEMA;

Our connection string has now updated to display that we are connected to the DEMO_SCHEMA schema within our DB_DEMO database.
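If you ever want to confirm the active context without relying on the connection string, the standard context functions can be queried at any time:

select current_warehouse(), current_database(), current_schema();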

Preparing a Stage

Stages are Snowflake’s version of storage buckets. Each stage is a container for storing files, and can be either an internal stage within Snowflake or an external stage within Amazon S3 or Azure Blob storage.

For this example, we will create a simple internal stage within Snowflake using the CREATE STAGE command. Our stage will be called DEMO_STAGE.

create stage DEMO_STAGE;

If our data was stored in cloud storage, such as Amazon S3 or Azure Storage, we would require authentication credentials to the storage location along with the URL. For fast results, these can be entered as part of the CREATE STAGE command, as in the following example.

create stage DEMO_S3_STAGE
  URL='s3://bucket/path/to/files/'
  CREDENTIALS=(AWS_KEY_ID='1a2b3c' AWS_SECRET_KEY='4x5y6z')
;

create stage DEMO_AZURE_STAGE
  URL='azure://account.blob.core.windows.net/container/path/to/files/'
  CREDENTIALS=(AZURE_SAS_TOKEN='my-sas-token')
;

In general, when interacting with cloud storage, InterWorks would recommend leveraging storage integrations instead of storing credentials inside the stage; this is covered in our separate articles on storage integrations.
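As a brief sketch of that recommended approach, a stage built on a pre-existing storage integration (the integration name MY_S3_INTEGRATION below is hypothetical) stores no credentials at all:

create stage DEMO_S3_INTEGRATION_STAGE
  URL = 's3://bucket/path/to/files/'
  STORAGE_INTEGRATION = MY_S3_INTEGRATION
;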

Finally, we can import our data into our stage using the PUT command. More information on this command and other SnowSQL commands can be found in Snowflake’s documentation. A key piece of information is that the PUT command can only be executed from outside the main Snowflake interface, which is why we are using SnowSQL instead.

In general, the PUT command takes the following structure:

put 'FILE://path/to/file' '@STAGE_NAME/path/to/stage subdirectory/';

There are additional options that you can leverage if desired, such as OVERWRITE = TRUE to overwrite any existing file(s) of the same name.
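For example, a PUT that replaces an existing staged file of the same name might look like the following sketch (both options shown are optional):

put 'FILE://path/to/file' '@STAGE_NAME/path/to/stage subdirectory/' OVERWRITE = TRUE AUTO_COMPRESS = TRUE;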

It is important to get the file location string correct, as this may vary depending on the terminal you are using. Note that the stage is referenced using an @ sign, and we wrap both the file path and the destination reference inside a pair of single quotation marks ' ', as this allows us to include spaces if desired.

For our example, we simply wish to upload our “Sample CSV Data.csv” file into the root of our stage:

put 'FILE://Sample CSV Data.csv' '@DEMO_STAGE';

After performing the import, SnowSQL will output some key information. This includes the name of the file, its size and the status of the import.
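If you would like to double-check the upload, the staged files can be listed with the LIST command; note that PUT compresses files with gzip by default, so the staged copy will typically appear with a .gz suffix:

list @DEMO_STAGE;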

At this point in our journey, our file has now been successfully imported into a secure and encrypted stage in Snowflake.

Preparing the Destination Table

Now that our data is secure in a Snowflake stage, we can think about how we want to view it. In order to load data from a stage into a table, we first must create the table and a file format to match our data.

The following steps could still be performed from within the SnowSQL command line interface; however, we will instead perform the remaining steps in Snowflake itself via the Worksheets functionality. We can do this because the remaining commands that we wish to execute can all be performed from within Snowflake, in contrast to the previous PUT command, which could not.

To create our table, we first ensure that we are using the correct warehouse, database and schema. We can then use the CREATE TABLE command to create our table as we would in any SQL-compliant data warehouse.

use warehouse WH_LOAD;

use schema DB_DEMO.DEMO_SCHEMA;

create table DEMO_TABLE (
    REGION string
  , CATEGORY string
  , SALES integer
)
;

Preparing the File Format

With our destination table created, the final piece of the puzzle before we load our data is to create a file format. As mentioned earlier, our data is stored as a vertical bar delimited CSV file.

Using the CREATE FILE FORMAT command, we can create a file format called CSV_VERTICAL_BAR. We specify that the file type is CSV, which instructs Snowflake to look for a structured text file with delimited fields. Our field delimiter is a vertical bar, and we skip the first row of data because the first line of our file contains the column names, which we do not want to import into our table in Snowflake.

create file format CSV_VERTICAL_BAR
  type = CSV
  field_delimiter = '|'
  skip_header = 1
;
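As an aside, if a delimited file also wraps its text fields in double quotes, the same pattern extends with one extra option; this variant is purely illustrative and is not needed for our sample file:

create file format CSV_VERTICAL_BAR_QUOTED
  type = CSV
  field_delimiter = '|'
  skip_header = 1
  field_optionally_enclosed_by = '"'
;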

Loading the Data from the Stage into the Table

Now that we have all of our pieces lined up, we can use the COPY INTO command to load our data from the stage into the destination table. As with all commands, full information can be found within Snowflake’s documentation.

copy into DEMO_TABLE
from @DEMO_STAGE
  file_format = CSV_VERTICAL_BAR
;

The output informs us that the file loaded successfully, parsing and loading all 12 rows. We can verify this with a standard SELECT command from our table.

select * from DEMO_TABLE;
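As a final, purely illustrative check, we can also aggregate the loaded data to confirm it matches the source file; for example, total sales per category:

select CATEGORY, sum(SALES) as TOTAL_SALES
from DEMO_TABLE
group by CATEGORY
order by TOTAL_SALES desc;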
