Introducing the Snowflake Data Cloud: Data Applications

This series highlights how Snowflake excels across different data workloads through a new cloud data platform that organizations can trust and rely on as they move into the future.

Snowflake’s cloud-first architecture has unlocked access to data in a way that was previously impossible. For the first time in history, data is accessible, affordable and fast. As users began to receive tangible value from their data-warehousing efforts, it quickly became clear that this data is not just something to be used for metrics and reporting. Data is an asset, and providing access to that asset through internal and external applications can give organizations a leg up on their competition. With Snowflake at the center of your data architecture, it makes sense to leverage your data in more ways than just a dashboard. You might have customer data that needs to make it to your marketing systems, marketing data that needs to get into your financial systems or financial data that needs to get into your human resource systems.  

Getting Creative with Snowflake Solutions

Snowflake wisely recognized that the diversity of systems in a modern data landscape would require an equally diverse set of ways to meet customers’ needs. Snowflake provides a number of connection options that allow various tools and systems to interact with your centralized data. The interesting thing is how you can combine all the ways Snowflake integrates with other systems to create a solution you might not normally have considered.

For example, one of our clients approached us with a request to provide data to an external vendor. Our client is also a Snowflake customer. You might be thinking, “That’s an easy one; just create a data share!” And I would agree: a data share would be the best solution. However, not every vendor or service is ready to accommodate simple and secure data sharing with Snowflake (but maybe someday). Until then, we can use a REST API to approximate this type of application interaction. To create the REST API, we will use API Gateway, a Lambda function and Snowflake as our backend to return the data requested via the API. Sounds simple, right? Well, buckle up!

Setting up Snowflake

To get the ball rolling, first we need to create a dataset that a REST API can serve. In this case, we’ll want to structure some data into a valid JSON response. Fortunately, Snowflake makes this really easy with its ability to natively handle and create semi-structured data. We can do it all out of the box with a couple of built-in SQL functions. Let’s take a quick look at how.

Let’s create a small table with the following schema and values: 

CREATE DATABASE test_db; 
 
USE SCHEMA public; 
 
CREATE TABLE primary_table ( 
    message_id NUMBER 
    ,message_text VARCHAR 
    ,message_time DATETIME 
); 
 
INSERT INTO primary_table VALUES  
(1, 'hello, world', '2020-01-01 00:00:00') 
,(2, 'hola palabra', '2020-01-01 00:00:01') 
,(3, '你好,单词', '2020-01-01 00:00:02');

Then let’s structure the data into a view that looks like JSON: 

CREATE VIEW api_response AS 
SELECT 
OBJECT_CONSTRUCT( 
  'data', ARRAY_AGG( 
    OBJECT_CONSTRUCT( 
      'id', message_id 
      ,'text', message_text 
      ,'timestamp', message_time 
    ) 
  ) 
) as data 
FROM primary_table;
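
Querying the view returns a single row whose DATA column holds the entire payload:

SELECT data FROM api_response;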

Great! Now we have data our API can work with: 

{ 
  "data": [ 
    { 
      "id": 1, 
      "text": "hello, world", 
      "timestamp": "2020-01-01 00:00:00.000" 
    }, 
    { 
      "id": 2, 
      "text": "hola palabra", 
      "timestamp": "2020-01-01 00:00:01.000" 
    }, 
    { 
      "id": 3, 
      "text": "你好,单词", 
      "timestamp": "2020-01-01 00:00:02.000" 
    } 
  ] 
}

Lambda 

First, we are going to create the Lambda function. To do this, we’ll write a simple Python function that uses Snowflake’s Python connector to query our data. This can be a little tricky since the Python connector is not an out-of-the-box library accessible to our Lambda function. In the past, when creating a deployment package for Lambda, you would have to zip up all your code and required libraries into a single .zip file and import that into the function. However, the approach I recommend now is to utilize Lambda layers. These are also .zip packages, but they can be reused by any number of functions and don’t include your actual function code.
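
For context, Lambda expects a Python layer’s dependencies to sit under a python/ directory inside the .zip, which is exactly the structure the Dockerfile below builds:

python/
  lib/
    python3.8/
      site-packages/
        snowflake/
        ...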

We can create the layer by installing the Python connector and then creating a .zip package with just that library.

To help with local environment variability, I’m going to leverage a Docker container to install the connector and zip the contents of the site-packages directory. 

To start, we’ll need a Dockerfile: 

FROM amazonlinux:latest 
 
RUN yum install -y \ 
    bzip2-devel \ 
    gcc \ 
    gdbm-devel \ 
    git \ 
    libffi-devel \ 
    openssl \ 
    openssl-devel \ 
    readline-devel \ 
    tar \ 
    vim \ 
    zip \ 
    zlib \ 
    zlib-devel 
 
RUN curl -o Python-3.8.2.tgz https://www.python.org/ftp/python/3.8.2/Python-3.8.2.tgz \ 
    && tar -xzvf Python-3.8.2.tgz \ 
    && cd Python-3.8.2 \ 
    && ./configure \ 
    && make \ 
    && make install \ 
    && cd .. \ 
    && rm Python-3.8.2.tgz \ 
    && rm -rf Python-3.8.2 \ 
    && python3 -m venv /tmp/env 
 
RUN source /tmp/env/bin/activate \
    && pip install snowflake-connector-python \
    && mkdir -p python/lib/python3.8/site-packages \
    && cp -r /tmp/env/lib/python3.8/site-packages/* python/lib/python3.8/site-packages/ \
    && zip -r snowflakeconnector.zip ./python/

Create a new file in your favorite text editor and copy in the contents from above. Then save the file simply as Dockerfile in a location you’ll remember. This Dockerfile installs the Python 3.8 runtime from a source archive and creates a virtual environment, so we’ve got a squeaky-clean base to work from.

Next, you need to build your image. Make sure your current directory is where you’ve saved your Dockerfile: 

docker build --tag amazonlinux:snowflake-python3.8 .

Pay attention to the dot at the end of that command. It tells Docker to use the current directory as the build context, which is where it will look for your Dockerfile. This will also tag the image with a REPOSITORY of amazonlinux and a TAG of snowflake-python3.8.

Next, start your image in a container and name it mydockercontainer:

docker run --rm -it --name mydockercontainer amazonlinux:snowflake-python3.8 bash

You should have a new shell inside the running container at this point. 

Next, we’ll use the AWS CLI to create the Lambda layer and upload our snowflakeconnector.zip file. 

First, we need to install the AWS CLI inside the container. To get started, activate the Python virtual environment:

source /tmp/env/bin/activate

Now, we’ll install the AWS CLI: 

pip install awscli

And configure it: 

aws configure

You’ll need an IAM user with access and secret keys that have the appropriate permissions to continue. Plug in those details at the prompt. You don’t need to specify a default output format: 

AWS Access Key ID: 
AWS Secret Access Key: 
Default region name: 
Default output format:

Awesome. Now we can create our AWS resources. Let’s create the Lambda layer our function will use: 

aws lambda publish-layer-version \ 
--layer-name snowflakeconnector \ 
--compatible-runtimes python3.8 \ 
--zip-file fileb://snowflakeconnector.zip

Copy and paste that into your shell. You should get a response back that shows, among other things, the LayerVersionArn. Take note of that because we’ll need it when we create our function. 
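
The response should look something like this (heavily truncated, with placeholder values):

{
    "LayerArn": "arn:aws:lambda:region:account:layer:snowflakeconnector",
    "LayerVersionArn": "arn:aws:lambda:region:account:layer:snowflakeconnector:1",
    "Version": 1
}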

Now we need our function code: 

import json
import snowflake.connector
from snowflake.connector import DictCursor

def lambda_handler(event, context):
    # In a real deployment, pull credentials from environment variables or a
    # secrets manager rather than hard-coding them here.
    ctx = snowflake.connector.connect(
        user='username',
        password='password',
        account='mysnowflakeaccount'
        )
    cs = ctx.cursor(DictCursor)
    try:
        # The view returns a single row; the DATA column holds the JSON payload.
        result = cs.execute("select data from test_db.public.api_response")
        response_data = result.fetchall()[0]['DATA']
        response = {}
        response['statusCode'] = 200
        response['body'] = json.dumps(json.loads(response_data))
        return response
    finally:
        # Close the cursor and connection even if the query fails.
        cs.close()
        ctx.close()

Create an empty file called lambda_function.py on your host machine and copy the above code into it. Next, we’ll copy that file from the host into our Docker container:

docker cp lambda_function.py mydockercontainer:lambda_function.py

Open a new terminal on your host and run the above command. Then you can switch back to your Docker container shell and verify the file was copied where you expected:

ls -l | grep lambda_function.py

Additionally, we’ll need to zip the function code, so we can use the AWS CLI to create our function and upload the code in one go.  

In your Docker shell, run the command below to create a new file called lambda_function.py.zip:

zip lambda_function.py.zip lambda_function.py

Now, we can create our function, upload our code and attach the layer we created previously. Note that you’ll need an appropriate IAM role for the Lambda function to run under.
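
If you don’t already have one, a minimal execution role can be created like this (the role name mylambdarole is just a placeholder, and this only grants basic CloudWatch logging permissions):

aws iam create-role \
--role-name mylambdarole \
--assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'

aws iam attach-role-policy \
--role-name mylambdarole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

With a role in place, create the function: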

aws lambda create-function \ 
--function-name myfunction \ 
--runtime python3.8 \ 
--role arn:aws:iam::account:role/mylambdarole \ 
--handler lambda_function.lambda_handler \ 
--zip-file fileb://lambda_function.py.zip \ 
--layers arn:aws:lambda:region:account:layer:snowflakeconnector:1

Note the FunctionArn in the response body. We’ll need to use that when creating our API Gateway. Let’s run a quick test to make sure we get the data back we’re expecting: 

aws lambda invoke \ 
--function-name myfunction \ 
response.json

Which should return the following: 

{ 
    "StatusCode": 200, 
    "ExecutedVersion": "$LATEST" 
}

Then check the actual Snowflake data in the response.json file: 

cat response.json
{"statusCode": 200, "body": "{\"data\": [{\"id\": 1, \"text\": \"hello world\", \"timestamp\": \"2020-01-01 00:00:00.000\"}]}"}

API Gateway

Now, we’ll create the API Gateway with our Lambda integration: 

aws apigatewayv2 create-api \ 
--name 'myapi' \ 
--protocol-type HTTP \ 
--target arn:aws:lambda:region:account-id:function:myfunction
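
The response will look something like this (the IDs here are placeholders):

{
    "ApiEndpoint": "https://api-id.execute-api.region.amazonaws.com",
    "ApiId": "api-id",
    "Name": "myapi",
    "ProtocolType": "HTTP"
}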

Look for the ApiEndpoint in the response. We can combine that with the name of our Lambda function to get our endpoint: 

https://api-id.execute-api.region.amazonaws.com/myfunction

Hitting that endpoint now should result in a 500 error. That’s because the API Gateway hasn’t been completely configured to use the Lambda function. To complete this step, you’ll have to jump into the Lambda console. Select your Lambda function and then select Add trigger. Choose API Gateway and the name of your API. For Deployment stage, use $default; for Security, select Open, then click Add.
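
If you’d rather stay in the CLI, the trigger essentially boils down to granting API Gateway permission to invoke the function, which can also be done with something like the following (the statement ID is arbitrary, and you’ll need to substitute your own region, account ID and API ID in the source ARN):

aws lambda add-permission \
--function-name myfunction \
--statement-id apigateway-invoke \
--action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:region:account-id:api-id/*/*/myfunction"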

Either way, this should attach the API Gateway to the Lambda function. Now would be a good time to test:

curl --request GET 'https://api-id.execute-api.region.amazonaws.com/myfunction'

Hopefully, you’re getting the response back with the data from the view. 

Conclusion

Snowflake’s unique architecture allows us to provision access to reporting-ready datasets for a wide variety of use cases. Looking at data applications as a method to consume data and derive answers really puts into perspective just how important it is to have all of your data in one place: a single source of truth. In this blog, we’ve demonstrated a simple data application use case with Snowflake. By putting an API in front of our Snowflake database, we can return data to end users and external applications. There is obviously quite a bit more we could do to secure the API and add routes or parameters, which we haven’t covered here. You might also be asking, “Couldn’t this get expensive really quickly?” Yes, it definitely could. If your organization is hesitant to let external applications spin up and run your Snowflake warehouses from an API, an alternative deployment pattern is to unload the data from Snowflake into S3 and have the Lambda function consume the data from there. That alternative comes with its own considerations, but it does at least keep your API users from consuming all of your Snowflake credits.
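
As a rough sketch of that unload pattern (assuming an external stage named my_s3_stage that points at your S3 bucket), the view could be exported on a schedule with something like:

COPY INTO @my_s3_stage/api_response/
FROM (SELECT data FROM test_db.public.api_response)
FILE_FORMAT = (TYPE = JSON)
OVERWRITE = TRUE;

The Lambda function would then read the resulting file from S3 instead of opening a Snowflake connection on every request.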

Looking at Snowflake’s history as a data warehouse and the features that come along with it, it is easy to see what makes Snowflake best in class. A winning formula of unmatched performance, cloud-focused architecture, ease of use and intuitive pricing has set the stage for where Snowflake can go next. Snowflake’s guiding principles position it to be the best-in-class solution for organizations looking to build data applications while harnessing the power of the cloud.

The power of Snowflake’s architecture isn’t limited to the data warehouse. It really is the catalyst for creating a data platform that organizations can trust and move forward with. I hope you’ve enjoyed this blog and found some valuable tips to apply in your work. Feel free to reach out if we can help you in any way, and be sure to check out the other posts in this series!

