At InterWorks, we regularly build data pipelines for customers using data migration and transformation tools such as Fivetran or Matillion. Tools like these have connectors for many common data sources, saving us the hassle of building and maintaining our own scripted solutions.
Snowflake have developed their own set of connectors, available via the Snowflake Marketplace at no extra cost, for MySQL, PostgreSQL, ServiceNow, Google Looker Studio, Google Analytics Aggregated Data and Google Analytics Raw Data. If you want to bring data from these sources into Snowflake, you no longer need to pay any of the costs associated with running a separate tool.
I chose to focus on the Google Analytics Raw Data connector for this post because I was interested to see what extra information the new GA4 data provides over the old Universal Analytics data, and I was keen to avoid the sampling and aggregation that had caused me issues in the past.
The Snowflake Connector for Google Analytics Raw Data brings through the events table (with separate tables for the daily export, “events_”, and the streaming export, “events_intraday_”) and drops the API response from BigQuery into a single VARIANT column. What is incredibly helpful is that the connector also creates views that parse the first level of key:value pairs into 108 columns, leave any nested JSON as VARIANT columns and provide a flattened copy of those nested JSON columns too. These views give me SQL statements I can copy and amend to return just the information I’m interested in.
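To give a flavour of what that looks like in practice, here is a minimal sketch of querying both the parsed view and the raw table. All of the database, schema, table, view and column names below are hypothetical placeholders; the connector generates its own names based on your configuration and GA4 property:

```sql
-- Hypothetical names throughout; substitute the objects the connector created.
-- 1) The parsed view: first-level keys become real columns, while nested JSON
--    (such as the device record) remains a VARIANT we can drill into with ':'.
SELECT event_date,
       event_name,
       user_pseudo_id,
       device:category::STRING AS device_category
FROM   GA_RAW_DB.GA_RAW.EVENTS_VIEW
WHERE  event_name = 'page_view'
LIMIT  10;

-- 2) Flattening the nested event_params array straight from the raw
--    VARIANT column on the base table.
SELECT t.raw:event_name::STRING           AS event_name,
       p.value:key::STRING                AS param_key,
       p.value:value:string_value::STRING AS param_value
FROM   GA_RAW_DB.GA_RAW.EVENTS t,
       LATERAL FLATTEN(input => t.raw:event_params) p
LIMIT  20;
```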
See the example below of the tables and views created by the connector (BigQuery on the left, Snowflake on the right) where I’ve masked out the schema identifier from BigQuery:
You might’ve noticed that one table, “pseudonymous_users_,” isn’t retrieved by the connector. I spoke to Snowflake support, who confirmed it’s on their roadmap to include this table in a future release.
To monitor the cost of this ingestion, I chose to create a warehouse specifically to run the ingestion workload. This makes tracking the cost very easy on the cost management page:
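If you prefer to prepare that warehouse yourself in SQL, a sketch along these lines would work; the warehouse name is a hypothetical placeholder, and the second query totals its daily credit consumption from the account usage views:

```sql
-- Hypothetical warehouse name; extra-small suits the connector's light workload.
CREATE WAREHOUSE IF NOT EXISTS GA_RAW_WH
  WAREHOUSE_SIZE      = 'XSMALL'
  AUTO_SUSPEND        = 60
  AUTO_RESUME         = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Daily credits consumed by the ingestion warehouse over the last 30 days.
SELECT TO_DATE(start_time) AS usage_date,
       SUM(credits_used)   AS credits
FROM   snowflake.account_usage.warehouse_metering_history
WHERE  warehouse_name = 'GA_RAW_WH'
  AND  start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY usage_date
ORDER BY usage_date;
```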
I found the documentation for setting up the connector very thorough, but in case it helps, I’ll walk through how I set the connector up.
Preparing GA and BigQuery Platform
This diagram from the Snowflake documentation shows how your GA4 data gets to Snowflake. You can see that BigQuery is involved, so we need access to Google Cloud Platform for the Google account we’re using to log into GA4:
Within Google Cloud Platform, there are a few configuration steps to walk through:
- Create a project.
- Enable the BigQuery API and Cloud Resource Manager API within the APIs and services section for your project.
- Set up the BigQuery Link.
- Create a service account key.
- Configure OAuth for GCP.
Now, I began with these steps and realised later that OAuth is an alternative to using a service account. If you want to use OAuth, you require a URI that is only shown in the Snowflake connector configuration interface, so I needed to jump out of GCP at this point and start my installation of the connector within Snowflake.
Using your Account Admin role in Snowflake, navigate to “Data Products > Marketplace,” search for “Snowflake Connector for Google Analytics Raw Data” and click “Get.” Once it’s installed, click the Configure button to open the configuration interface and, if you started with the GCP setup like I did, you can tick “mark as done” on the “Prerequisites” page:
On this page is the “Configure the OAuth consent screen and client ID for your GCP project” step. The URI shown here is what you need to copy into the “Authorised redirect URIs” field on the OAuth 2.0 Client ID credential setup page in GCP. Once that credential has been created, copy the Client ID and Secret, as we’ll need those on the Authentication page of the connector configuration interface.
On the “Configuration” page of this interface, Snowflake will pre-populate a warehouse, database, schema and role for the connector to use. You can select existing objects instead or, as I did for ease of typing, shorten the names of these new objects. There are a few options for tracking the cost of running the connector; having a dedicated warehouse makes that easy:
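If you’d rather pre-create shortened-name objects yourself and select them on this screen, it only takes a few statements. These names are hypothetical and simply need to match what you pick in the configuration interface (GA_RAW_WH is the warehouse sketched earlier):

```sql
-- Hypothetical shortened names to select on the connector's Configuration page.
CREATE DATABASE IF NOT EXISTS GA_RAW_DB;
CREATE SCHEMA   IF NOT EXISTS GA_RAW_DB.GA_RAW;
CREATE ROLE     IF NOT EXISTS GA_RAW_ROLE;

-- Let the connector's role use the dedicated warehouse and target database.
GRANT USAGE ON WAREHOUSE GA_RAW_WH TO ROLE GA_RAW_ROLE;
GRANT USAGE ON DATABASE  GA_RAW_DB TO ROLE GA_RAW_ROLE;
```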
The “Authentication” page asks for either your service account key or the OAuth Client ID and Secret. Once those are entered, you move on to the “Validation” page, where you can test your connection.
I received the following error because I had set up my BigQuery Link to Google Analytics earlier the same day with the Export Type set to a daily download rather than a stream, so there was no data yet for Snowflake to validate the connection against:
I switched the export type to stream, checked in BigQuery that data was landing there, then hit the Retry button on the validation page. This was successful, so I moved on to setting up the data ingestion.
I landed on the Data Sync page after the configuration process but, if you need to return to it later, it’s under “Data Products > Apps > Snowflake Connector for Google Analytics Raw.” You’ll see a list of available properties (I only had one), so tick whichever you want to bring into Snowflake, then click the Start Sync button.
This page of the connector will tell you how recently the data was last synced and allow you to change how often new data is brought into Snowflake:
The Home page of the connector has a chart showing the number of rows ingested, which you can adjust to show different date ranges and granularities:
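You can also cross-check that chart from SQL with a quick count over the ingested events. As before, the table and column names here are hypothetical placeholders; GA4 exports event_date as a “YYYYMMDD” string inside the payload:

```sql
-- Hypothetical names; count rows landed per day in the raw events table.
SELECT TO_DATE(raw:event_date::STRING, 'YYYYMMDD') AS event_date,
       COUNT(*)                                    AS rows_ingested
FROM   GA_RAW_DB.GA_RAW.EVENTS
GROUP BY event_date
ORDER BY event_date;
```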
That’s all I needed to do to get my raw Google Analytics data into the Snowflake Data Cloud. I found the process straightforward, with ample documentation, and it required zero coding skills to set up.