Extracting Google Analytics Data for Tableau Reporting

Extracting Google Analytics Data for Tableau Reporting

Brian Carlton
//

Google Analytics is a great tool for keeping track of website traffic to see the effectiveness of your online marketing campaigns.  However, Google’s built-in tools aren’t always the best for visualizing and analyzing the data that it collects.  Therefore, it is best to extract that data and load it into Tableau to get the most worth out of Google Analytics’ website data.

Thanks to Google’s Analytics Core Reporting API, you can get access to that data directly.  I found a great program put out by Keplar, called “Google-Analytics-export-to-CSV” which does exactly that, (you can download  it here).  It does require Java Runtime Environment, so make sure you have it installed before you try to use it.

The program comes zipped together, so extract the files to the directory of your choice.  Once in that directory, you’ll notice that it comes with a readme as well as a PDF file explaining how to use the program (which I thought was rather well done)  Also, you’ll see two folders:  recipes and output.  Recipes contains .xml files that lay down all the necessary information for the data pull-down (login information and the metrics/dimensions you want)  These are predefined templates designed by Keplar; you can adjust them to suit your own needs. Or you can use the file called “query template” which leaves all the fields blank.  Here’s the visitors-by-page.xml file (open this in the text editor of your choice):

   

        
{ENTER YOUR TABLE ID HERE}

        ga:pagePath

        ga:visits,ga:visitors,ga:pageviews

       

       

        -ga:visitors

        2011-10-01

        2012-01-10

       

       

   

   

        {ENTER YOUR GOOGLE ANALYTICS USERNAME HERE}

       {ENTER YOUR GOOGLE ANALYTICS PASSWORD HERE}

   

   

        google-analytics-export-to-csv

        https://www.google.com/analytics/feeds/data

   

 

If you read through each of these tags, you’ll see some that stand out:

-tableId:  this is the Google Analytics number given to each of the sites that you have tied to your account.  The best way to obtain that information is to use Google’s Data Feed Query Explorer.  The first field, ids, is where you’ll find what goes in this field.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Pick whichever website you want to analyze and type in that code between those tags.

    dimensions:  these are the qualities by which you’ll filter your data

    metrics:  these are your measures with which you can perform calculations

    You can see a list of all dimensions and metrics GA has to offer here.

    start-date and end-date:  naturally, these are the boundaries over which you’ll pull your data.  Note that both are in year-month-day (YMD) format. 

    userName: put in the account name associated with your GA account

    password: naturally, your password for your GA account

 

When you’ve put in all the fields, you are ready to run the program itself.  The actual java program is the file called ga2csv.0.1.jar (.jar stands for JAVA archive) The program is designed to run on a command line, so we’ll be using PowerShell for that.  You can access PowerShell by typing powershell.exe in the run window (for a great PowerShell cmd line primer, check this site out)

Navigate to the directory where you downloaded the program.  It might resemble something like this:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Now, to actually run the JAVA program, you will enter a command that fits this form:

    java -jar ga2csv.0.1.jar [yourqueryname.xml] [youroutputname.csv]

Obviously, you’ll replace yourqueryname.xml with the actual name of your query; the same goes for youroutputname.csv.  You can also put in a location relative to the program’s directory as well.  Here’s a more specific example:

    java -jar ga2csv.0.1.jar recipes/visitors-by-page.xml output/visitors-by-page-output.csv

Assuming you’ve configured visitors-by-page.xml correctly, this will pull all the data down into a .csv located in the output directory.  Now you have a .csv that you can pull into Tableau and do some serious data analysis.

 

***Two important caveats***

 

1)  If you rerun this, you’ll have to delete the preexisting .csv or pick a new name because the program cannot overwrite the preexisting file. 

2)  You may find yourself with a screen that looks not unlike this:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This is due to a heap error; the program is trying to pull so much data down that it doesn’t have the memory to handle it and crashes based on its current configuration.  To get around this problem, increase the max heap size by altering the command to look something like this:

    java -Xmx1g -jar ga2csv.0.1.jar recipes/visitors-by-page.xml output/visitors-by-page-output.csv

That extra parameter, -Xmx1g, tells the program to use 1 GB of RAM to get the job done, as opposed to its default 128 MB.  Another way you can encourage faster performance is also run the program multiple times over smaller time intervals.  I had to do that because I was pulling ~20k rows for each month.  Running as a single pull overwhelmed the program, even when I bumped up to 1 GB max heap.

This is a great tool for Google Analytics users, and Keplar is awesome for making the tool publicly available.  Data nerds and web marketing analysts: rejoice.  You can bring GA data into Tableau now.  My heart is smiling; how about yours?

Need Expert Help?

See Our Full Menu of Data Services

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!