After the amazing show Tableau Software put on for us at this year’s Tableau Customer Conference, the BI community was abuzz with excitement about the upcoming Tableau 8. Fortunately, we weren’t kept waiting long. Beta access is widely available at this point. My colleagues have already written about some of the great features in the version 8 elsewhere on our blog. In this post I’m going to focus on a powerful but lesser publicized feature in Tableau 8: the Tableau Data Extract API.
I’d be remiss if I didn’t admit that I’m a little bit late to the game here. I didn’t even realize the Tableau Data Extract API was available to Tableau 8 beta users until I saw Ryan Robitaille’s examples here. So, clearly I have some catching up to do.
Anyway, on with it.
The Tableau Data Extract API does exactly what it sounds like it does – it allows users to create extracts from arbitrary data sources outside of Tableau Desktop. As an interesting aside, it doesn’t even require that Tableau Desktop or Server be installed on the same machine. Everything you need to get to work building extracts is contained in a zip file on the Tableau beta download site. The TDE API is currently available in Python, C, C++ and Java flavors. Each has a 32bit and a 64bit version. I’ve elected to use Python while working with the TDE API because it’s the darling of the data analysis arena and pretty good at getting things done. I’m sure things look quite similar in the other languages if you want to go that route.
I didn’t manage to find any documentation for the API so I’m working from what I could figure out from the comments in the Base.py file in the API package and the examples that Ryan posted.
Generally speaking the usage of the API looks like this:
- Import API modules (as tde in this example)
- Create an Extract object with tdeFile = tde.Extract(tdeFileName)
- Create a Table Definition object with tdeTableDef = tde.TableDefinition()
- Add column definitions to the Table Definition object with tdeTableDef.addColumn()
- Add a Table to the Extract with tdeTable = tdeFile.addTable(“Extract”,tdeTableDef)
- Create rows with tdeRow = tde.Row(tdeTableDef)
- Add column values to row with row.set
(colnum, value) ex: row.setInteger(c, v) - Insert row into tableu with tdeTable.insert(tdeRow)
- Close row with tdeRow.close()
- After all rows are inserted close the new TDE with tdeFile.close()
Let’s look at an example putting this together to tackle the task of importing from a CSV.
############################################################################### ## ## csvload.py ## ## Example import from csv to tde using Tableau v8 tde api ## Schema is SuperStore Orders sample data. ## ## USAGE: python csvload.py SuperStoreOrders.csv ## ## Brian Bickell / @brianbickell / InterWorks / 12.06.2012 ## ############################################################################### from sys import argv import os, csv, datetime, time import dataextract as tde ## Functions # This function makes adding the columns to each row in the extract a bit easier. def add_tde_col(colnum, row, val, t): # Date format used below dateformat = '%m/%d/%Y' if t == tdeTypes['INTEGER']: try: convert = int(val) row.setInteger(colnum, convert) except ValueError: #if we bomb the cast then we just add a null row.setNull(colnum) elif t == tdeTypes['DOUBLE']: try: convert = float(val) row.setDouble(colnum, convert) except ValueError: row.setNull(colnum) elif t == tdeTypes['BOOLEAN']: try: convert = int(val) if convert > -1 and convert ' % (index, k, tdeTypes.keys()[tdeTypes.values().index(v)]) tdeTableDef.addColumn(k, v) # Add table to extract tdeTable = tdeFile.addTable("Extract",tdeTableDef) print 'Writing records to %s' % (tdeFileName) # iterate through rows and columns of csv -> add to tde rownum = 0 for row in reader: if rownum == 0: header = row else: colnum = 0 tdeRow = tde.Row(tdeTableDef) for col in row: if colnum+1 > len(csvSchema): # don't blow up if you don't take every column break add_tde_col(colnum, tdeRow, row[colnum], csvSchema[colnum].values()[0]) # the magic colnum += 1 tdeTable.insert(tdeRow) tdeRow.close() rownum += 1 print '%i rows added in total in %f seconds' % (rownum-1, time.clock()-startTime) tdeFile.close() csvFile.close()
In the above example we’re taking the name of a CSV as an argument and loading into a CSV reader object, setting up our TDE bits, iterating over the rows and columns of that CSV, and finally storing into the new TDE. If you look carefully you can spot all 10 of the steps that I listed above. The CSV handling here isn’t as robust as I’d like it to be, but it’s passable. You probably noticed that I’ve defined the schema of the CSV as a list of dictionary objects. Making this dynamic is an exercise I’m leaving for a later time. As configured the file will load the Superstore Orders dataset provided in the Tableau Desktop install.
Additionally, I’ve added some extra bits to make working with the API a little bit more friendly.
First, I added a dictionary called tdeTypes that maps the integer keys the API uses for it’s data types to more friendly strings. Second, I added a function to help with the set
Despite having to sharpen up my Python skills to get this implemented, working with the Tableau Data Extract API was mostly painless. If I had to ask for any additional features I’d probably ask for some kind of handy row.setX() method that would determine the strongest data type and handle type casting for the user. Python being the weakly typed language that it is doesn’t really jive with the current row.set
The ability to programmatically generate extracts has been an often-requested feature by many of my clients and I’m happy to see that Tableau Software has finally implemented it and that it works well in my initial tests. I look forward to all of the shenanigans interesting uses that the community will come up with in the coming months.