In the last few years, we have seen an ever-growing number of codeless data-prep solutions made available to analysts and developers. At InterWorks, we are constantly vetting these technologies, especially those that can be extended through integrations with scripting languages like Python. Tableau Prep is among the tools we like, as it brings custom scripting, loops, predictive modeling and complex data transformations into the canvas when a user’s data needs surpass Tableau’s out-of-the-box functionality.
The Case for Using Python in Prep
Prep is designed to save users time when performing basic transformations and cleaning of data sources – along with a way to document this work – all without leaving the Tableau ecosystem. Enter Python, the “Swiss Army Knife” of programming languages with over 100,000 libraries that can accommodate use cases ranging from text analytics and custom geocoding to quick API access and powerful machine-learning capabilities. The list below identifies some clear benefits of using Prep and Python together:
- Brings the complex data transformation, looping and cleansing capabilities of Python into the Tableau ecosystem
- Runs on an instance of TabPy, which is better equipped to handle larger and more complicated outputs than Python in Tableau Desktop
- Leverages pandas (an intuitive, supple package) to containerize data inputs and outputs in the DataFrame structure
- Easy to run within the Prep canvas using CTRL + R or the Play button
- Gives users access to over 100,000 Python libraries for machine learning, text analytics and more
- Can be used to ingest data through files and API endpoints from outside of the Prep workflow
- Easily configurable to write multiple output fields in your dataset
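To make the DataFrame-in, DataFrame-out pattern from the list above concrete, here is a minimal sketch of the kind of function a Prep script step could call. The function name clean_names, the "Customer Name" column and the sample rows are hypothetical, chosen only for illustration. Because the function returns the same fields it receives, no output schema should be needed in this case.

```python
import pandas as pd


def clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Trim whitespace and title-case a text column.

    Prep hands the step's input to the function as a pandas DataFrame
    and expects a DataFrame back.
    """
    out = df.copy()
    # "Customer Name" is a hypothetical field name used for illustration.
    out["Customer Name"] = out["Customer Name"].str.strip().str.title()
    return out


# Local check with made-up rows, run outside of Prep.
sample = pd.DataFrame({"Customer Name": ["  ada LOVELACE ", "grace hopper"]})
result = clean_names(sample)
print(result)
```

Testing the function against a small hand-built DataFrame like this, before pointing Prep at the .py file, keeps most debugging out of the canvas.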
Things to Know About Using Python in Prep
Prep is designed with simplicity and speed to value in mind. Even so, there are some basic requirements and considerations users should understand before diving into the deep end of using Python in Prep:
- Requires basic to intermediate Python knowledge, including use of pandas DataFrames, data types and functions
- Requires the installation of Python 3.6 or above and the TabPy server
- Requires a separate text editor or IDE, such as Jupyter, to test and troubleshoot your scripts before using them in a Prep flow
- Error messages within the canvas lack critical details for troubleshooting
- Requires a get_output_schema function to be defined when writing new fields to an output in the Prep flow
- Requires using functions to manually assign data types to any new fields created in the script
- Scripts may slow down the speed at which a Prep flow runs
- Scripts cannot be used without an input connection present on the Prep canvas, requiring workarounds when referencing API endpoints in your script
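The two schema-related requirements in the list above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the field names (Order ID, Sales, Profit, Margin) and the add_margin function are hypothetical, and the prep_string() and prep_decimal() definitions are local stand-ins so the script also runs outside of Prep. Inside a flow, Prep supplies the real type functions, and their return values may differ from these stand-ins.

```python
import pandas as pd

# Local stand-ins (an assumption for testing outside Prep). Inside a flow,
# Prep provides prep_string(), prep_decimal() and the other type functions.
try:
    prep_string
except NameError:
    def prep_string():
        return ["string"]

    def prep_decimal():
        return ["decimal"]


def add_margin(df: pd.DataFrame) -> pd.DataFrame:
    """Add a new Margin field computed from two existing fields."""
    out = df.copy()
    out["Margin"] = out["Profit"] / out["Sales"]
    return out[["Order ID", "Sales", "Profit", "Margin"]]


def get_output_schema() -> pd.DataFrame:
    """Declare every field the script returns, each with a data type."""
    return pd.DataFrame({
        "Order ID": prep_string(),
        "Sales": prep_decimal(),
        "Profit": prep_decimal(),
        "Margin": prep_decimal(),
    })


# Local check with made-up rows.
sample = pd.DataFrame({
    "Order ID": ["A-1", "A-2"],
    "Sales": [100.0, 250.0],
    "Profit": [25.0, 50.0],
})
result = add_margin(sample)
print(result)
print(list(get_output_schema().columns))
```

The fields returned by the function and the fields declared in get_output_schema should line up by name, since the schema is what tells Prep about the new field.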
When to Use TabPy in Desktop vs. Tableau Prep
Before Python became available in Tableau Prep, Tableau users could already use TabPy in their Tableau Desktop workbooks, a capability that dates back to Tableau 10.1. Much of the excitement around this feature came from developers who wanted to create fields within their workbooks from the output of a Python model. Tableau Prep provides an alternative route to these features outside of your workbook, with the added benefit of dropping in pre-written .py files instead of writing code using the built-in SCRIPT_*( ) functions. There are still some key considerations to make when choosing where to use Python in your Tableau workflow.
Using Python at the workbook level is a good idea when:
- Using Python scripts that require a dynamic user input
- Using Python models where you want to give users the ability to adjust parameters
- Calculating custom probability distributions on fields in your workbook
- Your model inputs depend on calculated fields in Tableau
- Your dataset is relatively lean
You should use Python in Prep when:
- You want your Python model outputs to be embedded in the dataset
- You are splitting your dataset into testing and training sets for modeling
- You have multiple distinct Python scripts you would like to apply to your data
- The output of your script contains more than one field
- You are doing row-level functions and loops
- You want to apply multiple Python transformations to a dataset
- You want to retrieve data from external sources / APIs and apply transformations using Tableau Prep
- You are working with multiple collaborators within a flow
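As one illustration of the cases above, splitting a dataset into training and testing sets can be done by writing a flag into the data itself. The sketch below uses only pandas. The function name flag_train_test, the Split field, the 80/20 ratio and the fixed random seed are all assumptions made for this example.

```python
import pandas as pd


def flag_train_test(df: pd.DataFrame) -> pd.DataFrame:
    """Label roughly 80% of rows as 'train' and the rest as 'test'."""
    # Reset the index so each row has a unique label to select on.
    out = df.reset_index(drop=True)
    # A fixed random_state makes the split repeatable between flow runs.
    train_idx = out.sample(frac=0.8, random_state=42).index
    out["Split"] = "test"
    out.loc[train_idx, "Split"] = "train"
    return out


# Local check with ten made-up rows.
sample = pd.DataFrame({"Row ID": range(1, 11)})
result = flag_train_test(sample)
print(result["Split"].value_counts())
```

Because Split is a new field, it would also need to be declared in get_output_schema when the script runs inside a flow. Downstream steps can then filter on Split to route each subset where it is needed.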
Experience More with Python
Tableau Prep is a powerful tool suited for small to medium-size projects and those who want to keep all their tasks within the Tableau ecosystem. Using Python within Tableau Prep provides both a cost-effective alternative to data-prep tools like Alteryx and DSS and an effective solution for teams who want to bring more advanced data transformations and predictive-modeling capabilities into Tableau.
If you have questions about this, feel free to reach out to me directly via email or LinkedIn. I hope this was a helpful introduction to Python in Tableau Prep and gave you some ideas on how you can implement it in your next project.