As 2020 was coming to a close at InterWorks, we were approached with a unique problem from an organization we greatly respect. Amazon has been at the forefront of cloud computing for the better part of my career, and in recent years, they have been revolutionizing the machine learning (ML) space. Machine learning has matured quite a bit, and its recent development mirrors the shift that data warehousing went through: simple is better.
In an effort to simplify machine learning for organizations large and small, Amazon partnered with InterWorks to help bridge the worlds of Tableau and Amazon SageMaker through the development of Amazon SageMaker for Tableau. The product is the culmination of our expertise in data management, Tableau and cloud frameworks, and it delivers advanced analytics directly inside your Tableau dashboards, truly democratizing machine learning insights.
What Is Amazon SageMaker for Tableau?
For organizations that are already investing in ML, the Amazon SageMaker for Tableau integration provides a unique opportunity to communicate and share ML with business users and executives. Instead of locking these valuable opportunities into programmatic interfaces, we can use the connector to share insights in a way that organizations have learned to rely on: effective, thoughtful, visualized. Data analysts empowered with predictive analytics from SageMaker in Tableau unlock opportunities by discovering the next purchasing trend, where vital resources should go and what customer churn will look like, all within Tableau Desktop and in the hands of your organization’s best storytellers. The Amazon SageMaker for Tableau integration gives us a unique lens into the world of machine learning and allows businesses to truly do more with their data.
Amazon SageMaker for Tableau has been open-sourced as an AWS Quick Start and is completely free to everyone. Organizations are able to leverage the integration within their AWS environments and deploy the solution’s resources with a fully managed AWS CloudFormation template. Utilizing Tableau’s Analytics Extension Framework, the Quick Start deploys a serverless application composed of Amazon services like AWS Lambda, Amazon API Gateway and Amazon Cognito. To deploy the Quick Start, customers need an AWS account and a domain registered with Amazon Route 53.
The Quick Start leverages the same framework as Tableau’s TabPy integration; however, it improves the user experience by using serverless technology to remove the need for traditional maintenance or custom installations. The architecture allows users to launch, sign up and begin deriving insights from their ML models in under one hour without any need for server configuration. After launch, Tableau developers are able to bring enriching ML insights into their calculated fields, expanding the way organizations can use and derive value from their data.
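To make the calculated-field round trip a little more concrete, here is a minimal sketch of the JSON body Tableau sends to a TabPy-style analytics extension. The `script`/`data`/`_argN` layout follows the TabPy `/evaluate` convention. Treating `script` as the SageMaker endpoint name is an assumption based on the calculated-field syntax described later in this post, and the helper names, endpoint name and sample values are hypothetical.

```python
import json


def build_evaluate_body(endpoint_name, columns):
    """Shape a request body the way a TabPy-style /evaluate call does.

    Tableau sends each argument of a SCRIPT_ function as one column,
    keyed _arg1, _arg2, ... in argument order.
    """
    data = {f"_arg{i}": list(col) for i, col in enumerate(columns, start=1)}
    return {"script": endpoint_name, "data": data}


def columns_to_rows(data):
    """Transpose the _argN columns into per-record rows for a model."""
    ordered = [data[key] for key in sorted(data, key=lambda k: int(k[4:]))]
    return [list(row) for row in zip(*ordered)]


# Hypothetical endpoint name and two feature columns for three records.
body = build_evaluate_body(
    "churn-endpoint", [[34, 51, 27], ["basic", "pro", "basic"]]
)
print(json.dumps(body))
print(columns_to_rows(body["data"]))
```

The transpose step matters because Tableau hands over columns while a hosted model usually expects one record per row.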
How to Use Amazon SageMaker for Tableau
Prior to using Amazon SageMaker for Tableau, end users need to have trained a SageMaker model and deployed it to Amazon SageMaker Hosting Services. Ideally, you have used Amazon SageMaker Autopilot to train your model, which removes the need to implement a preprocessing inference pipeline into the ML model, and the Quick Start will work perfectly right out of the box.
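If you are not sure whether a model is actually hosted yet, a quick check along these lines can help. This is a sketch, assuming boto3 is installed and AWS credentials are configured; the helper names, region and endpoint name are placeholders. The SageMaker client is passed in so the check itself can be exercised without touching AWS.

```python
def endpoint_is_ready(sagemaker_client, endpoint_name):
    """Return True when the hosted endpoint reports the InService status."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"] == "InService"


def make_client(region_name):
    """Create a real SageMaker client; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works offline

    return boto3.client("sagemaker", region_name=region_name)


# Example call (not run here, placeholder names):
# endpoint_is_ready(make_client("us-east-1"), "churn-endpoint")
```

An endpoint that is still in a state such as Creating or Updating returns False here, which is a reasonable signal to wait before pointing Tableau at it.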
To use Amazon SageMaker for Tableau, customers need to complete the following activities:
- Deploy Amazon SageMaker for Tableau into your AWS environment using the AWS Quick Start
- Provision access to the Quick Start solution for your users
- Connect from Tableau and begin authoring visualizations
- Use the Amazon SageMaker model endpoints your data science team is using to host your ML models, or leverage Autopilot to create ML models quickly
Deploy the SageMaker Tableau Integration to AWS
We provided an AWS Quick Start so that our customers can easily launch this solution. The Quick Start will automatically provision the AWS resources that will enable you to use Amazon SageMaker ML models in Tableau. Please follow the deployment guide hosted along with the Quick Start for a step-by-step guide on deploying the solution.
Provision User Access to the Solution
Once the Quick Start has deployed the AWS CloudFormation stack into your environment, you will need to create users in Amazon Cognito that allow your end users to connect to SageMaker. In the AWS CloudFormation Outputs tab, navigate to the URL for the UserPoolDomain. Additionally, while you are here, take note of the value for SageMakerTableauApi; you will need this in the next step.
The URL presented in the UserPoolDomain output will allow you to sign up as a user or sign in to validate that the user is created and registered with Amazon Cognito. Once you successfully register/log in, you will be redirected to the GitHub page for the Quick Start solution. At this point, you can go to Tableau and begin utilizing Amazon SageMaker for Tableau.
Connect from Tableau
Tableau Desktop
We recommend running Amazon SageMaker for Tableau with Tableau Desktop version 2020.1 or later. After opening Tableau, navigate to the Help menu, choose Settings & Performance, and select Manage Analytics Extension Connection. In the configuration pane, submit the following details:
- Service: TabPy/External API
- Server: the SageMakerTableauApi value from your CloudFormation Outputs tab
- Port: 443
- Sign in with a username & password: Yes
- Require SSL: Yes
After configuring your connection details, select Test Connection to confirm the extension is accessible. Once you have successfully established connectivity to your deployment of Amazon SageMaker for Tableau, you can navigate to a worksheet to begin authoring visualizations that leverage Amazon SageMaker.
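If Test Connection fails and you want to check the endpoint outside Tableau, a request like the one below can narrow things down. This is a sketch, assuming the deployed API answers the TabPy-style `/info` route and accepts HTTP Basic credentials the way TabPy itself does; neither detail is confirmed here for the Quick Start, and the host name and credentials are placeholders.

```python
import base64
import urllib.request


def build_info_request(server, username, password, port=443):
    """Build a GET request for the /info route of a TabPy-style service."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    url = f"https://{server}:{port}/info"
    headers = {"Authorization": f"Basic {token}"}
    return urllib.request.Request(url, headers=headers)


# Placeholder host and credentials; substitute your own values.
request = build_info_request("ml.example.com", "analyst", "s3cret")
print(request.get_method(), request.full_url)

# To actually send it:
# urllib.request.urlopen(request, timeout=10)
```

The server, port and sign-in values map one-to-one onto the fields in the configuration pane above.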
Create a calculated field that uses one of the SCRIPT_ functions in Tableau’s function library to call the analytics extension. Choose the SCRIPT_ function that corresponds to the return data type of the ML model (for example, SCRIPT_STR for a model that returns strings). The syntax of the integration is as follows: SCRIPT_STR('sagemaker hosted endpoint name', list of fields to pass to model).
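As an illustration of how the pieces of that syntax fit together, the small helper below renders the calculated-field text for a given endpoint and field list. It is only a sketch: the endpoint and field names are hypothetical, and wrapping each field in ATTR() is one choice among Tableau's aggregations, since SCRIPT_ arguments have to be aggregate expressions.

```python
# Tableau's four SCRIPT_ functions, keyed by the model's return type.
SCRIPT_FUNCTIONS = {
    "str": "SCRIPT_STR",
    "real": "SCRIPT_REAL",
    "int": "SCRIPT_INT",
    "bool": "SCRIPT_BOOL",
}


def calculated_field(endpoint_name, return_type, fields, aggregation="ATTR"):
    """Render calculated-field text that calls a hosted SageMaker endpoint.

    SCRIPT_ arguments must be aggregate expressions, so each field is
    wrapped in an aggregation such as ATTR() or SUM().
    """
    function = SCRIPT_FUNCTIONS[return_type]
    args = ", ".join(f"{aggregation}([{name}])" for name in fields)
    return f"{function}('{endpoint_name}', {args})"


# Hypothetical endpoint and fields for a model that returns a label.
print(calculated_field("churn-endpoint", "str", ["Tenure", "Plan"]))
```

For a model that returns a numeric score, passing "real" as the return type yields a SCRIPT_REAL call instead.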
Tableau Server
For Tableau Server configuration, follow the documentation laid out by Tableau here.
Use the SageMaker Model Endpoints Deployed by Your Data Science Teams
Now that you have deployed the Quick Start, you are able to connect to any Amazon SageMaker model endpoint in your AWS account. Using IAM policies, you can also limit the models accessible if that is desired.
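For the IAM side, a policy document of roughly this shape limits invocation to named endpoints. This is a sketch: `sagemaker:InvokeEndpoint` and the endpoint ARN format are standard AWS, but the region, account ID and endpoint name are placeholders, and where the policy gets attached (presumably the role the integration's Lambda function runs under) depends on your deployment and is not shown.

```python
import json


def invoke_only_policy(region, account_id, endpoint_names):
    """Build an IAM policy allowing InvokeEndpoint on named endpoints only."""
    resources = [
        f"arn:aws:sagemaker:{region}:{account_id}:endpoint/{name}"
        for name in endpoint_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": resources,
            }
        ],
    }


# Placeholder region, account ID and endpoint name.
policy = invoke_only_policy("us-east-1", "123456789012", ["churn-endpoint"])
print(json.dumps(policy, indent=2))
```

Any endpoint not listed under Resource is simply not callable through a role that carries only this statement.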
If you do not already have ML models hosted on Amazon SageMaker, you should work with your data science teams to build and deploy your ML models on Amazon SageMaker. Again, we recommend using Amazon SageMaker Autopilot, a fully managed tabular AutoML service. Autopilot automates the ML experimentation process to create optimized ML models from tabular datasets. Our solution integrates with Autopilot trained models out of the box. Therefore, this is your fastest path to realizing value.
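Before wiring a model into a dashboard, it can be useful to call the endpoint directly and look at what comes back. The sketch below assumes a model that accepts header-less CSV and returns one prediction per line, which is typical for Autopilot models but worth confirming for yours. The runtime client is passed in; with boto3 it would be created as `boto3.client("sagemaker-runtime")`. Function names and sample values are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows as header-less CSV, one record per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def predict(runtime_client, endpoint_name, rows):
    """Send rows to a hosted endpoint and return one prediction per line."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="text/csv",
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    text = response["Body"].read().decode("utf-8")
    return text.strip().splitlines()


# Example call (not run here, placeholder names):
# import boto3
# runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
# predict(runtime, "churn-endpoint", [[34, "basic"], [51, "pro"]])
```

Column order in each row has to match the order of the features the model was trained on, which is also the order you will pass fields in from Tableau.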
I highly encourage anyone reading this who has not dipped their toes in the water of ML to check out this documentation from AWS about how to run Autopilot. This will give you some understanding of how this process differs from standard reporting or data engineering. It really makes the tasks approachable and will set you down the right path towards gaining predictive insights that you can use later in Tableau! Machine learning is an advanced workload, and I would be remiss not to call out that – for accurate predictions – it is best to leverage folks at your organization with data science or ML backgrounds to develop models for use in production Tableau dashboards.
Tips & Tricks for Building Visualizations with Machine Learning
Authoring visualizations that leverage ML models can be slightly different from traditional Tableau dashboarding. We have come up with a few tricks to really get the most out of the SageMaker for Tableau integration:
- Use unique identifiers or primary keys on the Detail mark in your visualizations
- Edit the default table calculation to compute using the unique identifier
These tips ensure your visualization passes data to the machine learning model correctly and improve the performance of the integration. Placing a unique identifier on a mark in your visualization ensures that the SageMaker integration receives specific rows of data, not aggregates. To keep this from changing the visualization’s layout or appeal, a great trick is to take your unique row identifier and drop it on Detail in the Marks card. This will produce the correct functionality without changing your charts! Under the hood, this change tells Tableau to automatically partition your table calculations to the granularity of the field on Detail (in our example, a primary key), ensuring the ML model does not receive aggregated results. This is extremely important because sending aggregated results to your ML model can produce incorrect predictions that are worthless.
Editing the table calculation that connects to SageMaker allows you to control the way data is sent to the integration and improve performance. To ensure you don’t get stuck waiting for details you don’t need, we recommend pausing auto updates on the worksheet, dragging your calculated field into the viz, editing the table calculation to compute using the unique identifier you set in the last step, submitting, and then resuming auto updates. A detailed walkthrough of this process is included near the end of the demonstration video above.
Once you have configured your worksheet, these insights from your ML models are available front and center in your Tableau dashboards to provide predictive insights to your reporting audience.
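The difference between row-level and aggregated inputs is easy to see with a toy example. The sketch below is plain Python, not Tableau: it mimics what reaches the model when the viz is grouped by a coarse dimension versus by a unique identifier. The rows, column meanings and function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical source rows: (customer_id, plan, monthly_spend).
ROWS = [
    (101, "basic", 20.0),
    (102, "basic", 35.0),
    (103, "pro", 80.0),
]


def model_inputs(rows, key_index):
    """Sum spend per partition key, mimicking what a viz at that grain sends."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return sorted(totals.items())


# Only plan on the viz: the model sees two aggregated records.
print(model_inputs(ROWS, key_index=1))
# customer_id on Detail: the model sees one record per customer.
print(model_inputs(ROWS, key_index=0))
```

In the first case the model would score a "customer" spending 55.0 who does not exist in the data, which is exactly the kind of silent error the unique identifier on Detail prevents.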
Get Started Today!
Bridging the gap between advanced and visual analytics with the SageMaker integration allows organizations to truly democratize machine learning and provide advanced analytics to new audiences. Developing this solution with our partners at AWS and Tableau has been an absolute blast, and we cannot wait to see the beautiful creations you all come up with.
You can launch Amazon SageMaker for Tableau into your AWS account today by navigating to the Quick Start page here.
Feel free to reach out to InterWorks for advice on how to get up and running with Amazon SageMaker for Tableau or for any additional help getting the Quick Start to fit your organization’s needs. Happy building!