Launching Your Serverless Expedition
If you use Tableau Cloud (or plan to migrate to it), host your data or corporate infrastructure on AWS, and want to deploy a containerised cluster of Tableau Bridge clients on a CentOS distribution, then you’re reading the right blog.
Keep in mind that every implementation, every business case, every user community is different. Platforms are interacted with in different ways, at different times, by different people. During this implementation, you’ll have to make some crucial decisions to ensure the solution is fit for your business.
In our cosmic experiment, we set sail with the inaugural Linux release of Tableau Bridge, the 2023.3 cosmic odyssey, launched on October 24, 2023. As we traverse this uncharted territory, remember to exercise due diligence and rigorous testing before embarking on your own production voyage.
What is Tableau Bridge?
Tableau Bridge is a critical component of Tableau that connects Tableau Cloud to data sources sitting on premises or behind a firewall. It keeps extracts current through automated refreshes and supports live connections, so reports and dashboards hosted on Tableau Cloud reflect the latest data from a variety of sources. Bridge transfers data securely, letting you use private-network data while maintaining security and compliance. It can also be configured for high availability, so refreshes and data access continue even if an individual client fails, enhancing the flexibility and scalability of Tableau deployments in the cloud.

Here’s a list of challenges associated with the conventional Tableau Bridge installed on Windows:
- Fixed Cost of Compute: With conventional Tableau Bridge, there is a fixed cost of compute resources, which can be inefficient and costly, especially when dealing with ever-changing demand on live data and extract refreshes. Organizations may find it challenging to align their infrastructure costs with varying usage patterns.
- Fault Tolerance: Ensuring high availability and fault tolerance in a Windows-based Tableau Bridge setup can be complex. Failures or disruptions in the bridge can lead to interruptions in data access and analytics, affecting critical business operations.
- Capacity Scalability: Scaling up the capacity of a conventional Tableau Bridge installation to meet increased demand often requires manual intervention and additional hardware provisioning. This process can be time-consuming and may not align well with dynamic business needs.
- Stability and Reliability: Maintaining stability and reliability in a Windows-based Tableau Bridge environment can be a challenge. Issues such as crashes, performance bottlenecks and unexpected downtime can impact the overall user experience and data availability.
- No Support for Linux: Conventional Tableau Bridge is primarily designed for Windows environments, limiting flexibility and interoperability with Linux-based infrastructure. Organizations that rely on Linux servers may face integration challenges.
- Manual Setup and Configuration: Setting up and configuring a conventional Tableau Bridge on Windows typically involves manual steps and ongoing maintenance efforts. This can be error-prone and time-consuming, requiring dedicated IT resources.
What is Containerization?
Containerization is a lightweight form of virtualization that allows you to package an application and its dependencies into a single unit called a container. Containers are isolated environments that encapsulate everything needed to run an application, including code, runtime, libraries and system tools. They provide consistency and portability, making it easy to deploy and run applications consistently across different environments, such as development, testing and production, without worrying about differences in underlying infrastructure.
What is AWS Fargate?
AWS Fargate is a serverless compute engine provided by Amazon Web Services (AWS) for running containers. With Fargate, you can deploy and manage containers without having to provision or manage the underlying infrastructure. It abstracts away the server and cluster management tasks, allowing you to focus solely on your containerized applications. Fargate provisions the CPU and memory each task requests and bills only for those resources while the task runs; combined with Amazon ECS Service Auto Scaling, the number of running tasks can grow and shrink with demand. This makes it a convenient and cost-efficient way to run containerized workloads in the AWS cloud.
Containerization and AWS Fargate offer several benefits when it comes to hosting Tableau Bridge:
- Scalability: Containerization allows you to easily scale Tableau Bridge instances up or down to match demand. On AWS Fargate there are no servers to provision, and ECS Service Auto Scaling adjusts the number of running tasks, so you have the right amount of capacity without overprovisioning.
- Cost Efficiency: With AWS Fargate, you only pay for the compute resources you use, making it a cost-effective solution compared to maintaining fixed infrastructure for Tableau Bridge. Containers also have a smaller footprint than traditional virtual machines, optimizing resource utilization.
- Flexibility: Containers provide a consistent and portable environment for Tableau Bridge, enabling you to run it across different cloud platforms or on-premises. This flexibility can be crucial for organizations with multi-cloud strategies or hybrid deployments.
- Isolation: Containers encapsulate Tableau Bridge and its dependencies, ensuring isolation from the underlying host system. This isolation enhances security and minimizes conflicts between different software components.
- Easy Deployment: Containerization simplifies the deployment process. You can create and manage container images that include Tableau Bridge and all necessary configurations. Deploying new instances or updating existing ones becomes a straightforward process.
- Resource Efficiency: AWS Fargate optimizes resource allocation, allocating CPU and memory based on your specified requirements. This helps avoid over-provisioning and maximizes resource utilization.
- High Availability: An Amazon ECS service running on Fargate automatically replaces failed tasks and can spread tasks across multiple Availability Zones, reducing the risk of downtime due to infrastructure failures. This is critical for ensuring uninterrupted Tableau Bridge operation.
- Automated Scaling: With ECS Service Auto Scaling, your Fargate-hosted Tableau Bridge containers can scale automatically in response to changes in workload, so you can handle peak usage periods without manual intervention.
- Easier Management: Container orchestration tools like Amazon ECS (Elastic Container Service) simplify the management of Tableau Bridge containers, including deployment, scaling and monitoring.
- Consistency: Containerization ensures consistent environments across development, testing and production, reducing the chances of configuration drift and making it easier to troubleshoot issues.
- Security: Containers can be configured with fine-grained access controls and network segmentation, enhancing the security of Tableau Bridge deployments.
- Resource Isolation: Containers provide resource isolation, preventing resource contention between Tableau Bridge instances and other applications running on the same infrastructure.
Incorporating containerization and AWS Fargate into your Tableau Bridge deployment strategy can lead to improved agility, cost savings, reliability and overall efficiency, making it a compelling choice for modernizing and optimizing your Tableau data integration and analytics processes.
Planning Your Expedition
What you’ll need:
- AWS DevOps access to create EC2, ECR and ECS resources
- A Tableau account to download software, or a copy of the Tableau Bridge installer (RPM)
- Your Tableau Cloud Site URI
- Your Tableau Cloud Pool ID, which you can obtain by clicking the pool name on the Tableau Cloud Settings -> Bridge -> Pooling page
- A Tableau Cloud Site Admin account
Charting the AWS Cosmos — Setting up AWS Environment
Launch an EC2 instance. It serves as your Docker image builder, so it doesn't need to be left running; start it only when you build, deploy or redeploy your Docker images.
I used the latest Amazon Linux 2 AMI on an m6i.2xlarge instance type with the standard 8 GB volume attached. You could probably use a smaller EC2 instance type, but I love the idea of attaching boosters to my spaceship.
The builder only needs your typical software installs; this is what I used:
Install Docker:
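sudo yum update -y
sudo yum -y install docker
sudo service docker start
sudo systemctl enable docker
# Let ec2-user run docker without sudo; the reboot makes the group membership take effect
sudo usermod -a -G docker ec2-user
sudo reboot
# After reconnecting, confirm the Docker daemon is reachable
docker info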
Install AWS CLI:
# Remove the preinstalled AWS CLI v1, then install AWS CLI v2
sudo yum -y remove awscli
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Unleashing Docker’s Magic — Creating Docker Images
Create the Docker File
Using basic Linux commands, we can create a working directory and the Dockerfile. I’ve noticed that the filename is case-sensitive, so an uppercase “D” is your ticket to interstellar success:
mkdir Docker
cd Docker
touch Dockerfile
Download the RPM File
With your trusty Tableau Account as your navigational key, journey to the Tableau Bridge site and secure the Tableau Bridge RPM–your interstellar ticket.
Once you possess this cosmic gem, make your move, copy or summon it to the Docker directory, your launchpad for the voyage into the data universe.
Create the PAT Token File
In the celestial vaults of Tableau Cloud, secure your PAT Name and PAT Secret key—the keys to the data galaxy. These cosmic artifacts shall be inscribed in a file of your choosing. In my odyssey, I named it “TokenFile.json,” safeguarded within the sacred Docker directory. The secrets within the file are of this cryptic format:
{ "PAT Name" : "PAT Secret Key" }
Here’s where you get to make your first major decision: do you embed the token file in the Docker image, or create the file dynamically at runtime? In this example, I found it simpler to embed the file in the image. Bear in mind that anyone who can pull the image from your ECR repository can then read the PAT secret.
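If you'd rather not bake the secret into the image, the token file can be written when the container starts. The sketch below is one hypothetical way to do that in shell: the function name and the `PAT_NAME`/`PAT_SECRET` environment variables are assumptions (on ECS, a task definition's `secrets` field can populate such variables from AWS Secrets Manager or Systems Manager Parameter Store). It does no JSON escaping, so it assumes neither value contains a double quote or a backslash.

```shell
# Hypothetical startup helper: writes a Tableau Bridge token file of the form
#   { "PAT Name" : "PAT Secret Key" }
# PAT_NAME and PAT_SECRET are assumed environment variable names.
# No JSON escaping is done, so neither value may contain a double quote or backslash.
write_token_file() {
  token_path="$1"
  if [ -z "${PAT_NAME:-}" ] || [ -z "${PAT_SECRET:-}" ]; then
    echo "write_token_file: PAT_NAME and PAT_SECRET must both be set" >&2
    return 1
  fi
  # umask 077 inside a subshell: a newly created file is readable by its owner only
  ( umask 077 && printf '{ "%s" : "%s" }\n' "$PAT_NAME" "$PAT_SECRET" > "$token_path" )
}
```

You could copy this function into the image as a small startup script, call `write_token_file /tmp/TokenFile.json`, and then start `TabBridgeClientWorker` with `--patTokenFile='/tmp/TokenFile.json'`. The trade-off is an extra moving part at startup in exchange for keeping the secret out of the published image.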
Edit the Dockerfile
Using the built-in vi tool, I added my docker commands to my docker file:
# CentOS 7 is the supported base image
FROM centos:7
RUN yum -y update
# Copy the Bridge RPM package, install it, then remove it from the image
COPY TableauBridge-< YOUR VERSION>.x86_64.rpm .
# Quote the pattern and limit the search to the working directory
RUN ACCEPT_EULA=y yum install -y $(find . -maxdepth 1 -name '*.rpm') && rm -f ./*.rpm
# Copy the token file to the working directory
COPY TokenFile.json .
Build a New Image
docker image build -t bridge_base .
docker images --filter reference=bridge_base
Harness the Power of Datasource Drivers
In our cosmic voyage, Tableau thrives on an array of datasources, from traditional RDBMSs such as MS SQL Server and Oracle to modern cloud realms like Snowflake and Amazon Redshift. To ensure our Tableau Bridge ships sail smoothly through this data universe, we must pinpoint the datasources we need and install their drivers. Thus, the concept emerges: segregating driver installations into a distinct Dockerfile that builds on top of the Bridge base image. Layering our celestial images this way means drivers can be added or updated by rebuilding only the top layer, without reinstalling Bridge.
Create the Docker File
Using basic Linux commands once again, we can create a separate working directory and its Dockerfile. Remember that the filename is case-sensitive: an uppercase “D” is your ticket to interstellar success.
mkdir DockerFinal
cd DockerFinal
touch Dockerfile
Download the Driver files
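Begin your cosmic quest at the Tableau Driver Download page. There, you shall unearth all the artifacts and star charts essential for your specific cosmic requirements. Once acquired, transport these enigmatic files into the newly formed DockerFinal realm. For this voyage, I, too, ventured into the realms of both MS SQL Server and Oracle, my chosen constellations in the celestial data landscape. For SQL Server, the Dockerfile installs Microsoft's ODBC driver from Microsoft's own yum repository, so also save Microsoft's RHEL 7 repository definition (https://packages.microsoft.com/config/rhel/7/prod.repo) into DockerFinal as mssql-release.repo.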
Edit the Docker File
Using the built-in vi tool, I added my docker commands to my docker file:
# Use the previously built image, bridge_base
FROM bridge_base
# Oracle: copy the JDBC driver jar into the Tableau JDBC drivers folder
COPY ojdbc11.jar /opt/tableau/tableau_driver/jdbc/
# MS SQL Server: register Microsoft's yum repository, then install Microsoft ODBC Driver 17 for SQL Server
COPY mssql-release.repo /etc/yum.repos.d/
RUN ACCEPT_EULA=Y yum install -y msodbcsql17
Build the Final Image
docker image build -t bridge_final .
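Because bridge_final is built `FROM bridge_base`, which resolves to the image tagged bridge_base on the builder, upgrading Bridge means rebuilding both layers in order; rebuilding only the driver layer picks up whatever bridge_base currently points to. A hypothetical wrapper, assuming the Docker and DockerFinal directories created above sit under one parent directory, keeps that order explicit. The `DRY_RUN` switch (on by default) is an illustrative convenience that prints the commands instead of running them.

```shell
# Hypothetical wrapper: rebuild both image layers in dependency order.
# Assumes the Docker/ and DockerFinal/ directories sit under one parent directory.
# DRY_RUN defaults to 1, which prints each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_bridge_images() {
  parent="${1:-.}"
  # bridge_final starts FROM bridge_base, so the base layer must be built first
  run docker image build -t bridge_base "$parent/Docker" || return 1
  run docker image build -t bridge_final "$parent/DockerFinal"
}
```

For example, `DRY_RUN=0 build_bridge_images "$HOME"` would run both builds for real, stopping if the base build fails.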
Push Your Image to Amazon Elastic Container Registry
Propel your meticulously crafted image into the vast expanse of Amazon Elastic Container Registry (ECR), where it shall await its cosmic deployment:
aws ecr create-repository --repository-name my-bridge-repository --region < Your AWS Region >
docker tag bridge_final < Your AWS Account >.dkr.ecr.< Your AWS Region >.amazonaws.com/my-bridge-repository
aws ecr get-login-password --region < Your AWS Region > | docker login --username AWS --password-stdin < Your AWS Account >.dkr.ecr.< Your AWS Region >.amazonaws.com
docker push < Your AWS Account >.dkr.ecr.< Your AWS Region >.amazonaws.com/my-bridge-repository
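The account ID and region appear in every one of those commands, which makes redeployments easy to mistype. A hypothetical set of helpers can derive the registry host once and reuse it; the function names and `DRY_RUN` switch are illustrative, and the `amazonaws.com` suffix assumes a standard commercial AWS region. Repository creation is left out because it only happens once.

```shell
# Hypothetical helpers for re-pushing bridge_final to ECR.
# Assumes a standard commercial region (registry host ends in amazonaws.com).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

ecr_registry() {
  # $1 = AWS account ID, $2 = AWS region
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

ecr_login() {
  registry=$(ecr_registry "$1" "$2")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'aws ecr get-login-password --region %s | docker login --username AWS --password-stdin %s\n' "$2" "$registry"
  else
    aws ecr get-login-password --region "$2" | docker login --username AWS --password-stdin "$registry"
  fi
}

push_bridge_image() {
  account="$1"; region="$2"; repo="${3:-my-bridge-repository}"; tag="${4:-latest}"
  image="$(ecr_registry "$account" "$region")/$repo:$tag"
  ecr_login "$account" "$region" || return 1
  run docker tag bridge_final "$image" || return 1
  run docker push "$image"
}
```

With `DRY_RUN=0`, `push_bridge_image < Your AWS Account > < Your AWS Region >` performs the login, tag and push shown above, tagging explicitly as `latest`.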
Building a Bridge Brigade — Building an Autoscaling Cluster
This is where the fun begins. Using Amazon Elastic Container Service (ECS), we are going to build a cluster for Tableau Bridge. Provide a simple and meaningful name, keep AWS Fargate as the infrastructure, and add your monitoring and organisational tag details.
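If you'd rather script this console step, `aws ecs create-cluster` does the same job. The sketch below is a hypothetical wrapper: the function name, the `DRY_RUN` switch (on by default, so it only prints the command) and the tag keys `Project` and `Owner` are illustrative assumptions. Container Insights is enabled as the monitoring option.

```shell
# Hypothetical CLI equivalent of the console "Create cluster" step.
# Tag keys (Project, Owner) are illustrative; ECS tags use lowercase key=/value=.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

create_bridge_cluster() {
  cluster="$1"; owner="$2"
  run aws ecs create-cluster \
    --cluster-name "$cluster" \
    --settings name=containerInsights,value=enabled \
    --tags key=Project,value=tableau-bridge "key=Owner,value=$owner"
}
```

No capacity provider needs to be configured here for this walkthrough, since the service later requests the Fargate launch type directly.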
Embarking on Fargate Adventures — Deploying on AWS Fargate
Remaining in the Amazon ECS console for now, create a new task definition with JSON. Below is a sample task_definition.json that works for this example, but keep in mind that you’ll have to change certain references and settings to match your environment.
Here, you’ll find another decision point: you can create dynamic agent names as I have below, but you’ll have to manually remove them from the Tableau Cloud Pool once they’ve been terminated as there is no means (as it stands today) to programmatically deregister the client from Tableau Cloud.
The alternative is to have a static agent name, but all clients will appear as a single client within Tableau Cloud, despite being numerous:
{
  "family": "bridge-family",
  "containerDefinitions": [
    {
      "name": "bridge",
      "image": "< Your AWS Account >.dkr.ecr.< Your AWS Region >.amazonaws.com/my-bridge-repository",
      "cpu": 8192,
      "memory": 32768,
      "portMappings": [
        {
          "name": "bridge-80-tcp",
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "entryPoint": ["sh", "-c"],
      "command": [
        "/bin/sh -c \"/opt/tableau/tableau_bridge/bin/TabBridgeClientWorker -e --client='Agent-$(date \"+%Y%m%d-%H%M%S\")' --site='< Your TC URI >' --userEmail='< Your Email >' --patTokenId='< Your TC PAT name >' --patTokenFile='TokenFile.json' --poolId='< Your TC Pool ID >'\""
      ],
      "environment": [
        { "name": "LANGUAGE", "value": "en_US.utf8" },
        { "name": "LANG", "value": "en_US.utf8" },
        { "name": "LC_ALL", "value": "en_US.utf8" }
      ],
      "mountPoints": [],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/bridge-family",
          "awslogs-region": "< Your AWS Region >",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "taskRoleArn": "arn:aws:iam::< Your AWS Account >:role/< Your IAM Role >",
  "executionRoleArn": "arn:aws:iam::< Your AWS Account >:role/< Your IAM Role >",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "8192",
  "memory": "36864",
  "ephemeralStorage": { "sizeInGiB": 100 },
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
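The sample requests 8 vCPU (8192 CPU units) and 36 GiB (36864 MiB) at task level, with a 32 GiB hard limit on the container. Fargate only accepts certain CPU and memory pairings, so a quick check can catch an invalid combination before you register a task definition. The helper below is a hypothetical sketch; its ranges are transcribed from the AWS Fargate task-size table for Linux tasks at the time of writing, so treat them as an assumption to re-check against the current ECS documentation.

```shell
# Hypothetical check: is a task-level cpu/memory pair a Fargate-supported size?
# $1 = task cpu (CPU units), $2 = task memory (MiB), $3 = optional container memory (MiB).
# Ranges are an assumption transcribed from AWS's Fargate task-size table (Linux).
fargate_size_ok() {
  cpu="$1"; mem="$2"
  # The container-level hard limit must fit inside the task-level memory
  if [ -n "${3:-}" ] && [ "$3" -gt "$mem" ]; then return 1; fi
  case "$cpu" in
    256)   [ "$mem" -eq 512 ] || [ "$mem" -eq 1024 ] || [ "$mem" -eq 2048 ] ;;
    512)   [ "$mem" -ge 1024 ]  && [ "$mem" -le 4096 ]   && [ $((mem % 1024)) -eq 0 ] ;;
    1024)  [ "$mem" -ge 2048 ]  && [ "$mem" -le 8192 ]   && [ $((mem % 1024)) -eq 0 ] ;;
    2048)  [ "$mem" -ge 4096 ]  && [ "$mem" -le 16384 ]  && [ $((mem % 1024)) -eq 0 ] ;;
    4096)  [ "$mem" -ge 8192 ]  && [ "$mem" -le 30720 ]  && [ $((mem % 1024)) -eq 0 ] ;;
    8192)  [ "$mem" -ge 16384 ] && [ "$mem" -le 61440 ]  && [ $((mem % 4096)) -eq 0 ] ;;
    16384) [ "$mem" -ge 32768 ] && [ "$mem" -le 122880 ] && [ $((mem % 8192)) -eq 0 ] ;;
    *)     return 1 ;;
  esac
}
```

For the sample above, `fargate_size_ok 8192 36864 32768` succeeds, whereas 8192 CPU units with 34816 MiB fails because 8 vCPU tasks step memory in 4 GiB increments.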
Harmonizing the Docker Symphony — Creating a Service
With your brand new Docker images published to Amazon ECR, and your new Amazon ECS Cluster and Task Definitions created, we can now deploy the task definition by creating a service.
Deploying the task definition as a service gives you the opportunity to define the important infrastructure components specific to your needs, including:
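- The VPC and subnets your service operates in. These will likely be private or protected subnets with direct network connectivity to your data sources, plus an outbound route (for example, a NAT gateway or VPC endpoints) so the containers can pull the image from ECR and reach Tableau Cloud
- Your security group. Bridge initiates its connections outbound (to Tableau Cloud over HTTPS and to your data sources), so inbound rules can stay tightly locked down to protect your data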
- Enable and set your Service auto scaling thresholds.
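The same choices can be scripted. The sketch below is a hypothetical CLI equivalent of the "Create service" step: the function name, the `DRY_RUN` switch, the cluster/service names, the 1 to 4 task range and the 60% average-CPU target are all illustrative assumptions rather than recommended values. It creates the service in private subnets without public IPs, registers the service's desired count with Application Auto Scaling, and attaches a target-tracking policy.

```shell
# Hypothetical CLI equivalent of creating the Bridge service with auto scaling.
# $1 = cluster, $2 = service name, $3 = comma-separated subnet IDs, $4 = security group ID.
# Task range (1 to 4) and the 60% CPU target are illustrative assumptions.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

create_bridge_service() {
  cluster="$1"; service="$2"; subnets="$3"; sg="$4"
  resource_id="service/$cluster/$service"

  # 1. Fargate tasks in private subnets with no public IP (egress via NAT or endpoints)
  run aws ecs create-service --cluster "$cluster" --service-name "$service" \
    --task-definition bridge-family --desired-count 1 --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[$subnets],securityGroups=[$sg],assignPublicIp=DISABLED}" || return 1

  # 2. Let Application Auto Scaling vary the service's DesiredCount between 1 and 4
  run aws application-autoscaling register-scalable-target --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount --resource-id "$resource_id" \
    --min-capacity 1 --max-capacity 4 || return 1

  # 3. Target tracking: add or remove tasks to keep average service CPU near 60%
  run aws application-autoscaling put-scaling-policy --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount --resource-id "$resource_id" \
    --policy-name bridge-cpu-target-tracking --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
    '{"TargetValue":60.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'
}
```

Keep in mind that scaling in stops tasks, which may interrupt a refresh running on that client and, with dynamic agent names, leaves another entry to clean up in the pool.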
Commanding the Fleet — Monitoring and Scaling
Upon saving and deploying the new Amazon ECS service, you should see your new Tableau Bridge clients magically appear in your named pool, under Tableau Cloud Settings -> Bridge -> Pooling.
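To cross-check what Tableau Cloud shows against what ECS is actually running, you could compare the service's desired and running task counts and follow the container logs. The sketch below is hypothetical: the function names and `DRY_RUN` switch are illustrative, the default log group matches the `awslogs-group` in the sample task definition, and `aws logs tail` requires AWS CLI v2 (installed earlier).

```shell
# Hypothetical monitoring helpers for the Bridge service.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

bridge_status() {
  # $1 = cluster, $2 = service; prints the desired and running task counts
  run aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].[desiredCount,runningCount]' --output text
}

bridge_logs() {
  # Follows the awslogs group from the sample task definition (AWS CLI v2)
  run aws logs tail "${1:-/ecs/bridge-family}" --follow --since 15m
}
```

If the running count exceeds the number of connected clients in your pool, a container may have failed to register, and its log stream is the first place to look.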
Don’t Panic — Limitations and findings
Agent Name:
By choosing a dynamic Agent Name in your startup command, you empower Tableau Cloud to distinguish between active clients effortlessly. This enables you to easily track the number of agents in operation and identify their status via the Tableau Cloud Settings page.
However, it’s important to note that dynamically generated client names won’t be automatically removed from the Tableau Bridge Pool. This presents you with a choice: maintain visibility on the agent count, along with the administrative task of cleaning up unused agents, or opt for reduced administrative overhead by forgoing visibility through the Tableau Cloud portal.
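One more consideration with the timestamp-based name in the sample command, `Agent-$(date "+%Y%m%d-%H%M%S")`: two tasks started in the same second (for example, during a scale-out) would receive identical names and be hard to tell apart in the pool. A hypothetical variant appends a short random hex suffix read from `/dev/urandom`; the function name and suffix length are illustrative assumptions, not a Tableau requirement.

```shell
# Hypothetical agent-name helper: timestamp plus a 4-hex-digit random suffix.
# $1 = optional timestamp override, $2 = optional suffix override (useful for testing).
bridge_agent_name() {
  stamp="${1:-$(date '+%Y%m%d-%H%M%S')}"
  suffix="${2:-$(od -An -N2 -tx1 /dev/urandom | tr -d ' \n')}"
  printf 'Agent-%s-%s\n' "$stamp" "$suffix"
}
```

In a startup script, `--client="$(bridge_agent_name)"` would then replace the inline `date` call; the cleanup trade-off described above is unchanged.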
Token File:
Your Personal Access Token (PAT) is handed to the Bridge worker client at startup through a local JSON file. You’ve got a choice to make here: you can either embed the token file in the Docker image, which is published to Amazon Elastic Container Registry (ECR) and then used by the cluster, or you can create the file dynamically at container start from the Elastic Container Service (ECS) task definition. It’s all about choosing the path that suits your needs best.
Platform:
At the time of writing (Tableau Bridge 2023.3), the containerized solution is supported only on CentOS distributions. So, when it comes to containerization, CentOS is the star of the show.
Raising the Victory Banner — Conclusion
In the ever-evolving space of data integration and analytics, your Tableau Bridge expedition can take on a whole new dimension by embracing containerization and AWS Fargate. These technologies bring scalability, cost-efficiency and reliability, transforming your Tableau data processes into a force to be reckoned with.
While navigating your way through this serverless space odyssey, it’s crucial to acknowledge that every journey is unique. Your choices and adaptations will be the stars guiding your path, as you configure your Tableau Bridge to suit your specific business needs.
By overcoming the limitations and embracing the possibilities, you can chart a course to a future where your Tableau data integration soars to new heights, delivering real-time insights and empowering your organization to reach for the stars of success. So, embark on your own Tableau Bridge expedition, and let your data journey be an adventure of discovery, innovation and transformation.