Geocoding addresses in a scalable and sustainable way has always been surprisingly difficult in the BI world, let alone in Tableau. You either need expensive datasets or services, overkill GIS tools, or you must rely on legally dubious use of seemingly free (but not really free) geocoding services like Google Maps.
BI tools like Tableau have always supported location data. However, there are times when the data you need isn’t supported by your tool. Case in point: addresses. Converting address data into coordinates, a process called geocoding, has traditionally been a pain point for most analysts. They either need specialized GIS tools or custom code, or they have to manually batch data through questionable services.
Luckily, with the addition of Python integration in Tableau Prep Builder, you have access to a whole world of APIs to enrich your data right in the Tableau ecosystem. And this includes geocoding your addresses. That’s all well and good, but there’s still code involved in this process, right?
Wrong! You don’t have to write a single line of Python because our friends at Mapbox did all the hard work for you. Mapbox has built a tool for Tableau Prep that makes geocoding easy within the Tableau Prep ecosystem.
But, before we talk about the solution, a little background!
(Can’t wait? Skip down below to go directly to the Mapbox geocoding tool.)
What Is Geocoding?
Geocoding is taking the name of some place on the planet and finding the set of x,y map coordinates that represent that place on a map. For example, you have an intuitive sense of where Lawrence, Kansas, is on a map, but by geocoding, you discover that its coordinates (longitude, latitude) are [-95.25553, 38.96278], and you can put that point directly on a map:
Address geocoding is the same process, but it’s focused on an address (like 1935 SE Hawthorne St, Portland, Oregon). When you type an address into Google and it shows the location on a map, you just used Google’s geocoding service:
What’s the Issue?
Finding the location of a single address is easy, but what if you have a list of 1,900 customer addresses and you need to visualize them on a map in Tableau? Will you individually Google each of the 1,900 addresses by hand? No … you won’t.
What if those 1,900 addresses are constantly changing and updating as you gain new customers? You need a production-ready way to geocode addresses, ideally within an ecosystem of BI tools with which you are already familiar. Who wants ANOTHER tool in their workflow?
Surprisingly, until recently, Tableau users, and BI analysts generally, have lacked accessible address geocoding options for a production workflow at their business. The choices were paying tens of thousands for a geocoding dataset OR wandering into a legal grey area by using Google’s APIs inappropriately and keeping the results locally (i.e. you are probably not really authorized for this).
So, what is the solution? How does one scalably and responsibly geocode data? Our friends at Mapbox have been thinking about this problem for a while and recently made their permanent geocoding API available to all users. This creates the possibility for anyone to geocode at scale.
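To make the idea concrete, here is a minimal sketch of what a single forward-geocoding request might look like in Python. It assumes the v5 Geocoding API URL pattern (`/geocoding/v5/{endpoint}/{search_text}.json`), the `mapbox.places-permanent` endpoint for results you intend to store, and a GeoJSON-style response whose features carry a `center` of [longitude, latitude]. The helper names and the placeholder token are mine, not Mapbox’s, so check the current Mapbox documentation before relying on any of it.

```python
import json
from urllib.parse import quote, urlencode

# Assumed base URL for the Mapbox Geocoding API (v5); verify against the docs.
BASE_URL = "https://api.mapbox.com/geocoding/v5"


def build_geocode_url(address, token, permanent=True):
    """Build a forward-geocoding URL for one address (hypothetical helper)."""
    # The permanent endpoint is the one intended for results you store.
    endpoint = "mapbox.places-permanent" if permanent else "mapbox.places"
    query = urlencode({"access_token": token, "limit": 1})
    # The address sits in the URL path, so it must be percent-encoded.
    return f"{BASE_URL}/{endpoint}/{quote(address, safe='')}.json?{query}"


def extract_center(response_text):
    """Return (longitude, latitude) of the top match, or None if no match."""
    features = json.loads(response_text).get("features", [])
    if not features:
        return None
    lon, lat = features[0]["center"]
    return lon, lat


# A hand-written response in the assumed shape, using the Lawrence, Kansas
# coordinates quoted earlier. No network call is made here.
sample = json.dumps({"features": [{"center": [-95.25553, 38.96278]}]})
url = build_geocode_url("1935 SE Hawthorne St, Portland, Oregon", "YOUR_TOKEN")
center = extract_center(sample)
```

Sending `url` with any HTTP client and passing the response body to `extract_center` would complete the round trip. The Mapbox tool described below does this for you, so the sketch is only meant to show what is going on under the hood.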
Who Is Mapbox?
We are big fans of Mapbox over here at InterWorks. If you are a Tableau user, you interact with Mapbox every day. Have you ever noticed that some of the most beautiful maps on Tableau Public are built with custom Mapbox background maps? Don’t believe me? Here are a couple of recent favorite visuals of mine: SF Evictions by LM-7 and Migrant Deaths in the Mediterranean by Naledi Hollbruegge.
Mapbox has built a wide suite of tools, allowing analysts and developers to perform interesting geospatial analysis and/or visualize that geographic data in a compelling way. They are doing great things in the BI world as well. For example, check out Mapbox’s extensions for Tableau or their suite of Alteryx modules. They are quickly adding new spatial tools to their arsenal, and the latest addition is Mapbox’s Tableau Geocoding solution.
Mapbox Geocoding Solution
Mapbox’s geocoding tool takes advantage of Tableau Prep’s new Python integration, allowing Tableau Prep to chat with Mapbox’s geocoder API, geocoding any number of addresses all within a Prep workflow:
As for installation, Chris Toomey from Mapbox has provided great documentation here on GitHub and with this video.
What I DO want to say is this: don’t let the Python or the installation scare you away. Even though this tool requires Python integration, Mapbox has made it simple to install and use their tool. One could argue that Mapbox made it even easier to install TabPy (a necessary prerequisite to using Python in Prep) than Tableau made it to install TabPy.
Inside Tableau Prep, the tool integrates right into the workflow as a Script step. Below, I am using Prep to bring in a list of addresses from a CSV, prepping the address fields slightly (i.e. renaming my address field to Address, so it’s recognized by the Geocoder tool), then sending those records into a Script step that’s connected to the TabPy instance I set up following Mapbox’s documentation. Geocoded records are returned from the script and then exported to a .hyper for visualization in Tableau Desktop:
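If you are curious what a Script step expects from a Python file, here is a rough sketch of the general shape. This is not Mapbox’s script. It assumes Prep’s usual convention of passing a pandas DataFrame to a named function and reading the output columns from `get_output_schema()`, and it swaps the real API call for an offline stand-in (`fake_geocoder`, a made-up name) so it can run anywhere.

```python
def geocode_rows(rows, geocode_fn):
    """Add Longitude/Latitude to each row dict using geocode_fn(address)."""
    out = []
    for row in rows:
        result = geocode_fn(row["Address"])  # expected: (lon, lat) or None
        lon, lat = result if result else (None, None)
        out.append({**row, "Longitude": lon, "Latitude": lat})
    return out


def fake_geocoder(address):
    """Stand-in for a real API call so the sketch runs offline."""
    known = {"Lawrence, Kansas": (-95.25553, 38.96278)}
    return known.get(address)


def geocode(df):
    """Function you would name in the Script step; Prep passes a DataFrame."""
    import pandas as pd  # assumed to be installed in the TabPy environment

    return pd.DataFrame(geocode_rows(df.to_dict("records"), fake_geocoder))


def get_output_schema():
    """Describe the output columns; the prep_* helpers are supplied by Prep."""
    import pandas as pd

    return pd.DataFrame({
        "Address": prep_string(),
        "Longitude": prep_decimal(),
        "Latitude": prep_decimal(),
    })


# Exercise the pure-Python part without Prep or pandas.
rows = geocode_rows(
    [{"Address": "Lawrence, Kansas"}, {"Address": "Nowhere"}], fake_geocoder
)
```

Keeping the row logic in `geocode_rows` separate from the DataFrame wrapper is a design choice for the sketch: it lets you test the logic outside Prep, where `prep_string()` and `prep_decimal()` do not exist.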
Reasons to Use Mapbox for Geocoding
Here are some reasons I would consider the Mapbox approach over others:
- You are fully licensed to geocode and save those geocodes locally. With many other geocoding APIs, you are actually not licensed to cache or use the geocode outside of the current browser-state (i.e. no permanent geocoding).
- You are inside the Tableau ecosystem. You can run the geocoding all within Prep, spit out an extract, and boom—make your viz.
- Security concerns? Many of our clients have tight security restrictions around passing data out of their local network. No worries. Mapbox even has a local installation of their data.
- Global data. Many other geocoders are restricted to a single continent or country. Mapbox maintains pretty extensive global coverage of addresses and other geographic data.
- Need to scale up? Mapbox made it super simple to deploy TabPy and their geocoder on AWS using Pulumi.
- Extensible via Prep or Python. With this tool, you have the capabilities of both Prep and Python at your disposal. So, extend this tool to your heart’s content.
- Do other awesome stuff with Mapbox. Mapbox goes well beyond address geocoding, meaning you can also make beautiful base maps for your Tableau dashboards, access their Tableau extensions for doing spatial analysis and create drive-time isolines.
- Need support? Mapbox is one of those companies where you can actually quickly contact a real human if you need help.
Some Tips and Cautions for Using Mapbox
Enable permanent geocoding on your Mapbox account by contacting Mapbox. At the current time, you will need to email Mapbox to make this happen. Send solutions_architecture@mapbox.com a message, and tell them you want permanent geocoding enabled in your account. They will work with you to enable the feature.
Learn from my mistake and do this step; otherwise, you will spend an embarrassing amount of time figuring out why your API token appears broken.
Limit unnecessary API calls. This is a must. Tableau Prep has the interesting behavior of running the workflow every time you do basically anything. Since Mapbox’s geocoder talks to their API and does incur costs, you’ll want to keep that activity to a minimum. Here are two methods to help you:
- In 2020.1, Prep allows you to pause or disable Prep’s autorun feature. This is your best option.
- In Prep, use the Data Sample Input step to limit the number of records while you are developing the workflow:
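If you end up extending the tool with your own Python, a complementary idea (mine, not something the Mapbox tool is documented to do) is to geocode each distinct address only once and reuse the result for repeats. The sketch below uses an in-memory dictionary and a stand-in geocoder that counts its calls. The function names are hypothetical, and persisting the cache between runs is left out.

```python
def geocode_unique(addresses, geocode_fn, cache=None):
    """Geocode each distinct address once; reuse results for repeats."""
    cache = {} if cache is None else cache
    results = []
    for address in addresses:
        # Normalize whitespace and case so trivial variants share one entry.
        key = " ".join(address.split()).lower()
        if key not in cache:
            cache[key] = geocode_fn(address)  # only place a paid call would occur
        results.append(cache[key])
    return results


calls = []


def counting_geocoder(address):
    """Stand-in geocoder that records every invocation."""
    calls.append(address)
    return (0.0, 0.0)


cache = {}
first = geocode_unique(
    ["1 Main St", "1  main st", "2 Oak Ave"], counting_geocoder, cache
)
second = geocode_unique(["2 Oak Ave"], counting_geocoder, cache)
```

Here four lookups trigger only two calls to the stand-in geocoder, because the second and fourth addresses are already in the cache.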
Calculate cost using their calculator. Mapbox provides a calculator for you to estimate costs. In my opinion, Mapbox is about as cheap as it gets for quality geocodes, though you will still want to do your due diligence here. For example, are you geocoding 200K or more addresses a month and also have an Alteryx deployment? Perhaps you might try their spatial data package.
Make sure TabPy is actually running. This applies to any tool that uses Python integration in Tableau, and forgetting it is a common mistake I make when opening up a Prep workflow with some Python code in it. With this geocoder in particular, you run TabPy in a Docker container using the commands below in Terminal or PowerShell (see Mapbox’s docs for more info):
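Separately from those Docker commands, a quick way to confirm that something is actually listening before you open Prep is a plain TCP check. This sketch assumes TabPy is reachable on localhost at port 9004, which I believe is TabPy’s default; your container may map a different port. The function name is my own.

```python
import socket


def port_is_open(host="localhost", port=9004, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a throwaway local listener, so no TabPy is needed here.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]
open_while_listening = port_is_open("127.0.0.1", free_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", free_port)
```

In practice you would just call `port_is_open()` with the host and port from your own setup. A `True` only tells you that some service answered on that port, not that it is TabPy, so treat it as a first sanity check.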
What’s the Future?
With Tableau adding Python integration within Tableau Prep Builder, they have opened a whole new world of possibilities in prepping and even analyzing your data. I have a feeling we will see many more useful geospatial tools like this from Mapbox in the future, so keep them on your radar.