PDF Converter in Tableau 10.3



It seems like only yesterday that Tableau 10.2 came out with a basket of new features that are proving really useful, shapefiles being my favourite. Early this week, however, Tableau announced the release of the Tableau 10.3 beta, and as always, I wanted to see what new things await us. One of the features that drew the biggest response in Austin last year was the PDF converter, and it’s now time to fully test it out.

I downloaded the 10.3 beta after signing up to take part in the beta program. The first feature I wanted to test was how well it would cope with a large PDF table. I have a PDF with two pages of 2016 property-tax data that would take me forever to type in, and I’d probably make a ton of mistakes. Even copying and pasting into Excel doesn’t work that well because of the format it ports across. So, this seemed like the perfect use case to road test the new Tableau PDF Converter in 10.3.

Here’s a sample of the PDF table below. You can see some merged cells and titles spilling over into three rows that I thought would be a nuisance to make sense of:

PDF table

Thankfully, that was not the case. Follow along below. The first thing I did was connect to the PDF using the new connector:

Tableau > Connect > PDF File

The next screen allows us to define a specific page to be scanned, or a range, for instance pages eight to 10 only. In my case, I wanted all of my pages to be scanned:

Connect to a PDF

When I connected, I had a look at the data structure, and it didn’t look too auspicious. I had quite a few NULLs, and my headers were all over the place:

PDF table in Tableau

AH! But as most of you will know, Tableau has a great Data Interpreter that understands headers pretty well, so I switched it on and voilà …

PDF w/ Data Interpreter in Tableau
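Tableau doesn’t spell out exactly how the Data Interpreter rebuilds headers, so what follows is only a rough sketch of one way to do it, not Tableau’s actual algorithm. It shows how a title spread over several rows, with merged cells leaving blanks behind, could be flattened into single column names. The assumed rule: a blank cell in any header row except the last continues the merged label to its left, and each column’s labels are then joined top to bottom. The function name and the header rows are hypothetical, not taken from the property-tax PDF.

```python
def flatten_headers(header_rows):
    """Collapse several header rows into one list of column names.

    Hypothetical rule: a blank cell in any row except the last is treated
    as the continuation of a merged cell to its left.
    """
    width = max(len(row) for row in header_rows)
    filled = []
    for depth, row in enumerate(header_rows):
        row = list(row) + [""] * (width - len(row))  # pad short rows
        is_last = depth == len(header_rows) - 1
        carried = ""
        out = []
        for cell in row:
            cell = cell.strip()
            if cell:
                carried = cell
            elif not is_last:
                cell = carried  # merged cell: repeat the label to its left
            out.append(cell)
        filled.append(out)
    # Join each column's non-empty labels from top to bottom.
    return [" ".join(row[col] for row in filled if row[col]) for col in range(width)]


# Made-up two-row header: "Tax" is merged across two columns.
header_rows = [
    ["", "Tax", "", "Rate"],
    ["Town", "Levy", "Share", "%"],
]
print(flatten_headers(header_rows))  # ['Town', 'Tax Levy', 'Tax Share', 'Rate %']
```

A rule this blunt would also mislabel a genuinely empty header cell that happens to sit to the right of a label, so some manual renaming afterwards is still likely.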

As if by magic, my headers are now tidied up and I can’t see any NULLs either. But there is still data on Page 2 that I need to bring in. Tableau splits the data by page, in the same way it would split tabs in Excel. Time to use another handy Tableau feature: UNIONS. I drag my Table 2 underneath Page 1 in the connection area and UNION the two tables together.

Union tables
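Conceptually, a union stacks tables that share the same columns, one page’s rows after the other’s. Here is a minimal sketch in plain Python, assuming each page has already been read into a list of dictionaries keyed by column name. The `source` field is a hypothetical addition that records which page a row came from, and the town and levy values are made up.

```python
def union_pages(pages):
    """Stack per-page tables (lists of row dicts) into one table.

    Columns missing from a page are filled with None (null).
    """
    # Collect every column name, keeping first-seen order.
    columns = []
    for rows in pages.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    combined = []
    for page_name, rows in pages.items():
        for row in rows:
            record = {name: row.get(name) for name in columns}
            record["source"] = page_name  # hypothetical provenance column
            combined.append(record)
    return combined


pages = {
    "Page 1": [{"Town": "A", "Levy": 100}, {"Town": "B", "Levy": 250}],
    "Page 2": [{"Town": "C", "Levy": 75}],
}
for record in union_pages(pages):
    print(record)
```

Matching rows by column name rather than by position means a page whose columns come out in a different order still lines up correctly. A column that appears on only one page simply ends up null on the other.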

My data is now imported from a PDF and looking good. I have checked all the headers, and the majority are correct, with the exception of the merged cells for Reduction Factor and Effective Rate. However, I can easily rename those, and off we go.

I can only imagine the work the Tableau devs had to go through to make it this simple. It’s an incredibly useful feature for open data, or for old documents we thought we could never get the data out of. With a few simple clicks, Tableau extracts the data and makes it ready for analysis.

Thank you for reading. I’m off to check out what else is available in the new 10.3 beta.

More About the Author

David Pires

Senior Analytics Consultant