Simple Histograms in Tableau


Histograms are yet another example of an efficient statistical tool that Tableau can create for you with ease.  Introduced by the well-known statistician Karl Pearson, a histogram also serves as an effective approximation of a variable’s probability distribution.

A histogram is a frequency plot: it will illustrate the number of times a particular result occurred in a data set and does so in the form of a bar chart.  It is always a good idea to have an overview of the entire distribution of a variable, and histograms are another great way to gain that perspective.

As with many things, Tableau makes creating a histogram an easy process.  I start by examining the distribution with one of Tableau’s Gantt Bar visualizations.  I’ll be using the shipping cost variable from the Superstore Sales (Excel) sample data set.  Placing that measure on Rows and then converting it from an aggregate measure to a dimension automatically changes the view from a bar graph to a Gantt Bar chart detailing every distinct shipping cost in the data set.

This particular Gantt chart looks like a bar graph that was sheared off in certain places.  This is not unexpected; the chart shows every single cost value for every item shipped in this data source.  Sections in which the graph is densely packed indicate that many different costs occurred in that range (consider the $0 to $20 range).  In contrast, from $100 up to the max of $165, items were shipped at only a select few costs.

This visualization only shows us what values of the variable were used; it does not indicate how many times each cost occurred.  To see the frequency of each cost, employ a histogram.

Creating histograms in Tableau requires the use of bins.  Bins, or binned data sets, are groupings of the values of a variable.  In this particular example, we could bin together certain shipping costs, such as those that are sparsely distributed in the Gantt chart.  It might be useful to create a bin with the values from $100 up to $165.  You can create bins of uniform or varying size; the former is simple, while the latter is somewhat more complex.


Bins with Uniform Size

Creating a uniformly sized bin is a simple process. 

-First, right-click on the variable you want to group (shipping cost in this case).  Select “Create Bins…” and the Create Bins dialog should appear.

-The field name will default to your variable’s name followed by (bin).  Adjust to your liking.

-The size of bin defaults to 1; I’m going to look at bin sizes that are $10 each. Again, adjust to your liking.

-Finally, the range of values is blank to start. Clicking Load establishes the range of applicable values for your variable.

-Click OK, and you’ll create a binned dimension for your variable.
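To make the idea of a uniformly sized bin concrete outside of Tableau, here is a minimal Python sketch of fixed-width binning.  It assumes each bin is identified by its lower edge and includes that edge, which is how I understand Tableau's bin labels to work; the function name and the sample costs are hypothetical and are not taken from the Superstore data.

```python
import math

def bin_lower_bound(value, bin_size):
    """Return the lower edge of the fixed-width bin that contains value.

    Assumption: bins start at 0 and include their lower edge.
    """
    return math.floor(value / bin_size) * bin_size

# Hypothetical shipping costs, made up for illustration.
costs = [0.99, 4.50, 9.99, 10.00, 19.95, 24.49, 164.73]

# Assign each cost to a $10 bin, mirroring a bin size of 10.
bins = [bin_lower_bound(c, 10) for c in costs]
print(bins)
```

Every cost from $0 up to (but not including) $10 lands in the bin labeled 0, every cost from $10 up to $20 lands in the bin labeled 10, and so on.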


To create the histogram:

-Take the binned dimension – Shipping Cost (bin) – and place it in Columns.

-Place your corresponding measure – Shipping Cost (from Measures) – in Rows.

-Convert the measure from the default aggregation of SUM to COUNT by right-clicking it and selecting CNT from the list of aggregations.  Doing so results in the histogram.
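The steps above amount to grouping by bin and counting the records in each group.  A rough Python equivalent, again using hypothetical costs and a hypothetical helper name, might look like this:

```python
import math
from collections import Counter

def histogram_counts(values, bin_size):
    """Count how many values fall into each fixed-width bin.

    Keys are bin lower edges; this plays the role of the binned
    dimension on Columns with a COUNT of the measure on Rows.
    """
    return Counter(math.floor(v / bin_size) * bin_size for v in values)

# Hypothetical shipping costs, made up for illustration.
costs = [0.99, 4.50, 9.99, 10.00, 19.95, 24.49, 164.73]
counts = histogram_counts(costs, 10)

# Print a crude text histogram, one row per non-empty bin.
for lower in sorted(counts):
    print(f"${lower} to ${lower + 10}: {'#' * counts[lower]} ({counts[lower]})")
```

Only non-empty bins appear in this sketch; the empty ranges between $30 and $160 are simply skipped.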

The histogram shows that this particular group makes most of its shipments at a cost between $0 and $10, with just under 6,000 shipments in that range.  From $10 to $20, just over 1,000 shipments occurred.  Clearly, this superstore is not making very many high-cost shipments; it is doing a very good job of keeping its shipping costs low.

This is a very simple example of the creation of a histogram.  Other variables will affect how you create your histogram, and you may find that you need a different bin size.  There are many rules designed to pick the number of bins based on the sample size.  For example, a common choice for the number of bins is sqrt(n), where n is the number of items in your data set.  Another is Sturges’ formula, which calculates the number of bins as 1+(log(n)/log(2)); it is generally considered better suited to moderately sized, roughly normal data than to very large data sets.  Of course, Tableau asks for the size of each bin, which is simply the difference between the maximum and minimum values divided by the number of bins.  The appropriate number of bins for your distribution depends ultimately on what you plan to do with your histogram, and there are many schools of thought on how to choose it.  That discussion is better left for a different post, however.
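As a quick worked example of these rules, the sketch below computes both bin counts and converts them to a bin size you could type into Tableau.  The sample size of 1,000 is hypothetical, the $0 to $165 range comes from the shipping cost example, and rounding the bin counts up to a whole number is my assumption rather than part of either formula as stated here.

```python
import math

def sqrt_bins(n):
    """Square-root rule: number of bins is sqrt(n), rounded up."""
    return math.ceil(math.sqrt(n))

def sturges_bins(n):
    """Sturges' formula: 1 + log2(n), with the log rounded up."""
    return 1 + math.ceil(math.log2(n))

def bin_size(minimum, maximum, num_bins):
    """Width of each bin when the range is split into equal bins."""
    return (maximum - minimum) / num_bins

n = 1000                      # hypothetical number of records
low, high = 0, 165            # shipping cost range from the example

for name, k in (("sqrt", sqrt_bins(n)), ("Sturges", sturges_bins(n))):
    print(f"{name}: {k} bins, bin size {bin_size(low, high, k):.2f}")
```

With these inputs, the square-root rule suggests 32 bins of about $5.16 each, while Sturges' formula suggests 11 bins of $15 each, which shows how much the choice of rule can change the picture.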

This is a simple introduction; in another post, we’ll walk through how to create custom-size bins. Again, Tableau’s power makes this a straightforward process.
