Statistical Insights Using Tableau: Part Two


by Michael Treadwell

If I asked you to name the three most significant predictors of the murder rate within a geographic area, I would expect some measure of median income to be on your list. To test this hypothesis, we could run a simple linear regression in Tableau, as I described in Part 1, but that is kind of boring. It can also be inaccurate. The first assumption of linear regression is the independence of observations. If you plot Median Income against Murder Rate for all of the U.S. states and fit a regression line, you are assuming that each state is an independent observation, unaffected by its neighboring states.

Murder Rate and Median Income regression line
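
For readers who want to see the arithmetic behind a regression line like the one above, here is a minimal Python sketch of an ordinary least-squares fit. It is not how Tableau computes its trend line internally, the function name `fit_line` is just a placeholder, and the income and murder-rate numbers are invented for illustration rather than taken from the data behind the plot.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values: median income (thousands of dollars) and
# murders per 100,000 residents for six made-up areas.
median_income = [38.0, 42.5, 47.0, 51.5, 56.0, 63.5]
murder_rate = [9.1, 7.4, 6.8, 5.0, 4.6, 2.9]

slope, intercept, r_squared = fit_line(median_income, murder_rate)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

A negative slope corresponds to the downward trend in the plot. Note that the fit itself says nothing about whether the observations are independent; the formula produces a line either way.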

The Independence Assumption

This assumption works well for geographic areas as large as states, and in the graph above we see the expected result: Murder Rate decreases as Median Income increases. The model shown is significant, with a p-value of 0.0048. But what if we wanted to build the same model at a more local level, by zip code? Can we still assume that each area is independent of its neighbors? Realistically, zip codes are arbitrary divisions, not concrete geographic boundaries. For that reason, we cannot assume that a relatively wealthy neighborhood is unaffected by the murder rate of its neighbor, and vice versa.

Murder Rate and Income by Zip Code

Everyone is a fan of The Wire, so I thought we could look at crime data from Baltimore, MD and compare the income and murder rates of its twenty zip codes. As I have stated before, I am a fairly new Tableau user, and I found it amazing that the GPS coordinates for U.S. zip codes are included by default. To create the visualization below, simply drag the Zip Code dimension to the Marks shelf, then set the color by dragging the Median Income measure to the Color icon.

Median Income heat map of Baltimore, MD

The visualization above is a heat map of Baltimore zip codes. A darker shade of orange indicates a higher median income. Using the same method, we can create another heat map based on Murder Rate per 100,000, as shown below:

Murder Rate heat map of Baltimore, MD
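
The per-100,000 rate used in this map is a simple normalization that lets zip codes with different populations be compared on the same scale. A minimal sketch, where the function name and the example counts are hypothetical and not drawn from the Baltimore data:

```python
def rate_per_100k(murders, population):
    """Return murders per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return murders * 100_000 / population


# Hypothetical zip code: 12 murders among 30,000 residents.
print(rate_per_100k(12, 30_000))
```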

Spatial Autocorrelation

These two maps are easy to create, and their juxtaposition tells us a lot about the relationship between income and murder without another xy-plot. As expected, the lower-income, inner-city neighborhoods generally have a higher murder rate than the higher-income suburbs. However, if we look a little longer, we can see that some high-income zip codes, such as 21209 and 21214, have a much higher murder rate than expected. This could be explained by spatial autocorrelation, a measure of the degree of dependency among observations in a geographic space. In plain English, this means that sometimes we can treat a variable like crime as if it were a disease. Observations are “contagious,” and a high-income observation in close proximity to a low-income observation can be “infected” by its crime rate. For example, the higher-income 21209 and 21211 may be experiencing an increase in their crime rate due to the I-83 corridor that connects them to the 21201 zip code. Thanks to Tableau, it is easy to get a map overlay that includes streets and highways.
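
To put a number on spatial autocorrelation, one common statistic is Moran's I. This post does not name a specific measure, so treat the following as one possible way to quantify the idea, not as the method used here. The sketch assumes binary adjacency weights (1 if two areas touch, 0 otherwise), and the six areas, their layout, and their values are all made up for illustration.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) adjacency weights.

    values    -- dict mapping area id -> observed value
    neighbors -- dict mapping area id -> set of adjacent area ids
    """
    ids = list(values)
    n = len(ids)
    mean = sum(values.values()) / n
    dev = {i: values[i] - mean for i in ids}

    # Sum the products of deviations over every ordered neighbor pair,
    # counting the pairs as we go (each pair has weight 1).
    cross = 0.0
    total_weight = 0
    for i in ids:
        for j in neighbors.get(i, ()):
            cross += dev[i] * dev[j]
            total_weight += 1

    variance_sum = sum(d * d for d in dev.values())
    return (n / total_weight) * (cross / variance_sum)


# Six hypothetical areas arranged in a row: A-B-C-D-E-F.
chain = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E"},
}

clustered = dict(zip("ABCDEF", [1, 2, 3, 4, 5, 6]))    # similar values sit together
alternating = dict(zip("ABCDEF", [1, 6, 1, 6, 1, 6]))  # neighbors differ sharply

print(morans_i(clustered, chain))
print(morans_i(alternating, chain))
```

With no spatial autocorrelation, the expected value of Moran's I is -1/(n-1), which is close to zero for large n. Values well above that suggest neighboring areas resemble each other, as in the "contagion" picture; values well below it suggest neighbors tend to differ.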

Data

The data used were found at the following sites:

http://goo.gl/8knZDH

http://goo.gl/eFMWA2

http://goo.gl/m8EgD9

More About the Author

Michael Treadwell

Data Lead