Guest Blog Post: What on Earth is Placekey and How Can I Use it in Tableau?

 


Ken and I are incredibly excited to have Sarah Battersby and Paul Rossman join us today as guest contributors to FlerlageTwins.com. They are both brilliant and a ton of fun!

 

Sarah has been a member of Tableau Research since 2014.  Sarah’s primary area of focus is cartography, with an emphasis on cognition. Her work emphasizes how to help everyone visualize and use spatial information more effectively – no advanced degree in geospatial required.  Sarah holds a PhD in GIScience from the University of California at Santa Barbara. She is a member of the International Cartographic Association Commission on Map Projections, and is a Past President of the Cartography and Geographic Information Society (CaGIS).   She can identify Kevin and Ken correctly 50% of the time. Sarah can be contacted at sbattersby@tableau.com or on Twitter @mapsOverlord.

 

Paul has been in the analytics/data science space since 2011 – or when machine learning was called a regression model. He has been using Tableau since 2015. He dabbles in a lot of different areas – from baseball to geospatial to tracking his own weight loss. Paul has a master’s degree in applied mathematics from Indiana University of Pennsylvania and an undergrad in journalism – but sometimes forgets how to piece a sentence together (Paul’s words – not mine).  Paul knew Ken before it was cool to know Kevin.  Paul can be contacted at prossman@gmail.com or on Twitter @p7_stats.

 

(See, I told you they were a lot of fun!)

 


Thanks for having us, Kevin & Ken.  As mentioned above, my name is Sarah Battersby and I LOVE mapping.  Today, Paul Rossman and I will be sharing a bit about a new open initiative from SafeGraph called Placekey, and we will talk about how to use it within Tableau.


The intention of Placekey is to provide a unique, standard identifier for physical places (a "where") AS WELL AS a "what" component to track the details for known points-of-interest (POIs). The Placekey website describes it as follows:


“Placekey is a free, universal standard identifier for any physical place, so that the data pertaining to those places can be shared across organizations easily.

 

However, Placekey goes beyond just an identifier. It’s a movement of organizations and individuals that prize access to data. Placekey members want geospatial data that is easily joined and combined...because real answers come from combining data from many different sources. It is a philosophy that data should be easy to access, and data should not be hoarded. These members believe that data, when combined, can do massive good.”


The "Why Placekey" white paper does a great job of documenting the standard and structure of Placekey coordinates, so we won't go into that here.  We will just focus on the fun parts of how to use Python to interact with Placekey and how to turn the Placekey information into data that you can work with in Tableau!


If you are a data scientist & Tableau user like Paul, you are probably often joining physical places from multiple datasets. This could be address data from places like POI Factory or ad-hoc location scrapers built to scrape business locations (like this one for Starbucks).   Joining location data from addresses can be a mess!


It’s great when the physical address matches, but when it doesn’t, which happens often in rural areas, you run into some challenges. Using Placekey we can get around some of these problems.  Placekey provides us with one unique identifier for each business or point of interest that can be used across datasets. This allows us to de-dupe and clean our POI data in a seamless way.



Image source: SafeGraph "Why Placekey" white paper


 In this post we’ll dig into the basics of working with Placekey and a few different ways to use Placekey as part of your Tableau data preparation process.


First, I will dig into the basics of working with Placekey and various ways to interact with the data to give you multiple tools in your data prep toolbox.


From there, Paul will demonstrate a real business use case and will walk through what this looks like in a Tableau Prep workflow.


So, let’s get going!


A bit of background


To take serious advantage of Placekey and the Python libraries that we’re using to work with the data, a good place to start is some background reading.  Here are some of our favorites for getting up to speed… but if you want to just skip over these for now and come back to them later we totally understand.


Placekey API documents - What is the Placekey API? What can you do with it?


Places manual - General documentation on the SafeGraph Places dataset, with a section on Placekey


Placekey on PyPI


About Uber's H3 - Uber's Hexagonal Hierarchical Spatial Index


H3 Python library - H3 Python library GitHub Repo


Helpful H3 Jupyter Notebooks


Tableau TabPy (Background and basics, GitHub Repo)


 

The basics of working with Placekey


There are a few ways to access and work with Placekey codes.  I have put together a Jupyter Notebook with a step-by-step walkthrough that demonstrates using the Placekey Python library and accessing Placekey using the API URL.  


We aren’t going to walk through the whole notebook – since you can just read it on GitHub or copy it and start working with it on your own, but we’ll highlight the basics.


1.       You’ll need a Placekey developer account.  This is free.  Placekey is free.  All you have to do is register on the Placekey Developer Portal.  Once you have an account, you will have an API credential key.  You’ll need this for calling out to Placekey.  Once you have your API key you can get serious about tapping into Placekey!


 

 

2.       You need location data.  Placekey can include various types of location information, including latitude/longitude pairs, addresses, and POI (Point of Interest) names.  The Jupyter notebook has several different examples of valid input data that you can check out – like this one for the Tableau HQ where we have the street address, city, state, etc.

 

    "address": ''' {
        "query" : {
            "street_address": "1621 N. 34th Street",
            "city": "Seattle",
            "region": "WA",
            "postal_code": "98103",
            "iso_country_code": "US"
        }
    }'''
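To make the shape of that query a bit more concrete, here is a minimal sketch of wrapping the Tableau HQ address into a request. The endpoint URL and the `apikey` header name are our understanding of the Placekey API and should be checked against the API documents; `build_placekey_request` is a hypothetical helper name, and nothing is actually sent.

```python
import json

# Assumed single-lookup endpoint; confirm against the Placekey API docs.
PLACEKEY_URL = "https://api.placekey.io/v1/placekey"


def build_placekey_request(query, api_key):
    """Return the URL, headers, and JSON body for one Placekey lookup.

    Hypothetical helper for illustration: it only builds the request,
    it does not send it.
    """
    headers = {
        "apikey": api_key,  # credential from the Placekey Developer Portal
        "Content-Type": "application/json",
    }
    body = json.dumps({"query": query})
    return {"url": PLACEKEY_URL, "headers": headers, "data": body}


tableau_hq = {
    "street_address": "1621 N. 34th Street",
    "city": "Seattle",
    "region": "WA",
    "postal_code": "98103",
    "iso_country_code": "US",
}

request_args = build_placekey_request(tableau_hq, api_key="YOUR_API_KEY")
```

With the Requests library installed, `requests.post(**request_args)` should be enough to send it, assuming the endpoint and header above are right.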

 

3.       You need to ask the API nicely to return the Placekey.  You can do that in a few ways… You can type the address in manually on the Placekey web page:


 

 

Or you can use the API and some Python code to run through a bunch of locations at a time. The Jupyter notebook walks through all of the steps for checking a series of addresses and returning the Placekey for each – here is the basic query using Requests and the resulting JSON.  For each location, we have a Placekey in the WHAT@WHERE format.   For these examples, we only used a query name for three of the queries…just in case you wondered why the query_id was 0 on the first two.  We just didn’t name them.  You don’t have to use a name…
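Once the JSON comes back, pulling apart the WHAT and WHERE pieces is just a string split on the `@`. The sketch below uses a made-up response (the WHAT part is invented, and `split_placekey` is our own helper name, not part of the Placekey library) to show the idea.

```python
import json


def split_placekey(placekey):
    """Split a Placekey into its WHAT and WHERE parts.

    A Placekey with no WHAT part (for example "@5x4-4b3-wc5") yields an
    empty string for WHAT.
    """
    what, _, where = placekey.partition("@")
    return what, where


# Made-up response text in the shape described above; the WHAT part here
# is invented for illustration.
response_text = '''[
    {"query_id": "0", "placekey": "@5x4-4b3-wc5"},
    {"query_id": "tableau_hq", "placekey": "222-223@5x4-4b3-wc5"}
]'''

# Map each query_id to its (WHAT, WHERE) pair.
parsed = {
    row["query_id"]: split_placekey(row["placekey"])
    for row in json.loads(response_text)
}
```

Keeping the WHERE part on its own is handy later, because it is the piece that ties to the hexagon geometry.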

 



 

4.       Generate some geometry from the result.  Once we have the Placekey WHAT and WHERE components, we have everything we need to enrich our dataset and drop the data into a map to use in Tableau (or anywhere else…but we’ll set up the data in .hyper files for use in Tableau, because we love Tableau).

 

Each Placekey WHERE code (e.g., @5x4-4b3-wc5) may look like a bunch of nonsense, but it represents a specific location on the earth.  The code ties directly to a single, small-ish Uber H3 hexagon.


 




Using the Uber H3 library we can find all sorts of details for that location and drop them into a file to use in Tableau.

 

We can find the centroid of the hexagon using the Python Placekey library.  This lets you map addresses to approximate point locations.

 

pk_centroid = Point(pk.placekey_to_geo("@5x4-4b3-wc5"))

 

Or we can get the entire polygon geometry.  This gives you nice hexagon geometry so that you can aggregate your dataset into polygonal regions (e.g., sum of sales for all locations within this hexagonal region).

 

pk_polygon = pk.placekey_to_polygon(placekey_where_code, geo_json=True)
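One detail worth checking for the centroid: as far as we know, `placekey_to_geo` hands back a (latitude, longitude) pair, while point geometry formats such as WKT put x (longitude) first. If your points land somewhere unexpected, the axis order is the first thing to look at. Here is a small sketch of the swap, using a hypothetical helper and illustrative coordinates rather than the output of a real lookup.

```python
def latlon_to_wkt_point(lat, lon):
    """Format a (latitude, longitude) pair as a WKT point.

    WKT puts x (longitude) before y (latitude), so the order is swapped.
    """
    return f"POINT ({lon} {lat})"


# Illustrative coordinates near Seattle, not the output of a real lookup.
lat, lon = 47.65, -122.35
wkt = latlon_to_wkt_point(lat, lon)
```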

 

Here is the result for a few different Placekeys:




5.       Write the data to a .hyper file to drop directly into Tableau.  It’s nice to use Python to generate data, but that doesn’t go straight into Tableau on its own.  But, it’s easy to use the Tableau Hyper API to quickly write the results into a .hyper file.  This is all done step-by-step in the Jupyter notebook, so we won’t detail it here.

 

It’s pretty simple – you just create a .hyper file and add two data tables – one for the original address data, and one for the geometry.  Since multiple locations can fall in the same hexagon, it makes sense to have a separate table for the geometry so you don’t have to duplicate a bunch of polygons and bloat the size of your .hyper.  Instead, you can just join the data and geometry tables together in Tableau.
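The two-table idea can be sketched without the Hyper API at all. The example below is only an illustration of the split, using plain Python lists; the column names, Placekeys, and geometry strings are all made up, and the notebook's actual table layout may differ.

```python
def split_into_tables(rows):
    """Split rows into an address table and a de-duplicated geometry table.

    Each input row is a dict with "address", "placekey", and "geometry"
    keys (hypothetical column names). The WHERE part of the Placekey is
    the join key shared by both tables.
    """
    addresses = []
    geometries = {}
    for row in rows:
        where = row["placekey"].partition("@")[2]
        addresses.append({"address": row["address"], "where": where})
        # Store each hexagon once, however many addresses fall inside it.
        geometries.setdefault(where, row["geometry"])
    geometry_table = [
        {"where": where, "geometry": geom} for where, geom in geometries.items()
    ]
    return addresses, geometry_table


# Made-up rows: the first two share a hexagon, the third does not.
rows = [
    {"address": "Suite 100", "placekey": "aaa-222@5x4-4b3-wc5", "geometry": "POLYGON A"},
    {"address": "Suite 200", "placekey": "bbb-222@5x4-4b3-wc5", "geometry": "POLYGON A"},
    {"address": "Elsewhere", "placekey": "@5x4-4b3-wc6", "geometry": "POLYGON B"},
]
address_table, geometry_table = split_into_tables(rows)
```

Three addresses go in, but only two polygons need to be stored, which is the whole point of keeping the geometry separate.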

 



 

And then you can go crazy with your analytics in Tableau:




A Tableau Prep Workflow


While it may have been informative to use a Jupyter notebook walkthrough to learn about Placekey, it is even better to have a real-world example from a real, live data scientist.  Paul Rossman will walk us through a TabPy script to insert directly into your data prep workflow in Tableau.  Take it away, Paul.


Thank you, Sarah.  To set the stage, imagine you are given two datasets with two different sets of fast food locations and you need to figure out which ones match and which ones are in one dataset but not the other. You look at the data. Some of the physical addresses match, but some don’t. Some have some pretty gnarly addresses. Some of them have Rural Route addresses, others are intersections. Where do you start?


If you are like me, in the past you would have concatenated the address fields and tried to do string matches. You could run the addresses through an address standardization API, but that only gets you so far.


Here’s where Placekey comes into “play”! With Placekey, you can pass an address (cleansed or uncleansed) and it will find the physical address (magically) and give you back the unique identifier for each location. You can then join on this identifier, and now matching locations becomes a breeze!
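Once both datasets carry a Placekey, the "which ones match?" question turns into simple set logic. The sketch below uses made-up Placekeys and addresses, and `compare_by_placekey` is just a name we picked for illustration.

```python
def compare_by_placekey(left, right):
    """Compare two location lists keyed by Placekey.

    Returns (in_both, only_left, only_right) as sorted lists of Placekeys.
    """
    left_keys = {row["placekey"] for row in left}
    right_keys = {row["placekey"] for row in right}
    return (
        sorted(left_keys & right_keys),
        sorted(left_keys - right_keys),
        sorted(right_keys - left_keys),
    )


# Made-up Placekeys and addresses for illustration.
dataset_a = [
    {"address": "100 Main St", "placekey": "aaa-222@5x4-4b3-wc5"},
    {"address": "RR 2 Box 14", "placekey": "bbb-222@5x4-4b3-wc6"},
]
dataset_b = [
    {"address": "100 MAIN STREET", "placekey": "aaa-222@5x4-4b3-wc5"},
    {"address": "Hwy 9 & Oak Rd", "placekey": "ccc-222@5x4-4b3-wc7"},
]

in_both, only_a, only_b = compare_by_placekey(dataset_a, dataset_b)
```

Notice that "100 Main St" and "100 MAIN STREET" would fail a naive string match, but they line up once they share an identifier.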


Okay, so let’s do it! Using a Python script (available on GitHub) and TabPy Server, we can easily clean our location data. You can download the script to use.  To explain how it works, we’ll walk through key parts of the script here.


We’ll start by jumping to the third function called placekey_lookup.  To help make sense of this, assume that we have a dataset that has the following fields/structure:




  

Working with this data, our first step is to rename the columns so the Placekey API can read them.




  

Next, we clean the data to eliminate Nulls and blanks, and convert the data frame to JSON.
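The rename and clean-up steps can be sketched roughly like this. To keep it self-contained, it works on a plain dict rather than a data frame, and the source column names in `COLUMN_MAP` are hypothetical; only the target names come from the query format shown earlier in the post.

```python
# Hypothetical source column names mapped to Placekey query field names.
COLUMN_MAP = {
    "Address": "street_address",
    "City": "city",
    "State": "region",
    "Zip": "postal_code",
    "Country": "iso_country_code",
}


def to_placekey_query(record):
    """Rename a record's fields and drop any that are None or blank."""
    query = {}
    for source_name, api_name in COLUMN_MAP.items():
        value = record.get(source_name)
        if value is None or str(value).strip() == "":
            continue
        query[api_name] = str(value).strip()
    return query


record = {"Address": "1621 N. 34th Street", "City": "Seattle",
          "State": "WA", "Zip": None, "Country": "US"}
query = to_placekey_query(record)
```

If the records come out of pandas, missing values may show up as NaN rather than None, so a real script would need an extra check for that.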




 

Next, we break the file into 50-record chunks, using the helper function in the script called prepare_batches_for_API.  That helper just sets up the properly formatted query for the Placekey API.   Finally, we pass those chunks to the Placekey API and parse the returned JSON.
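For a feel of what the batching step does, here is a minimal sketch. It is not the script's `prepare_batches_for_API`; the `chunk_queries` name is ours, and the `{"queries": [...]}` payload shape with a `query_id` on each query is our assumption about the bulk API, so check it against the Placekey API documents.

```python
def chunk_queries(queries, batch_size=50):
    """Yield batches of queries, tagging each with a query_id.

    The query_id is the record's position in the original list, which
    lets results be matched back to their source rows later.
    """
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        yield {
            "queries": [
                {"query_id": str(start + offset), **query}
                for offset, query in enumerate(batch)
            ]
        }


# 120 made-up queries should produce batches of 50, 50, and 20.
queries = [{"street_address": f"{n} Main St"} for n in range(120)]
batches = list(chunk_queries(queries))
```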



 

At last, we have our original data and the matching Placekey for each of the locations it was able to match.
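Stitching the results back onto the original rows might look something like the sketch below. It assumes each result carries the `query_id` we sent and that unmatched queries simply lack a `placekey` field; the error entry shown is made up, and `attach_placekeys` is a hypothetical helper, not a function from the script.

```python
def attach_placekeys(records, results):
    """Add a "placekey" field to each record using query_id as the key.

    Records with no match in the results get None.
    """
    by_id = {r["query_id"]: r.get("placekey") for r in results}
    return [
        {**record, "placekey": by_id.get(str(index))}
        for index, record in enumerate(records)
    ]


records = [{"name": "Store A"}, {"name": "Store B"}]
# Made-up results: the second query came back without a Placekey.
results = [
    {"query_id": "0", "placekey": "aaa-222@5x4-4b3-wc5"},
    {"query_id": "1", "error": "Invalid address"},
]
merged = attach_placekeys(records, results)
```

Rows that come back with None are the ones to look at by hand.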




Placekey has a whole bunch of other functions, some of which you may have seen in the Jupyter Notebook that Sarah put together.   When cleaning up POI locations, it’s the most robust solution I’ve found for matching current businesses and past businesses that are / were located in the same place.

 

Well, that’s it.  We hope that our introduction helps you get started working with Placekey in your Tableau workflow!

 

Thanks for reading!

Sarah & Paul

 

 

Kevin Flerlage, November 30, 2020


 


 


 


 

