QGIS for Tableau Users # 1: Getting Started

 

Kevin and I are incredibly excited to, once again, have Sarah Battersby join us for a guest blog. This post is the first in a multi-part series about QGIS, a free, open source Geographic Information System (GIS) that you can use in conjunction with Tableau. In my introduction for her last guest post, The Power of Place: Unleashing Census Data in Your Tableau Analytics, I joked that we were going to have to make her an honorary Flerlage Twin. With this series, I think she’s definitely earned her place as an honorary Flerlage Twin (or Triplet??).

 

Sarah has been a member of Tableau Research since 2014. Her primary area of focus is cartography, with an emphasis on cognition. Her work is focused on helping everyone to visualize and use spatial information more effectively—without the need for an advanced degree in geospatial. Sarah holds a PhD in GIScience from the University of California at Santa Barbara. She is a member of the International Cartographic Association Commission on Map Projections, and is a past President of the Cartography and Geographic Information Society (CaGIS). Sarah can be contacted at sbattersby@tableau.com or on Twitter @mapsOverlord.

 

Tableau is great for maps and spatial analysis, but it doesn’t do everything. Sometimes you need a spatial helper for your data. When I need that helper for my work, I generally reach for QGIS, a free and open source GIS. In this series of blog posts, we’re going to explore how QGIS can be used as part of your Tableau workflow. This first post will cover the basics of QGIS, from downloading the software to working with data in it: opening files, updating properties, exporting spatial and text files, etc. The next posts in the series will each tackle different use cases for QGIS and where it fits into your Tableau workflow.

 

What is QGIS?

QGIS is a free and open source geographic information system (GIS). It runs on Windows, Mac, Linux, BSD, and mobile/tablet! You can do amazing and complex things with your spatial data using QGIS—and, more importantly, you can do some really simple, but valuable, things quickly to enhance your Tableau spatial workflow. In this series, I will walk through some of the basics that I use frequently in my own work and for answering questions that come up regularly in the Tableau Community Forums (perhaps, it’ll even keep Ken from constantly tagging me on spatial questions ;).

 

This post will be a bit longer than the others in the series because it’s all of the ‘getting started’ stuff. I’m assuming you’re starting from never having used QGIS before so this is intended to be a reference that you can return to in order to remember the basics. If you already have it installed and know the basics of working with data, you can just skip to the more targeted how-to posts as they are posted over the next few months.

 

How Can QGIS Help You with Tableau?

Tableau supports a ton of spatial analyses—with more being added in each new release. But, it isn’t a geographic information system (GIS) and there are specialized functions that GIS support that are super useful when working with spatial data...but that we may not see in the Tableau product soon. Most of the work that I do with QGIS is for data preparation—the types of things that you do once to get your data in the right shape...and then you’re good to go with your Tableau analysis. Those are the types of functionality that I’m going to highlight here. Since this is just the start of the series, here is a highlight of topics I plan to hit later on (or relevant topics that I’ve covered in other posts elsewhere...)

 

Conversions / Making New Data

 

•  Going between projections / coordinate systems—hey, it’s in this first post! Look for the bit on exporting your data!

•  Well known text to geometry & other manipulations of text into spatial!

•  Converting lists of points into lines, polygons

•  Making spatial bins - for instance, if you want to show your point data aggregated to square or hexagonal shaped bins

•  Generating special polygon fills like stripes!

 

 

Analyses

 

•  Voronoi or Thiessen polygons to find all of the locations closest to individual points

•  Distance matrices

•  Spatial joins

•  Finding all adjacent locations

 

When do You NOT Need QGIS?

Before we get started, here is a short list of things you do not need QGIS for if your goal is to work with spatial data in Tableau:

 

•  Spatial Intersection (point and polygon) - Starting in Tableau 2018.2, the spatial intersection join type was added to the product. If you have points and polygons you can do all of your joins in product. So if you just want to know what region every customer is in, just add your region polygons and your customer points and join them together!

 

•  Buffers - Want to know what is within a specific distance of a location? Starting in 2020.1, you can create buffer geometries around points and measure distances between locations with ease. No more special calculations or pre-processing.

 

•  Spatial Union - If you have multiple spatial files and need to combine them together, no need to use QGIS. Starting in 2020.3, you can union them just like any other file type.

 

The Basics

Download & Install

The first critical step in starting with QGIS to support your Tableau analytics is to install it! You can get the installation files on the QGIS Project website. There should be a big, green ‘Download Now’ button that you can click (but if not, here is the shortcut to the download page). Scroll down to the version you want. I use Windows, so that’s where I look, but if you’re using macOS, Linux, etc., just look for the installers for your system. Then simply run the installer. Be patient; it can take a while to install.

 

In the rest of this post, we’ll walk through the basics, including:

 

•  Adding data

•  Adding background maps

•  Calculating attributes

•  Exporting data

 

Add Data

Great, now you have QGIS installed. Open QGIS Desktop and we’ll get started! For reference, I am using QGIS 3.14 (not the latest version, but it should look pretty much the same for the examples in this post).

 

How can you get some data into QGIS? I’ll work with two common data source types so you can see the basics of how to work with them: a text file with point locations and a spatial file (e.g., shapefile, GeoJSON, etc.). Here are links to some basic files you can download if you want to try out working with data in QGIS now:

 

CSV Files 

 

In this post, I’ll work with a dataset of Boston public schools. This particular dataset has the x and y coordinates in a projected coordinate system which makes it a little trickier (AKA, more fun!) to work with. The dataset is in Massachusetts Mainland (ftUS), which has a spatial reference id (SRID) of 2249—we’ll look at how to use that bit of information when working with projected data in QGIS.

 

Another fun dataset, if you want to play around, is the NYC squirrel census. I’m not using this as an example in this post, but it’s still a lot of fun to look at! The table of data that you can download from NYC for the squirrel census is in latitude and longitude.

 

Spatial Files

 

I love the US Census cartographic boundary files. Tons to choose from!  In this post, I’ll be using the Census Tracts for Washington state shapefile from the Census.

 

When you first open QGIS, you will likely have a blank screen that looks something like this - we’ll open a new empty project to get started:

 

 

Now we have somewhere to add our data. The easiest place to find the tools to add any new dataset is the Layer > Add Layer menu. For my Tableau-related work I generally just use these two options: Add Vector Layer... and Add Delimited Text Layer...

 

 

Add Spatial Files

I’ll start with vector (point, line, and polygon) spatial files. These are generally the easiest and most straightforward. Most of the time, these files come with a coordinate system defined for them, and this information is stored in the file itself, which makes working with it in QGIS much easier. 

 

Generally, you can just use Layer > Add Layer > Add Vector Layer and select your spatial file in the Source section.

 

 

Most of the time, that will be all you need to do to add a spatial file.

 

Occasionally, you’ll see a window asking you to select a Transformation for your file—this shows up when your dataset is in a different coordinate system than the QGIS map and it needs some help in converting between the two coordinate reference systems. In most cases, taking the default option will probably be just fine. 

 

 

Add Delimited Text Files

It’s easy to add a text file - just use Layer > Add Layer > Add Delimited Text Layer like this:

 

 

I’ve highlighted the important parts in the image above:

 

•  File Name - Where QGIS will find your file. Click on the ... next to the text field for the file picker dialog.

 

•  Layer Name - What QGIS will call your layer (I’m generally lazy and leave the default).

 

•  File Format - Is it a generic CSV, or does it use a custom delimiter, or a regular expression delimiter?

 

•  Geometry Definition - This is the really important part, and it isn’t as intuitive as the other key parts, so I’ll go into some additional detail right now...

 

Ah, the geometry definition. This is the critical piece that makes your CSV useful as a spatial file. In Tableau, you just drop your latitude and longitude onto the worksheet and it all works (assuming you have set them to have the right geographic role), or you use MAKEPOINT() to convert them to a point geometry. In QGIS you need to tell the software a bit about the data. It doesn’t like to make too many assumptions. 

 

Tip on Tableau’s MAKEPOINT() Function: If you’re just looking to convert a CSV with points into something Tableau can use, you do not need to use QGIS, just use MAKEPOINT in Tableau! Even if your data isn’t in latitude and longitude, you can convert it using the spatial reference ID (SRID) in Tableau using MAKEPOINT(x, y, srid). For more information about SRIDs, check out the Spatial Reference System Wikipedia page. You can look up SRID codes using projection names on the EPSG.io web site or spatialreference.org.

 

If you know you have projected coordinates (not latitude/longitude), but don’t know the SRID here are my methods for figuring out the coordinate system: 1) check for any metadata that came with the file, or from the site where you downloaded the data.  They often list the coordinate system. 2) If #1 doesn’t work, do a search for “official coordinate system {insert agency / county / state / country}” where the data is located.  3) Guess randomly until you get it (don’t really do this...you should be able to find the details if you search around, but sometimes I really do do some random guessing...but I’ve also been dealing with these problems for a lot of years so have some intuition about the guessing process).  

 

If you have two fields with coordinates, and the fields are nicely named something like “latitude” and “longitude”, they may already be filled into the right fields. If they aren’t, you’ll want to hit the dropdown for X field and Y field and fill in the right field names. Remember that X corresponds to longitude and Y corresponds to latitude if your coordinates are in latitude and longitude. If you have projected coordinates, just match up the X coordinate to the X field, etc.
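If it helps to see the X/Y mapping outside of the dialog, here is a minimal Python sketch of the same idea. The sample rows and the column names (name, latitude, longitude) are made up for illustration; the only point is that longitude feeds X and latitude feeds Y.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """name,latitude,longitude
Site A,40.63,-73.97
Site B,40.78,-73.95
"""

def rows_to_xy(text, x_field, y_field):
    """Return (name, x, y) tuples, reading X and Y from the named columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["name"], float(row[x_field]), float(row[y_field])) for row in reader]

# Longitude is the X (east-west) value; latitude is the Y (north-south) value.
points = rows_to_xy(SAMPLE_CSV, x_field="longitude", y_field="latitude")
for name, x, y in points:
    print(f"{name}: x={x}, y={y}")
```

Swapping the two field names is the classic mistake here, and it usually shows up as points landing in the wrong part of the world.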

 

If you have Well Known Text (WKT) in your table you can use the WKT option to define the geometry. If you download data from an open geospatial data portal you very well may end up with a table with a point, line, or polygon geometry as WKT. If you have point data it’ll look something like this: POINT (-73.97 40.63). If you have polygon data, it’ll look a little more complicated, like this example of neighborhood polygons that I downloaded from the NY City Open Data Portal:
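To make the WKT point format a bit more concrete, here is a small Python sketch that pulls the X and Y values out of a simple two-dimensional WKT point of the form POINT (x y). The function name parse_wkt_point is my own invention for this illustration, and the pattern deliberately ignores lines, polygons, and Z/M values; QGIS handles all of those for you.

```python
import re

# Matches simple 2D WKT points such as "POINT (-73.97 40.63)".
# Other geometry types and Z/M coordinates are out of scope for this sketch.
_NUMBER = r"(-?\d+(?:\.\d+)?)"
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*" + _NUMBER + r"\s+" + _NUMBER + r"\s*\)\s*$",
    re.IGNORECASE,
)

def parse_wkt_point(wkt):
    """Return (x, y) as floats, or raise ValueError if the text is not a simple point."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    x, y = match.groups()
    return float(x), float(y)

print(parse_wkt_point("POINT (-73.97 40.63)"))
```

Note that WKT lists X first and Y second, so for latitude/longitude data the longitude comes first.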

 

 

You also need to tell QGIS about the Geometry coordinate reference system (CRS) for the file. This is how QGIS knows where in the world to put your data. The default will probably be EPSG:4326 - WGS 84. This is a common way of defining latitude and longitude based on the World Geodetic System of 1984. If you have latitude and longitude data and you don’t know the specifics of your CRS, this is probably a reasonable guess. If you do not have latitude and longitude for your data (for instance, the X and Y coordinates are in a projected coordinate system), you will need to find the right CRS to define your points. How do you do that? My first place to look is always the source where I downloaded the data; there may be a metadata guide that tells you what they are using. If it’s from a city, county, or state, there is probably an official coordinate system that is used, and you can search the government agency’s site for details about it. I can’t help you much here other than to tell you that I just start using web searches to see if I can find some pointers on what the standard system is for the location of my data.
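One quick sanity check before picking a CRS is whether the numbers could even be latitude and longitude. Here is a rough Python sketch of that check. It is only a heuristic (small projected values would pass it too), the function name is hypothetical, and the coordinate values are illustrative rather than taken from any of the datasets above.

```python
def looks_like_lat_lon(x, y):
    """Heuristic: True when x fits the longitude range and y fits the latitude range."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0

# Illustrative values only: a longitude/latitude pair, then a pair on the scale
# you might see in a projected system measured in feet.
print(looks_like_lat_lon(-71.06, 42.36))
print(looks_like_lat_lon(775000.0, 2950000.0))
```

If the check fails, the data is almost certainly projected and you will need the right SRID rather than the EPSG:4326 default.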

 

If you use the Boston public schools (x/y in projected coordinates SRID:2249) example file, you’ll define the coordinate system like this...

 

Click on the little globe-looking button to the right of the Geometry CRS dropdown:

 

 

In the selector window that opens, type the information that you have into the filter at the top. If you know the SRID, type that in. If you know part of the name of the coordinate system, type that in. The list will filter as you type. If you’ve used the coordinate system recently it may be in the top box under “Recently Used Coordinate Reference Systems.” If not, take a look a little lower in the window to find the “Predefined Coordinate Reference Systems” box.

 

Note that when you pick your coordinate system there is a nice little locator box on the bottom of the CRS Selector window that gives you an idea of the location typically covered by the coordinate system.

 

 

Click OK and you should see the coordinate system listed in your import layer dialog.  Boom...CRS defined.

 

 

Viewing with Context (Base Map)

When you add files into QGIS they will draw on the map canvas, but, unlike in Tableau, there won’t be a nice built-in base map for you to use to make sure all your data is in the right place. You can turn a base map on easily, though: just use Web > QuickMapServices. I like the OSM Standard maps. Important Note: If you don’t see this option in your main menu, you will need to install the free plugin (use the menu for Plugins > Manage and Install Plugins and then search for QuickMapServices).

 

 

Now you should see your data on a nice basemap. Hopefully everything seems to be in the right place! If not, there is probably an issue with the Coordinate Reference System that you defined and you’ll need to fix that before moving on. See the notes in the section above about defining the Coordinate Reference System for your dataset.

 

 

Calculating New Attributes

While you can do a lot with calculated fields in Tableau, there are some spatial calculations that you can’t easily do there but that are easy in QGIS. I’ll demonstrate a single calculation to give a general overview, but we’ll do lots of calculations in subsequent posts so you can see more examples then. Adding attributes will require us to work with the “attribute table”, which displays information on the features of a selected layer. For more information about this, read through the QGIS documentation on Working with the attribute table.

 

First of all, how do you open the attribute table to take a look? Just right click on the layer that you’re interested in and select ‘Open Attribute Table.’

 

 

Next, open the Field Calculator

 

 

From here, we can calculate some new values!

 

For some file types (e.g., CSV) you’ll create ‘virtual fields’ and will then have to save your file with a new name to have them made permanent. For editable file types, you can start editing the table, then add your calculations, and just save the edits at the end and have them permanently saved in the table.

 

For any calculation you’ll define a few key things for your field:

 

 

1)  Output field name

2)  Output field type (e.g., will it be an integer or a decimal number?)

3)  Output field length (and, if relevant, precision) - to define the field size and number of decimal places. This is particularly important when calculating values when you want to ensure a level of precision in the result. You can’t change this after the fact...

4)  Expression - the calculation itself. There is a great set of tools to help you with writing your calculation—just look to the right of the Expression and you can search for functions and see the help associated with each (much like Tableau).

 


Here is an example of a simple calculation: I’ll add the word ‘School’ to the end of each “SCH_NAME” value (#1 below). Using the lookup box on the right, I can drill into the Fields and Values section to find the attribute I want to work with (#2). I can then double-click to add it to my Expression. Finally, I just add + ‘ School’ to add the word to the end. Below the expression, the editor will show a preview so that I can make sure I’ve entered the expression correctly (#3).
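If you think better in code than in dialogs, the calculation above amounts to something like this Python sketch. The school names are placeholders and the output field name SCH_LABEL is an assumption for illustration; only SCH_NAME comes from the example dataset. This is not how QGIS runs the expression internally, just the same logic written out.

```python
import csv
import io

# Placeholder rows; only the SCH_NAME column name comes from the example dataset.
SAMPLE = "SCH_NAME\nExample Elementary\nSample High\n"

def add_suffix(rows, field, suffix, new_field):
    """Return copies of the rows with new_field set to the field value plus suffix."""
    return [{**row, new_field: row[field] + suffix} for row in rows]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# SCH_LABEL is a hypothetical output field name.
updated = add_suffix(rows, "SCH_NAME", " School", "SCH_LABEL")
for row in updated:
    print(row["SCH_LABEL"])
```

Like a virtual field in QGIS, the original SCH_NAME values are left untouched and the result lands in a new field.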

 

 

In future posts, we’ll do more complex calculations involving geometry (the sorts of things you can’t do as easily in Tableau).

 

Exporting Data

We’re not going to do any big file trickery or manipulation in this post, but often, such file trickery involves exporting a new version of the original file. This comes in really handy when you’re working with spatial data that is in a format that Tableau doesn’t (yet) recognize, like Well Known Text (WKT). We can open the file in QGIS and then export it into a file type that Tableau does recognize.

 

Whether you’re working with points, lines, and polygons from a vector spatial file or created from a text file (e.g., CSV), it’s the same process to export data. All you have to do is right click on the layer name in the Layers pane and choose Export > Save Features As...

 

 

There are a ton of options for the format to export your data. I generally use ESRI Shapefile (because it’s been burned in my brain as the spatial file type after too many years in school and teaching GIS classes with ESRI software). There are many options on this list that Tableau will like such as KML or GeoJSON (the full list of acceptable file formats can be found here).

 

If you don’t need the spatial components of the file (i.e. you just want a CSV with attributes that you’ve calculated), you can just export as a Comma Separated Value (CSV) file. 

 

 

When exporting spatial files, there are a few particularly handy options to know about:

 

 

1)  Format - Choose your format, as discussed above.

 

2)  File Name - Enter your file name (and remember where you saved it).

 

3)  CRS - The coordinate reference system. This will default to something smart based on the CRS of your original file or the map in QGIS...but it’s better to double check this to make sure it’s right!  This is also the place where you can redefine the coordinate system. So, if your data is in WGS84 (latitude and longitude using the World Geodetic System of 1984) and you want it to be in Massachusetts Mainland (ftUS) you can change it here! Just change the drop down before you export.

 

4)  Field List - Choose which fields you want to export. 
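To give a feel for what the Format and Field List options are doing, here is a minimal Python sketch that writes point records out as GeoJSON while keeping only a chosen set of fields. The record, its field names, and the to_geojson helper are all hypothetical; QGIS does far more than this (including reprojecting to the CRS you pick), so treat it as an illustration of the output shape rather than a replacement for the export dialog. It assumes the coordinates are already longitude/latitude in WGS84, which is what the GeoJSON specification expects.

```python
import json

def to_geojson(records, x_field, y_field, keep_fields):
    """Build a GeoJSON FeatureCollection of points, keeping only keep_fields as properties."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [x, y], i.e. [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec[x_field], rec[y_field]]},
            "properties": {name: rec[name] for name in keep_fields},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical record; "notes" is left out of the export via keep_fields.
records = [{"name": "Site A", "lon": -73.97, "lat": 40.63, "notes": "not exported"}]
collection = to_geojson(records, "lon", "lat", keep_fields=["name"])
print(json.dumps(collection, indent=2))
```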

 

After exporting your file, QGIS will automatically load it into a new layer that you can continue to work with. Or, you can jump to Tableau and start analyzing your spatial data there.

 

Coming Soon!

Okay, so that’s probably enough to get you started. But we’ve only scratched the surface. In the next post, we’ll go into some manipulations you can do with text files.

 

If you want some practice while you wait for the next post, here are some great QGIS tutorials (not Tableau-specific):

 

•  Official QGIS training materials

•  QGIS Tutorials and Tips

 

In the meantime, if great questions or ideas come to you, feel free to reach out on the Tableau Community Forums or to follow more of the random Tableau spatial thoughts that I share on Twitter (@mapsOverlord)...or to share the great maps that you’re making in Tableau!

 

Sarah Battersby, May 3, 2021
