Tableau Data Sources Part 1: Data Source Types

 

There is so much you can do with a Tableau data source. You can extract it or maintain a live connection. You can publish it to Server/Online so that it can be reused by yourself and others. You can save it with a packaged workbook so you can share that workbook with others. And so much more!

 

But I’ve found that all of these options can often lead to confusion. And this confusion is not just for new users—even very experienced Tableau users often run into issues when dealing with all the different forms that data sources can take. Personally, this was one of the most confusing things I’ve dealt with. Luckily, largely through my interactions with people on the Tableau Community Forums, I *think* that I’ve gained a pretty good understanding of the ins and outs of Tableau data sources, so in this two-part series, I’m going to share some of what I know about them.

 

The first blog will deal with file-based and server-based connections and the differences between live and extracted data sources, while the second will get into detail about published data sources and the variety of different data-source related files that Tableau works with.

 

File-Based vs Server-Based

There are two primary types of data connections—file-based and server-based.

 

File-Based

File-based connections are ones that involve some sort of file. On the Connect pane of the main Tableau screen, these are listed under “To a File”

 


 

While Tableau lists Excel, text, JSON, Access, PDF, Spatial file, and Statistical file, you can use the More option to connect to a variety of other types of files, such as other Tableau workbooks or Tableau data source files (more on these in part 2).

 

Server-Based

Server-based connections do not connect to a file—instead, they connect to a “server.” Generally speaking, these servers run some sort of application software that allows you fetch data. The most straightforward example are database management systems (DBMS) such as SQL Server, Oracle, MySQL, PostgreSQL, and many others.

 

Note: While many would claim that spreadsheets and Access are databases, they would not qualify as complete “database management systems” like the ones listed above.

 

Tableau Server-Based Data Source Connectors

 

The vast majority of Tableau’s server-based connectors are database management systems, but there are other types as well, including OData (a web-based open protocol that allows for the creation and consumption APIs in a simple way), Web Data Connectors (a web-based JavaScript connector custom written to allow people to connect Tableau to data), and Google Drive/Sheets. Google Sheets’s similarity to Excel may cause some confusion—why is one file-based and the other is server-based? The reason is that Excel stores its spreadsheets in files that generally reside on your computer or somewhere on your company’s file server. Google Sheets, on the other hand, does not deal with files—all spreadsheets are stored on the server and you access them via a web application.

 

Tableau Data Sources

A Tableau Data Source can consist of multiple connections and those connections can be a mix of file and server-based. For example, the following screenshot shows a single data source which contains two connections.

 

 

The first connection is a file-based connection to Excel, while the seconds is a server-based connection to SQL Server.

 

We can add as many connections as we want and we can relate/join/union them together however we wish. In the end, this combination of connections and data model has resulted in a full-blown Tableau Data Source. And, of course, a Tableau workbook can have multiple Data Sources.

 

This distinction between file-based and server-based data sources is very important as it will impact much of how you work with a data source, which we’ll deal with shortly.

 

Live vs Extract

Once you have your Tableau data source, you need to decide whether to make it Live or an Extract.

 

Live Data Sources

By default, data sources will use a live connection, except in some cases where a live connection is not possible (we’ll address this shortly). Put simply, a live connection maintains a direct connection to the source files or server at all times, so any updates to that source can be seen by Tableau pretty much immediately (Tableau does do some caching, so it’s not quite this straightforward, but for this blog, this definition is good enough).

 

When using a live connection, Tableau has to speak to the source using its native language. For most relational database management systems, that language is Structured Query Language (SQL). In such cases, SQL is used to pull in all of your data fields, execute all of the filter logic, translate calculated fields, and many other things. Unfortunately, SQL is different for each DBMS and often doesn’t have the same set of functions as are available in Tableau. For this reason, some functions are not available to all live data sources. For example, Tableau has a function called MAKEDATE which allows you to create a date from a year, month, and day. However, most DBMSs do not have a similar SQL function. In this case, Tableau cannot translate the MAKEDATE function into SQL, so MAKEDATE is not available for use with that live connection.

 

Extract Data Sources

We can choose to change our live connection to an Extract. When you choose to extract your data, Tableau creates a local file which contains a copy of the data. Previous to version 10.5, this data was stored in Tableau Data Engine file with extension TDE. Starting in 10.5, Tableau converted to the Hyper engine and extract files use the HYPER extension (I’ll primarily refer to Hyper files from here on out).

 

Note: Some data connections such as Web Data Connectors do not even allow live connections. In such cases, you are forced to create an extract.

 

A key drawback of using an extract is data freshness. The data is frozen in time and you have to force it to be refreshed (or schedule it, which we’ll deal with in part 2), whereas live connections ensure you always have the most recent data. So, why would you ever want to create a copy of your data? Why not just always us a live connection? The most common reason is performance. Tableau can very quickly read data from an extract because 1) it has been optimized to do so and 2) the extract generally includes only a subset of the original data connection’s data. For example, you may connect to a database that has 300 tables, but only use 2 of those tables in your extract. The smaller data set means more efficient interactions with the data.

 

It is important to note that extracts are not always faster than live connections. Some databases, particularly those that are built for analytics, are incredibly powerful and can retrieve data at lightning-fast speeds, making them faster than extracts. It is, therefore, helpful to understand the database platform you’re working with and how its performance compares to that of your computer or Tableau Server. This being said, in my experience, extracts are almost always faster than live connections, so if in doubt and concerned about speed, an extract is usually your best bet.

 

But performance is not the only advantage of extracts. As noted earlier, some functions cannot be translated into the native language of the connection (see the MAKEDATE example). When using an extract, all functions are available to you automatically. And this is more than just calculated field functions—there are other functionality available in extracts which may not be available with a live connection.

 

Portability is one more advantage of extracts. The resulting extracted data can be shared with others or even saved with a Tableau workbook. Of course, this may not always be an advantage as it opens you up to potential security concerns.

 

There are, of course, many other factors you must take into account when choosing between an extract or a live connection. For a deeper dive, check out the following blog by Jonathan Drummey: TDE or Live? When to Use Tableau Data Extracts (or not)

 

Live Connection to an Extract?

Now let’s do an experiment. I’m going to connect to the saved Superstore data source, then I’ll create an extract. You can see below that I’m saving the extract file as Superstore.hyper in my Downloads folder.

 

 

Now I’m going to create a new workbook and use the More option under the file-based connectors to connect to the extract file we just created.

 

 

Finally, we drag over our tables to create a data model.

 

 

We now have a file-based connection to the extract file. The original data source was file-based, but even if it had been server-based, we are now connected to a file, so this connection is file-based.

 

But there is something a bit strange here and, in my opinion, more than a little confusing. The connection type is Live! What? But this file is an extract—why is it showing as a live connection? Well, yes, our original data source was an extract, but as soon as we connected to that file, our new data source is a live connection to the file. In short, we have a live connection to an extract file (if we really wanted to, we could create an extract of the extract, though I’m not sure why you’d want to do that).

 

While the use cases for a live connection to an extract are somewhat limited, they do exist. For example, perhaps you have a Tableau Prep or Alteryx data flow that updates the extract on a regular basis? In that case, your live connection to the hyper file would automatically receive those updates as they occur. This is sort of a niche use case, but you will see this same sort of thing quite a bit when you work with published data sources in Tableau Server/Online, something we’ll address in part 2.

 

A Quick Note on Tableau Public

Tableau Public “is a free platform to publicly share and explore data visualizations online.” It is important to note that Tableau Public only supports extracts. So, you will always need to extract your data before publishing to Public. And, if you’re using the Public version of Tableau Desktop, you won’t even see the option for a live connection—it will create an extract for you automatically.

 

Up Next

Thanks for reading part 1. In part 2, we’ll discuss published data sources and we’ll break down all of the different data-source related file types that Tableau uses. In the meantime, if you have any questions or comments, feel free to leave them in the comments section below.

 

Ken Flerlage, March 21, 2022

Twitter | LinkedIn | GitHub | Tableau Public


5 comments:

  1. Thanks alot as always.

    ReplyDelete
  2. Thanks Ken! This is extremely useful, as it's not always easy to find definitions of terms in the Tableau help, which often leaves me confused and wondering what the heck all the terms mean! (For instance, that was the case with "data source" vs "connection", which your post clarified straight away.)

    ReplyDelete
    Replies
    1. Great! Glad to hear it was helpful!! Be sure to check out the other blogs in this series as well!!

      Delete
  3. BTW, just realised that although this post mentions part 2 a few times, none of those mentions are links. So when I came back to it, it took me a while to trach down part 2! ☹

    ReplyDelete

Powered by Blogger.