Tableau Data Sources Part 2: Published Data Sources & More

 

Welcome back to my two-part series which deals with the ins and outs of Tableau data sources. If you missed Tableau Data Sources Part 1: Data Source Types, then be sure to go back and read it first as I’ll be building upon some of the concepts shared there. We’re going to start part 2 by diving into published data sources.

 

Published Data Sources

So far, we’ve just been dealing with data sources in relation to Tableau Desktop, but things get even more complicated when we introduce Tableau Server or Tableau Online. We can publish data sources to Tableau Server/Online in two ways. We can either publish the workbook with the data source included (an Embedded Data Source) or we can publish the data source as its own standalone object (a Published Data Source). I typically recommend publishing as a standalone published data source because that data source can then be used in other workbooks or by other authors—essentially, it enables data source reuse, which helps to ensure consistency and reduces pressure on the source connections. I’m going to focus on published data sources for the rest of this section.

 

To create a published data source, you first create your data source in Tableau Desktop (or Prep or other tool). As you might expect, the data source can be either a Live connection or an Extract. You then connect to your Tableau Server/Online and use the menu option, Server | Publish Data Source (Note: Tableau Online will often require the additional component, Tableau Bridge). This creates a data source object on server which anyone with permissions can use.

 

Connecting to a Published Data Source

You can connect to a published data source in Tableau Desktop by selecting the Tableau Server option from the connection screen.

 

 

You can then search to find the data source you’re looking for. In the screenshot below, I’ve connected to a published data source, which is an extracted data source I created from the Superstore Excel file.

 

 

But, as I shared in the previous blog, the connection type shows as “Live.” Why? Yes, our original data source was an extract, but we have essentially created a new data source that connects to that extract, and that new data source is a live connection. In short, we have a live connection to an extracted data source. Every time you connect to a published data source, it will show as a live connection, regardless of whether or not the published data source is a live data source.


Warning: You cannot relate or join a published data source to any other data. If you need to combine multiple published data sources then you will need to leverage blending or you will need to create a new data source that connects to the original data sources and relates or joins them.

 

Refreshing an Extracted Published Data Source

When you publish a live data source to Tableau Server/Online, that data source maintains a live connection to the data, so refreshing it is not necessary. However, extracts, by their nature, are just snapshots of data and typically need to be refreshed on a regular basis. This is where scheduled refreshes come into play. When a refresh process runs, it will connect to your original data (this could be a file or a server, such as a database management system), it will pull the data it needs, then update the data source on the server. I won’t be going into much more detail about the process of refreshing data sources as that is pretty well documented in the Tableau Help.

 

I do, however, want to mention a few differences you’ll see depending on the type of connections you use. The process is relatively straightforward when using a server-based connection—Tableau will prompt you to provide details about how to authenticate the data source, then that’s all you need. Tableau Server/Online should generally be able to connect to that server data source without any issue. One common problem I see is that firewalls or similar technology sometimes block communications between the two servers. If you see network-related errors when you attempt to refresh a data source using server-based connections, then I recommend working with your IT team to determine if there are any firewalls getting in the way.

 

File-based data sources are a bit trickier because they often reside on your computer or on a mapped drive and Tableau Server/Online will generally have problems connecting to those. If you’d like to learn more about this, see my blog, Auto-Refreshing File-Based Data Sources in Tableau Server.


What’s Included in a Published Data Source

When you publish a data source, it will include all groups, sets, hierarchies, bins, and calculated fields that you’ve created. When you are connected to a published data source, these objects, plus all the data fields, cannot be edited because they are part of the data source (note: there are a few properties, such as geographic role, which can be changed and will then be saved as part of the workbook).

 

You can create copies of these objects or create new groups, sets, hierarchies, bins, and calculated fields, but those will not be part of the published data source. Instead, those will become a sort of extension to the data source which reside only in your workbook. This can be pretty confusing so my recommendation is that you first edit the data source (see the next section), then add the new objects, and republish. This will ensure that others can leverage those new objects, while also avoiding confusion caused by having some objects in the data source and some in the workbook.

 

Parameters are a bit odd because they are not specific to a data source—they can be used by any data source to which you are connected. When you publish a data source, the parameters will become part of the data source, but when you connect to that data source, the parameters can still be edited and can be used in other data sources.

 

Editing a Published Data Source

Editing a published data source can be a bit tricky at first, so it’s important to understand the process. Kevin included this in his recent tips blog, Ten Tableau Tips & Techniques - Round 4 (tip # 7), so I’m not going to go into any further detail about it here. You can also read about the process in the Tableau Knowledgebase, Editing a Published Data Source On Tableau Server

 

Contents of a Published Data Source

Like workbooks, you can download a published data source which will give you a packaged data source file (TDSX). A packaged data source is really just a zip file—by changing the extension to ZIP, we can open it up and see its contents. For example, here are the contents of an extracted published data source:

 

 

First, we see a data source file (TDS), which is an XML containing the metadata about the data source—essentially, a file that describes the data source, but does not contain any actual data.

 

Next, we see a Data folder, which contains another folder, Extracts (because this is an extracted data source). Inside the Extract folder is an extract file (HYPER). This is the extract file we created in Tableau Desktop initially (Note: If your published data source was created before version 10.5 and the data source was never upgraded to Hyper, then your file will have a TDE extension instead of HYPER).

 

So, how does it look when we download a data source which uses a live connection? In this case, it will depend on whether you’re using a file-based connection or server-based.

 

If using a file-based connection, you’ll see something like this:

 

 

Once again, we see the data source (TDS) file and a Data folder. The Data folder contain another folder with the same name as the folder in which your original data source resides. In my case, I originally connected to a file in my Downloads folder. Within that folder, you’ll see a copy of the file source. I had connected to the Superstore Excel file so I see Sample - Superstore.xlx. So, basically, the packaged data source contains a copy of the original data file. Thus, if you were to connect to this packaged data source in Tableau Desktop, you’d be able to read the data and work with it.

 

If you’re using a server-based data source, then there is no way for Tableau to put a copy of the data into the data source (without extracting it, of course). Thus, you’ll only see a data source (TDS) file and no Data folder. If you use Tableau Desktop to connect to the packaged data source, it will force you to re-enter the credentials for the server-based connection before you can use the data.

 

Workbooks

OK, now you’re asking why I have a workbooks section in a blog about data sources. The reason is that, in some cases, workbooks actually contain the data itself. In order to explain what I mean, we need to first look at some of the different types of workbook files, like we did when we took apart the packaged data source files in the previous section. When we download a workbook from Tableau Server/Online, we get a packaged workbook (TWBX). Like packaged data source files, packaged workbook files are just zip files. We can change the extension to ZIP in order to open them. The below example shows the contents of a packaged workbook using live connection to the Superstore Excel file:

 


What we see in these files is very similar to what we see in packaged data source files—a Data folder which organizes our data files in exactly the same manner as the packaged data source (with all the same differences for live vs extract and file vs server-based connections). We’ll also see a workbook file (TWB), which is an XML file containing the metadata of the workbook (similar to the TDS file found in the packaged data source). This workbook file has the same name as the name of the packaged workbook (“Example” in my case). In addition, we may also see some other folders with objects needed for the workbook, such as images. In short, the packaged workbook is a fully self-contained file which has everything we need—the workbook, the data, and any other objects we might need (images, etc.).

 

An Experiment…

Okay, time for an experiment…I’ll start by creating a new workbook in Tableau Desktop then creating a live data source connecting to the Superstore Excel file located in my C:\Downloads directory. Next, I’ll save this as a packaged workbook (TWBX).

 

Remember what’s happening here—the packaged workbook will contain the workbook file (TWB) as well as a copy of the source Excel file.

 

Now let’s close the workbook and open it back up. Next, go to the Data Source tab and edit the connection to the Excel file.

 

 

Take note of the directory in which the Excel file resides. In my case, it’s C:\Users\Ken\AppData\Local\Temp\TableauTemp\3296623394\Data\Downloads. You may be wondering why it isn’t linked to the C:\Downloads directory I originally connected to. The reason is that, once I save the file as a packaged workbook (TWBX), it copies the Excel file into the packaged workbook. Then, when I open it, Tableau retrieves the data from the packaged workbook and places it into a temporary directory. Once I save as a packaged workbook, that link to the original file is broken. This is incredibly important to remember because, if you go back and edit the original Excel file (in C:\Downloads) then try to refresh the data source in your workbook, it will never update. This, of course, is because it is no longer linked to that file.

 

In this case, if you simply save the workbook as a workbook (TWB), then you’ll only be saving the XML metadata for the workbook. Reopening that file will maintain the connection to the original source file. However, this isn’t always the right approach. If you intend to share the workbook with others, they will need to have that data available to them. This is why we always ask for a packaged workbook (TWBX) on the Tableau Community Forums. If you provide us with a workbook file (TWB) only, then we won’t be able to see your data and it’s next to impossible to address a problem without having access to the data.

 

Recap

Wow that was a lot to take in!! Data sources are, in concept, quite simple, but there are so many options and so many ways to store them that they can become very complex very quickly. I hope that this blog series has helped to demystify some of this complexity. If you have any thoughts, comments, or additional questions, please drop them in the comments. Thanks for reading!!

 

Ken Flerlage, April 11, 2022

Twitter | LinkedIn | GitHub | Tableau Public








15 comments:

  1. Hi Ken. Thanks for the detailed description on published data sources. I have blended excel file. How to publish blended excel file to Tableau server/ tableau online?

    ReplyDelete
    Replies
    1. I don't fully understand. Can you clarify the question?

      Delete
  2. Thanks for this amazing brain dump. This is going in my technical help notebook!

    ReplyDelete
  3. @Priti you need publish data sources to server/online separately then blend on server/online.

    ReplyDelete
  4. Thank you, ken. Great explanations!

    ReplyDelete
  5. Hi Ken, thanks for the blog post! data sources in tableau seems to have their own special quirks sometimes :)

    I have one question regarding published data sources.

    say I have a published data source, which is just being connected to the database, which in turn only includes the raw data.

    now, say i create a dashboard out of it, dozens of calculations, sets and etc. I would have to include the data source as also embedded data source within the workbook, so all the calculations would be materialized in order to promise the best performance (using extract for performance reasons), am I correct?

    On the flip side, if the published data source is an Extract and i don't embed the data source within the workbook , it still wouldn't include all the calculations that i would create for the dashboard, correct? or am I missing something?

    I'm just trying to find the right strategy when working with published data sources. I would love to hear your thoughts on the matter.


    Thanks again for the post and for your time!

    ReplyDelete
    Replies
    1. If you create the calculations before you publish the DS, then they will be included in the published DS. After connecting to a published DS, any new calcs you create will be part of the workbook, not the data sources.

      Delete
  6. Hi Ken - when connected to a published data source I understand it does show that 'live' connection to the published extract. I am wondering if that live connection to the extracted data slows down the dashboard at all. Would it increase performance to have a localized extract instead?

    ReplyDelete
    Replies
    1. It would likely be a little slower to connect to the server than a local file, but I doubt it would be too noticeable.

      Delete
  7. Hi Ken, we have created two published data sources - each are extracts with multiple joins within them. We want to now join these two published data sources (actually published extracts) in Prep. But doesn't seem t be working out. Is this even possible. Or do we have to make the join before we create the separate extracts.

    ReplyDelete
    Replies
    1. You can join two published data sources in Prep. Happy to help further if you'd like to email me. flerlagekr@gmail.com

      Delete
  8. I'm having a really difficult time working with a published virtual connection. I connect to that virtual connection with Tableau Desktop, pull in the tables I want, build the relationships between them, build my workbook and all seems fine. However, then I publish my workbook to Tableau Server and suddenly no data is returned because my relationships I built are all invalid now and I have to rebuild them and fix a pile of issues because of fields that are considered invalid or interpreted as not existing in the database. It's like on the Desktop side the naming convention of the fields are slightly different. Ie. a field in my relationship called "Pk" isn't being recognized on the Tableau Server side because it only seems to recognize is as being called [Pk]. Any idea if this is caused by how the virtual connection was created or perhaps how I have created the relationships, or how I published my workbook to the server?

    ReplyDelete
  9. Hey, What if I don't want to include the calculated fields in my workbook in my published data source

    ReplyDelete
    Replies
    1. You can publish the published data source without calculated field. Then connect to that published data source and create the calculated fields within the workbook.

      Delete

Powered by Blogger.