Auto-Refreshing File-Based Data Sources in Tableau Server

 

 
Before jumping into this blog, I wanted to give a shout out to Michael Perillo for his help on this blog. Michael is a Tableau Ambassador, co-leader of the Phoenix Tableau User Group, and all-round great guy. My background is mostly in Windows so I reached out to Michael for some help ensuring that I didn’t gloss over the Linux world. While this blog is still has a Windows-bent, with Michael’s help, we’ve provided some resources to help with Linux environments as well. Michael also gave some great feedback that definitely helped to increase the quality of the blog. Thanks Michael!

From my time answering questions on the Tableau Community Forums, one question I have seen time and again is how to set up file-based data sources (Excel, csv, etc.) to automatically refresh on Tableau Server. This is particularly difficult for a number of reasons. So, in this blog, I’m going to discuss the basic steps required to auto-refresh your file-based data sources and, along the way, I’ll try to explain why all this is necessary.

File-Based Sources
Before we start, let’s define file-based data sources. These differ from database data sources in that they are physical files that reside on your computer (or some shared network-based drive). Since these are physical files, they can be emailed, moved, copied, etc. just like any other file on your computer. To connect to them in Tableau, you simply choose from the list of file connectors, then select your file.


Note: Selecting “More” will provide you with some additional options:


Databases are different as you connect to them using some sort of virtual “connection string” and, once connected, you can use any of the tables, views, etc. that are contained in the database. While databases are ultimately made up of physical files, you neither know nor care where those files are located.

In my personal opinion, it’s always best to get your data from a real database whenever possible, as those databases tend to be the “system of record” for the data. Files that have been exported from the database are out-of-date almost immediately. However, sometimes you have no choice but to work with a file-based data source. Unfortunately, as noted above, there are a number of steps required to ensure Tableau Server can automatically update these data sources. So let’s talk about those now.
 
For more on data sources, I'd recommend reading my series on Tableau Data Sources:
 
 
The 3 Keys of Auto-Refresh
There are three main things you must do when attempting to set up Tableau Server to auto-refresh file-based data sources:

1) Store the files on a network-based file share.
2) Ensure Tableau Server has access/permissions to the files.
3) Leverage the Universal Naming Convention (UNC) path of the files when connecting via Tableau.

Let’s discuss each of these separately.

Network File Share
Note: This section is focused on Windows-based file servers. If your company uses Linux-based file, these steps will be a bit different.

Most companies set up a file server which allows you to store files. One big advantage of using these file servers is that they are typically backed up frequently so that, if a file is lost or damaged, you can easily restore it. While it has also grown common for personal computers to be backed up, file server backups are generally much more resilient and reliable. Another advantage is that, with the correct permissions, file servers allow you to share files with others. And that’s the critical reason why we need to leverage them when auto-refreshing Tableau Server. Your files need to reside in a location that is both accessible to you and Tableau Server. If a file resides on your local computer then, generally, Tableau Server won’t be able to communicate with it.

Note: Strictly speaking, you can set up your computer so that Tableau Server can see it by sharing that specific file or the directory in which it resides (a shared directory is called a “file share” and it’s pretty much the same concept as is used on file servers). The problem with this is that your computer isn’t a server so it won’t be as fast, reliable, or resilient as a file server. And, in the case of laptops, we shut down our computers and take them with us at the end of the day. Tableau Server certainly cannot communicate with your computer if it’s powered down. So, while this is technically possible, I would suggest that you avoid this option at all costs as there are simply better ways to do this.

If you are going to be refreshing multiple files, then I generally recommend setting up a single location for each department or client you’re working with. You don’t want to set up one folder for everyone as those people will be able to access all of those files. But you also don’t want to set up a separate folder for each individual file, as that will require a lot of maintenance. So, work to find the right balance between manageability and security.

Note: Before moving on, I should note that there are some other alternatives to storing your files on a file share, including various cloud-based services for which Tableau has pre-built connectors.

Tableau Server Access
Note: For this section, you’ll need to know whether your Tableau Server runs on Windows or Linux, as the steps are slightly different. We’ll start with Windows-based Tableau Servers.

Now that your file resides on the file server, we need to make sure Tableau Server can access it. That’s where the Run As service account comes into play. This is basically a network user that runs the Tableau Server service. Since it runs the service, Tableau Server has the same permissions as the Run As account. So, if this account can access a file, then so can Tableau Server. Thus, we just need to ensure that the Run As account has access to the file share on which you’ve placed your file. Generally speaking, you’re going to need some help with this. Your Tableau Server administrator will need to provide the Run As username and your file server administrator will need to grant the access. Read-only access should be sufficient.

I don’t have much experience with Linux-based Tableau Servers, so that’s where I called in the help of Michael Perillo. The setup is a bit different and pretty technical, so you’re going to need to get your Tableau Server administrator on board. Be sure to clearly explain what you are trying to do. If you’re lucky, you administrator will already know the solution. If not, then the following document on provides detailed, step-by-step instructions on how to configure Tableau Server on Linux to connect to shared directories.

Use the UNC Path
Note: The following is Windows specific, but the same basic concepts should also apply to Macs.

Your computer will generally have a number of drive letters. For instance, C: is typically your hard drive and D: is usually a DVD-ROM drive. But many of your drive letters are “mapped network drives” which essentially point to a share on a network file server.

So, let’s assume that you’ve placed your files in the following folder:

N:\Departments\Sales\Tableau Server\

You open Tableau and connect to your files using this path. You build your data model then publish the data source and set up an automatic refresh. But, the first time it attempts to refresh, it fails. You’ve placed your files on a file share and you’ve ensured that Tableau Server has access to the share—so why did it fail?

The problem is that this N drive is just a sort of a pointer to another location on the network. It’s a shortcut that makes it easier for you to access your files. Generally, your co-workers will have the same set of mapped drives, so they’ll be able to access files using this path. But servers are different for a couple of reasons. First, mapped drives are user-specific, so to map those drives on the Tableau Server, someone would need to log on as the Run As account, then map those drives. Second, mapping drives simply isn’t something that is typically done on a server. That being the case, when Tableau Server tries to connect to N:\Departments\Sales\Tableau Server\, it will fail because, from its perspective, no such drive exists.

To correct this problem, we need to make sure to use the full Universal Naming Convention (UNC). A UNC path will generally start with \\ followed by the server name, then the folders. 

There are a couple of methods for translating a drive letter to the UNC. One option is to find the drive letter (in Windows, use Windows Explorer). It should look something like this:


Notice the N drive at the bottom. The first part of this, “Share”, is the name of the file share on the server. The second part, which falls within the parentheses, is the server name. To turn this into a UNC path, we start with \\, then add the server name, then the share, followed by the rest of our file path. So…

N:\Departments\Sales\Tableau Server\

becomes

\\DESKTOP-C3FKLQO\Share\Departments\Sales\Tableau Server

Another option is to open a command prompt (Windows - hold the Windows Key and press R,  then type cmd and click OK). In the command prompt, simply type net use. This should provide an output that looks something like this:

Local     Remote                  
----------------------------------------------

N:        \\DESKTOP-C3FKLQO\Share

From here, follow the same steps I documented above to replace the drive letter on your path.

To make sure you have the right path, I generally recommend that you paste this final path into Windows Explorer and make sure you see your files.

Now, in Tableau Desktop, connect to your files using the UNC path instead of the drive letter. By doing this, you’ll be ensuring that Tableau Server is able to communicate with the files when it attempts the refresh. The rest of the process is pretty much the same as any other published data source. You’ll just specify a refresh interval and Tableau Server will do the rest. And, if you’ve done everything correctly, it should refresh without any errors!

IMPORTANT 
When publishing, Tableau will give you the option to "Include External Files". This option will cause Tableau to push a copy of the file up to a temporary folder on Tableau Server, which it will then use in the future. Since we're setting up Tableau Server to automatically refresh the data source from the UNC path, we do not want to do this so we need to leave this option unchecked. 
 
Unfortunately, if your data source is embedded in your workbook (as opposed to being published separately), this could cause some problems since "Exclude External Files" is used for other purposes, such as images. So, on one hand, you need to check the box (to include the images), but on the other, you need to uncheck it (to ensure the data source remains connected to the UNC path). Fortunately, there are two solutions to this problem. One solution is to publish the data source as its own Published Data Source, leaving "Include External Files" unchecked. This will separate the data source from the workbook. When you then publish the workbook, you'll check "Include External Files" so that the images are included. Another solution is to use a UNC path when adding your images. Like our source data file, Tableau Server will then attempt to load that image from the UNC path rather than a local copy. But be sure that Tableau Server has access to that path, as detailed previously. 

Windows vs Linux
As I said earlier, that this tutorial is definitely more focused on Windows environments. If you work in a pure Linux environment or a hybrid environment (Tableau Server and File Servers are a mix of Windows and Linux), then you may require a slightly different solution.

Accessing files from Windows and Linux is akin to speaking two different but similar languages. In some cases, the words and references are the same; in other cases, they are completely different. Being bi-lingual can be a great advantage, but if you only speak one language (Windows or Linux), seek help from those who are fluent. In such cases, I highly recommend communicating with your Tableau Server and File Server administrators. If you clearly explain what you need to do, then they should be able to help you find a viable solution. And, when you do find a solution, be sure to clearly document it for the next time you need to do something similar.

In addition to involving your system admins, there are many other great resources to help you succeed, including the Tableau Help docs, Community Forums and Knowledgebase. You can also learn a lot about other types of environments by networking with other Tableau users. A great way to do this is to join your local Tableau User Group. Attending these meeting can put you in touch with local experts who work in a variety of industries and environments, which can really help you to up your Tableau game.


I hope this blog has been valuable. If you have any questions, feel free to leave them in the comments section. Thanks again to Michael Perillo for his help.
Ken Flerlage, June 22, 2020


4 comments:

  1. I have 8 MS SQL Server database that i needed to connect to, run the same query on each and then union the results. I couldn't do that with Tableau so I used MS Access to run 8 pass through queries to retrieve the data and one native Access query to union the 8 pass through queries.
    Then I had Tableau attach to the Access database and made an extract.
    No matter what I did I could not get the dashboard to refresh.
    I had to manually open the Tableau workbook, refresh the extract and republish the workbook.
    Finally I checked "include external files" and now it works. It refreshes on schedule.

    ReplyDelete
    Replies
    1. Why couldn't you union the SQL tables directly in Tableau?

      Delete
    2. 8 different sql servers each with their own database ; so 8 different databases ; each has the same tables and table structures, but each database is for a different region --
      if its possible to union them then I'd love to hear how, but everything I've found through google says its not possible (unless you do some crazy outer join work around)

      Delete
    3. Ah yes, you can't do cross-database unions. Your best option is likely to be Tableau Prep or some other type of data prep before you get to Tableau.

      Delete

Powered by Blogger.