Ads Top

Auto-Refreshing File-Based Data Sources in Tableau Server

 
Before jumping into this blog, I wanted to give a shout out to Michael Perillo for his help on this blog. Michael is a Tableau Ambassador, co-leader of the Phoenix Tableau User Group, and all-round great guy. My background is mostly in Windows so I reached out to Michael for some help ensuring that I didn’t gloss over the Linux world. While this blog is still has a Windows-bent, with Michael’s help, we’ve provided some resources to help with Linux environments as well. Michael also gave some great feedback that definitely helped to increase the quality of the blog. Thanks Michael!

From my time answering questions on the Tableau Community Forums, one question I have seen time and again is how to set up file-based data sources (Excel, csv, etc.) to automatically refresh on Tableau Server. This is particularly difficult for a number of reasons. So, in this blog, I’m going to discuss the basic steps required to auto-refresh your file-based data sources and, along the way, I’ll try to explain why all this is necessary.

File-Based Sources
Before we start, let’s define file-based data sources. These differ from database data sources in that they are physical files that reside on your computer (or some shared network-based drive). Since these are physical files, they can be emailed, moved, copied, etc. just like any other file on your computer. To connect to them in Tableau, you simply choose from the list of file connectors, then select your file.


Note: Selecting “More” will provide you with some additional options:


Databases are different as you connect to them using some sort of virtual “connection string” and, once connected, you can use any of the tables, views, etc. that are contained in the database. While databases are ultimately made up of physical files, you neither know nor care where those files are located.

In my personal opinion, it’s always best to get your data from a real database whenever possible, as those databases tend to be the “system of record” for the data. Files that have been exported from the database are out-of-date almost immediately. However, sometimes you have no choice but to work with a file-based data source. Unfortunately, as noted above, there are a number of steps required to ensure Tableau Server can automatically update these data sources. So let’s talk about those now.

The 3 Keys of Auto-Refresh
There are three main things you must do when attempting to set up Tableau Server to auto-refresh file-based data sources:

1) Store the files on a network-based file share.
2) Ensure Tableau Server has access/permissions to the files.
3) Leverage the Universal Naming Convention (UNC) path of the files when connecting via Tableau.

Let’s discuss each of these separately.

Network File Share
Note: This section is focused on Windows-based file servers. If your company uses Linux-based file, these steps will be a bit different.

Most companies set up a file server which allows you to store files. One big advantage of using these file servers is that they are typically backed up frequently so that, if a file is lost or damaged, you can easily restore it. While it has also grown common for personal computers to be backed up, file server backups are generally much more resilient and reliable. Another advantage is that, with the correct permissions, file servers allow you to share files with others. And that’s the critical reason why we need to leverage them when auto-refreshing Tableau Server. Your files need to reside in a location that is both accessible to you and Tableau Server. If a file resides on your local computer then, generally, Tableau Server won’t be able to communicate with it.

Note: Strictly speaking, you can set up your computer so that Tableau Server can see it by sharing that specific file or the directory in which it resides (a shared directory is called a “file share” and it’s pretty much the same concept as is used on file servers). The problem with this is that your computer isn’t a server so it won’t be as fast, reliable, or resilient as a file server. And, in the case of laptops, we shut down our computers and take them with us at the end of the day. Tableau Server certainly cannot communicate with your computer if it’s powered down. So, while this is technically possible, I would suggest that you avoid this option at all costs as there are simply better ways to do this.

If you are going to be refreshing multiple files, then I generally recommend setting up a single location for each department or client you’re working with. You don’t want to set up one folder for everyone as those people will be able to access all of those files. But you also don’t want to set up a separate folder for each individual file, as that will require a lot of maintenance. So, work to find the right balance between manageability and security.

Note: Before moving on, I should note that there are some other alternatives to storing your files on a file share, including various cloud-based services for which Tableau has pre-built connectors.

Tableau Server Access
Note: For this section, you’ll need to know whether your Tableau Server runs on Windows or Linux, as the steps are slightly different. We’ll start with Windows-based Tableau Servers.

Now that your file resides on the file server, we need to make sure Tableau Server can access it. That’s where the Run As service account comes into play. This is basically a network user that runs the Tableau Server service. Since it runs the service, Tableau Server has the same permissions as the Run As account. So, if this account can access a file, then so can Tableau Server. Thus, we just need to ensure that the Run As account has access to the file share on which you’ve placed your file. Generally speaking, you’re going to need some help with this. Your Tableau Server administrator will need to provide the Run As username and your file server administrator will need to grant the access. Read-only access should be sufficient.

I don’t have much experience with Linux-based Tableau Servers, so that’s where I called in the help of Michael Perillo. The setup is a bit different and pretty technical, so you’re going to need to get your Tableau Server administrator on board. Be sure to clearly explain what you are trying to do. If you’re lucky, you administrator will already know the solution. If not, then the following document on provides detailed, step-by-step instructions on how to configure Tableau Server on Linux to connect to shared directories.

Use the UNC Path
Note: The following is Windows specific, but the same basic concepts should also apply to Macs.

Your computer will generally have a number of drive letters. For instance, C: is typically your hard drive and D: is usually a DVD-ROM drive. But many of your drive letters are “mapped network drives” which essentially point to a share on a network file server.

So, let’s assume that you’ve placed your files in the following folder:

N:\Departments\Sales\Tableau Server\

You open Tableau and connect to your files using this path. You build your data model then publish the data source and set up an automatic refresh. But, the first time it attempts to refresh, it fails. You’ve placed your files on a file share and you’ve ensured that Tableau Server has access to the share—so why did it fail?

The problem is that this N drive is just a sort of a pointer to another location on the network. It’s a shortcut that makes it easier for you to access your files. Generally, your co-workers will have the same set of mapped drives, so they’ll be able to access files using this path. But servers are different for a couple of reasons. First, mapped drives are user-specific, so to map those drives on the Tableau Server, someone would need to log on as the Run As account, then map those drives. Second, mapping drives simply isn’t something that is typically done on a server. That being the case, when Tableau Server tries to connect to N:\Departments\Sales\Tableau Server\, it will fail because, from its perspective, no such drive exists.

Here's an analogy to help you understand this better. Think of a drive letter as a name for a location, like such as Ken’s House. The Tableau Server is like a digital assistant such as Siri or Alexa. If you tell Alexa to take you to Ken’s House, Alexa says “Sorry, I don’t know that.” (Good, because if Alexa did know that, I’d be a bit worried!!). However, if you tell Alexa to take you to “1234 Main Street, Anywhere, PA”, then Alexa should be able to take you there without any problems. That’s pretty much the way drive letter mappings work—the drive letter is just a simple shortcut to a fuller address.

To correct this problem, we need to make sure to use the full Universal Naming Convention (UNC). A UNC path will generally start with \\ followed by the server name, then the folders. To translate from a drive letter to the UNC, path you’ll need to find the drive letter (in Windows, use Windows Explorer). It should look something like this:


Notice the N drive at the bottom. The first part of this, “Share”, is the name of the file share on the server. The second part, which falls within the parentheses, is the server name. To turn this into a UNC path, we start with \\, then add the server name, then the share, followed by the rest of our file path. So…

N:\Departments\Sales\Tableau Server\

becomes

\\DESKTOP-C3FKLQO\Share\Departments\Sales\Tableau Server

To make sure you have the right path, I generally recommend that you paste this path into Windows Explorer and make sure you see your files.

Now, in Tableau Desktop, connect to your files using the UNC path instead of the drive letter. By doing this, you’ll be ensuring that Tableau Server is able to communicate with the files when it attempts the refresh. The rest of the process is pretty much the same as any other published data source. You’ll just specify a refresh interval and Tableau Server will do the rest. And, if you’ve done everything correctly, it should refresh without any errors!

Note: When publishing, Tableau will give you the option to "Include External Files". Since we're setting up Tableau Server to automatically refresh the data source, we do not want to choose this option.

Windows vs Linux
As I said earlier, that this tutorial is definitely more focused on Windows environments. If you work in a pure Linux environment or a hybrid environment (Tableau Server and File Servers are a mix of Windows and Linux), then you may require a slightly different solution.

Accessing files from Windows and Linux is akin to speaking two different but similar languages. In some cases, the words and references are the same; in other cases, they are completely different. Being bi-lingual can be a great advantage, but if you only speak one language (Windows or Linux), seek help from those who are fluent. In such cases, I highly recommend communicating with your Tableau Server and File Server administrators. If you clearly explain what you need to do, then they should be able to help you find a viable solution. And, when you do find a solution, be sure to clearly document it for the next time you need to do something similar.

In addition to involving your system admins, there are many other great resources to help you succeed, including the Tableau Help docs, Community Forums and Knowledgebase. You can also learn a lot about other types of environments by networking with other Tableau users. A great way to do this is to join your local Tableau User Group. Attending these meeting can put you in touch with local experts who work in a variety of industries and environments, which can really help you to up your Tableau game.


I hope this blog has been valuable. If you have any questions, feel free to leave them in the comments section. Thanks again to Michael Perillo for his help.
Ken Flerlage, June 22, 2020

13 comments:

  1. What if you're still getting the following:
    https://community.tableau.com/thread/344210

    See comment from Ryan Calovich

    ReplyDelete
    Replies
    1. Ryan, your issue looks a bit different than Kendra's as she's using Oracle. Are you using a file based data source?

      Delete
  2. Hey Ken. Thank you for the article. As the admin in my organization, I help people with this problem often. One thing usually tell people to also do is to uncheck the "include external files" when they publish. It's my understanding that if that is left checked, that a copy of the Excel document (or file) will get included in the packaged workbook (if you are using a packaged workbook I guess...?) and it will no longer be refreshing from the network drive. Just a thought.

    ReplyDelete
    Replies
    1. Great point. I intended to include that but it just slipped my mind. Thank you!

      Delete
    2. I tried this approach but keeping the data source connection live, but i dont see the data getting updated after refreshing the dashboard in server as and when i put data in the excel.

      Delete
    3. I'm curious what's driving you to use a live connection to Excel? Is this data constantly changing to where you need a live connection? Typically live connections don't make a lot of sense with file-based data sources.

      Delete
    4. Yes, you are right ken. Data in excel gets by the users when ever they are interested and it should reflect at the sametime in the dashboard.

      Basically my usecase is to give flexibility for the users to add comments for the KPIs. Since they are not impressed with comments functionality available in Tableau server, i have given an approach of doing it from excel provided in network shared drive.

      Looking forward for your suggestion.

      Delete
    5. Tableau Server caches data so that it's not constantly requerying the source. You can force a refresh by clicking the "refresh" button in Tableau Server. You can change the caching behavior using the following: https://help.tableau.com/current/server/en-us/config_cache.htm

      Delete
    6. So, what would be your suggestion on this. to go with live connection(i tired it but not refreshing the dashboard when the underlying data gets updated in the file) or to go with extract and schedule it?

      Need you help. Thank you!

      Delete
    7. What happens when users click the "Refresh" button on server? Does it refresh with updated data (when using a live connection)?

      Delete
  3. Thanks ken, really usefull blogpost.

    ReplyDelete
  4. I think this caching behavior (changing it to 0) impacts the dashboard rendering time when users open the report every time? Correct me if i am wrong.

    ReplyDelete
    Replies
    1. Yes, you'll need to be careful with this. Caches are there to help with performance, so if you limit the life of the cache, that means there will be more queries to the database, which will make it slower.

      Delete

Powered by Blogger.