How To Create A Tableau Data Extract Using Python
I have been learning Python for a while now, and as the deeper I am going, the more curious I am getting about how Tableau and Python together can do dataMagic.
Tableau has an Extract API which lets you create Data Extracts (tde files) with C/C++/JavaScript/Python for Tableau from any external data-source like csv files, SQL, Excel . This got me thinking. Because possibilities and benefits are huge. Okay, now let's talk technical stuff.
The Requirement
The requirement is to read the iris machine learning dataset published on UCI website in the csv format, convert it into Tableau Extract and save it on your system so that you can read this data directly from your Tableau Desktop. And all of this needs to be done in Python, of course!
Step 1: Install Python and Other Prerequisites
You are going to need Python (I used Python 2.7) on your computer with pandas and dataextract libraries. Here is a good tutorial for installing Python and dataextract library by Tableau. You can install pandas by following instructions from here.
Step 2: The Coding Part
The process is pretty straight forward and follows this linear path:
Read and fetch some data from the url
Add headers to the data
Create blank tableau extract
Create extract schema: define columns with correct datatype
Fill extract with the data : one by one row
Save and close
Create a folder on your desktop named 'this_is_my_first_extract' and save this script in that folder:
Call the script from command prompt or shell depending on your OS.
The Explanation
Beware! I am totally a Python newbie doing my experiments and documenting as I do. Do not take my words for an expert advice when it comes to Python. Now go ahead:
Of course, you need to import pandas and dataextract packages as you begin because you are going to use pandas to import and hold the data in Python and dataextract to create a Tableau extract.
Once you have imported the packages you need, set the url where the data is. Also, create headers in the data and import data into Python.
At this stage, data is in the Python memory and you need to convert that into a Tableau extract. So you begin with creating a blank extract. And then create a schema which defines each column in your extract with the correct datatype. In this example, we have 5 columns in iris dataset
sepal-length: this is a decimal field so we defined datatype DOUBLE for this field
sepal-width: this is a decimal field so we defined datatype DOUBLE for this field
petal-length: this is a decimal field so we defined datatype DOUBLE for this field
petal-width: this is a decimal field so we defined datatype DOUBLE for this field
class: this is a text field so we defined datatype CHAR_STRING for this field
Please note, defining correct datatypes is important to generate right extracts so be careful at this stage. Tableau Extract API supports 9 different datatypes.
Now connect we will connect the blank data extract that we created with the schema in step 5 of the code. And ran a for loop through each line in the dataset and filled the extract row by row using Row function of dataextract package:
newRow = tde.Row(dataSchema)
Also, make sure to add the row after each iteration of the loop:
table.insert(newRow)
The last line of script will close the extract in Python: dataExtract.close()
This is it. You will have a Tableau extract named irisExtract.tde in the folder 'this_is_my_first_extract'
Hope you had fun along. If you face any difficultly or point out any error in the code, tweet me.