Tableau Prep flow and documentation conventions

If you were excited while reading this title, I may just need to call the police on you for being too nerdy for society. If you weren't at all excited, well done; you are a normal, functioning human being. While I write this from prison for the same crime, I must impress on you, whether excited or not, the importance of documentation in general and, in this instance, in Tableau Prep specifically.

When drafting a flow, some experimentation and redrafting are almost inevitable, but whether you are on the final draft or just starting to explore the data, the need for documentation is still palpable. Documentation helps to rationalise and clarify each step of a flow. When you bring a new step into a flow, it is given a default name that depends on the type of step it is:

But this name hardly describes what a user will do within that step. For example, a step may create new rows, but on which fields and using which fields? Or it may be a clean step, but is it there just to view the data or to transform it somehow? There are two answers to this dire confusion: naming steps (which should be mandatory for any responsible Tableau Prep user) or adding a description (which I am more flexible about).

Now a question you may be asking is "What if I am doing several things in one step?" and to that I say "WRONG". This leads us nicely to the second, and possibly more important, part of documentation: separating steps by specific action. There are several ways to approach this. For example, changing all date fields can be done in one step named 'Changing dates', renaming fields can be one step, or even a full change of one field can be one step named 'Clean [Field 1]'. The motivation for creating several clean steps instead of one consolidated clean step is the sustainability of the flow and easier troubleshooting if a bug or problem occurs later in the flow. Tableau Prep may be more forgiving than Alteryx thanks to the changes pane on the left of the view, but this forgiveness should not be overestimated. For example, take this issue:

Where did this error come from?

Now look at a consolidated clean step with this issue:

Where does the error happen? Who knows?

And now a well-documented and clarified flow with the same issue:

Now imagine the flow were twice as long, or even ten times as long: how much harder would it be to find and fix the issue if it only had one clean step and no documentation?
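
For readers who think better in code than in flow diagrams, here is a minimal sketch of the same idea in plain Python. It is an analogy, not anything Tableau Prep runs: the `run_flow` helper, the field names `F1` and `Sale Date`, and the DD/MM/YYYY date format are all assumptions I made up for illustration. Each named step does one thing, so when a bad value turns up the error message names the step responsible.

```python
# Hypothetical analogy: every step has a name and does one thing,
# so a failure can be traced to the step that caused it.

def run_flow(rows, steps):
    """Apply (name, function) steps in order and report which step failed."""
    for name, step in steps:
        try:
            rows = [step(dict(row)) for row in rows]
        except Exception as exc:
            raise RuntimeError(f"Step '{name}' failed: {exc!r}") from exc
    return rows


def rename_fields(row):
    # Give the auto-named field a name that makes human sense
    row["Sale Date"] = row.pop("F1")
    return row


def change_dates(row):
    # Assumes dates arrive as DD/MM/YYYY text (an assumption for this sketch)
    day, month, year = row["Sale Date"].split("/")
    row["Sale Date"] = f"{year}-{month}-{day}"
    return row


steps = [
    ("Rename fields", rename_fields),
    ("Changing dates", change_dates),
]

rows = [{"F1": "05/09/2022"}, {"F1": "not a date"}]

try:
    run_flow(rows, steps)
except RuntimeError as err:
    message = str(err)
    print(message)  # the message starts with the name of the failing step
```

Had both changes lived in one function called 'Clean 1', the same error would only tell you that something, somewhere in 'Clean 1', went wrong.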

To further labour my point, let me take you through the Preppin' Data challenge 2022, week 34 for a good example of documentation.

Step 0 - Input

Do I need to say any more about this? Well, maybe. If you have several inputs, make sure to differentiate the names. Noting the source of any online inputs may also be useful.

Step 1 - View

We all love to see things and look at things. Why is data any different? In this case, my rule would be 'look and don't touch'. This clean step should be just that: clean. Look at your data, and be able to refer to what it looked like before your restless meddling.

Step 2 - Simple Clean-up

This is for very minor changes. Maybe you're feeling very pedantic and you don't like how the field names look, or maybe you have future plans and would like the field names to make human sense for later reference. This step is for the prep before the prep.

Step 3 - Split

Sometimes splits are bad: a split from your one true love, splitting your trousers on the first day at work, splitting your lips on a cold day. But sometimes they are necessary.

Step 4 - Rename

After a split comes some rationalizing renaming. Fields should make sense to the user. F5 - Split 2 may sound cool and snazzy but I have no clue what the field is without a rational name.
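
As a rough illustration of why the rename deserves its own step, here is a small Python sketch. The sample value, the `|` separator and the new name `Performance` are hypothetical; only the auto-generated naming pattern and the Music Type field come from the flow above.

```python
def split_field(row, field, separator):
    """Split one text field into parts named '<field> - Split N'."""
    parts = [part.strip() for part in row[field].split(separator)]
    result = dict(row)
    for number, part in enumerate(parts, start=1):
        result[f"{field} - Split {number}"] = part
    return result


def rename_fields(row, new_names):
    """Return a copy of the row with fields renamed via an old -> new mapping."""
    return {new_names.get(name, name): value for name, value in row.items()}


# Hypothetical input row: one field holding two pieces of information
row = {"F5": "rock|live"}

split_row = split_field(row, "F5", "|")

# The rename is where the fields start to make sense to a human
renamed_row = rename_fields(
    split_row,
    {"F5 - Split 1": "Music Type", "F5 - Split 2": "Performance"},
)
```

Keeping the split and the rename as two separate, named steps means either one can be checked or changed without touching the other.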

Step 5 - A Calculation

Now for good measure, we will title-case the Music Type field. No easy feat, as Tableau Prep does not have an easy title-case function. But I followed a fellow blogger's approach here.
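
To show the logic a title-case calculation has to reproduce, here is a minimal Python sketch. It is not Tableau Prep calculation syntax and not necessarily the approach from the blog I followed; the `title_case` name and the sample values are my own assumptions. The idea is simply to handle each word separately: upper-case the first letter and lower-case the rest.

```python
def title_case(text):
    """Capitalise the first letter of each space-separated word."""
    # str.capitalize() upper-cases the first character and lower-cases the rest
    return " ".join(word.capitalize() for word in text.split(" "))


# Hypothetical Music Type values
music_types = ["hip hop", "ROCK", "drum and bass"]
cleaned = [title_case(value) for value in music_types]
```

Note that this treats every word the same way, so small words such as "and" are capitalised too.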

Step 6 - Creating Filtering Parameters

A more general blog on Tableau Prep parameters may be in the works, but until then, have a look at my changes and documentation instead.

Step 7 - Rank and Top N

As in Alteryx, we need to rank before we filter to a top N. The top N is set by the user, so a parameter is used again.
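
The rank-then-filter order can be sketched in Python as below. This is only an illustration under my own assumptions: the `top_n` function, the `Artist` and `Sales` fields and the sample numbers are hypothetical, and ties are given the same rank (competition ranking), which may differ from the ranking option chosen in the actual flow. The `n` argument plays the role of the user-set parameter.

```python
def top_n(rows, field, n):
    """Rank rows by `field` in descending order, then keep ranks up to n."""
    ordered = sorted(rows, key=lambda row: row[field], reverse=True)
    ranked = []
    for position, row in enumerate(ordered, start=1):
        # Tied values share the rank of the first row with that value
        if ranked and ranked[-1][field] == row[field]:
            rank = ranked[-1]["Rank"]
        else:
            rank = position
        ranked.append({**row, "Rank": rank})
    # The filter only works because the Rank field now exists
    return [row for row in ranked if row["Rank"] <= n]


# Hypothetical data
rows = [
    {"Artist": "A", "Sales": 10},
    {"Artist": "B", "Sales": 30},
    {"Artist": "C", "Sales": 30},
    {"Artist": "D", "Sales": 20},
]

top_two = top_n(rows, "Sales", 2)
top_three = top_n(rows, "Sales", 3)
```

Because the filter depends on the rank, the two actions are worth documenting separately: one step creates the rank, the next one filters on it.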

Step 8 - Output

The final step outputs the work and brings the workflow to completion.

Through this challenge, I hope you will be able to pull out what makes good documentation and even improve on the standard I am trying to set for myself. In a time-pressured project or piece of work, this may be hard, but it is all the more important. Like referencing in an essay, it will only be more torturous to do it after the main body of work is completed.

TL;DR: Document your work!

Author:
Ozlem Sigbeku