A Whole LOD-ta Love

by Vivian Ng

[cross-posted here]

When I first started working on my HIPAA breaches dashboard, I was interested in seeing how many individuals were affected by HIPAA breaches every year. One of the first things I noticed was that the number of individuals affected in 2015 was higher than all the other years—over a whopping 110 million people!

My next thought was: Why is this number so high? I investigated by creating a text table filtering out all the other years and bringing the name of the covered entity that had been breached into the view, to make sure that I had built the original view correctly.

As you can see from the above image, HIPAA breaches impacted the most individuals in 2015 out of all the years (so far…) because one breach alone (Anthem Inc.) affected almost 80 million individuals.

What’s more is, for this section, my main takeaway was that the number of affected individuals was increasing every year. If you remove that outlier, you can see that the upward trend over the years more clearly:

Now that I had identified this one outlier, I became curious about finding other outliers, by way of identifying the largest single breach for every year.

In order to find those breaches, we’re going to use a FIXED LOD expression!

LOD stands for level of detail and tells Tableau what level to aggregate at. There are three types of LOD expressions: FIXED, INCLUDE, and EXCLUDE. We’re going to focus on FIXED LODs today, which “fix” or “stick” the level of detail at the desired level.

The LOD expression is:

{FIXED DATETRUNC('year', [Year of Breach]) : MAX([# of Individuals Affected])}

In plain English, this expression says: For every year, return the breach with the highest number of individuals. Not only will it label the 2015 outlier I found earlier, this LOD expression will also help identify whether the yearly single largest breach will be similar in magnitude to the 2015 outlier.

Let’s break it down a little further, focusing on one half of the expression at a time.

FIXED DATETRUNC('year', [Year of Breach])

This looks a bit scary, right? But that’s just because the DATETRUNC part is the syntax for the Year of Breach field when the pill is green (continuous). I didn’t even write that part of the expression myself; I just dragged the pill into the calculated field box:

I think of the field in this part of the expression (Year of Breach) as a group or category. Each of the group’s members has its own row, and the LOD expression gives Tableau instructions for what value to return for each member.

Because my dataset covers the years 2010 to 2021, there are 12 members in this group, so if I were to imagine a table, it would look like this (keep in mind that the column next to it is empty at the moment—‘Abc’ is a placeholder—because we haven’t specified what to put in there yet):

Now, let’s look at the second part of the expression:

MAX([# of Individuals Affected])

These are the instructions we will give Tableau to get what we want, which is the largest value out of all the values (# of Individuals Affected) corresponding to a member of the group (one year out of 12 for Year of Breach).

For example, Tableau is looking at all the values for # of Individuals Affected for 2015 here:

Then, Tableau will pick out the highest value corresponding to this member of the group for us.

In table form, we can now fill in the second column of the table from earlier:

To sum up, I like to begin by thinking about fixed LODs as a table, in order to imagine the columns/column titles I need, as well as the output values that will populate the new column as created by this calculated field.

Once I’ve imagined what the table needs to look like, that’s when I’ll start thinking about the LOD expression in plain English:

For every [member in the specified group/field], return the [desired value — and write out the instructions for calculating this value].

Only after all that do I start writing the actual LOD expression with proper syntax. Then I’ll check that it’s been written correctly by checking it against the data (like the breach that affected almost 80 million people).

Am I all done? For this step, yes, but the next thing to do is to think about how best to represent these outliers. For example, I could show the same chart from the beginning with and without the outliers using a parameter, via a bar-in-bar chart, etc. The possibilities are endless!