Making a custom distribution for mock data using Alteryx Tools

This blog will be a little long but it's worth it as the end result is a tool that you can use over and over when creating mock data flows.

I find that with tools such as Mockaroo it's quite easy to get a normal distribution but if you want something a little more unpredictable it becomes a bit more of a challenge. To save time when I need a custom distribution I created an Alteryx tool for myself which I will be breaking down in this blog.

The scenario I've created for this explanation is as follows: We have 1,512 randomly generated customer names in a table called "CUSTOMER NAMES". We want to assign each of them a country but we don't want their countries distributed evenly, we want something a little more unpredictable (shown below).

Here is the final flow we will be creating to get this distribution.

For this explanation I will be keeping the names of Alteryx tools in Bold and the names of data columns in Italics

After inputting our customer names data table, the next step is to use a Text Input tool to create our distribution ratio. I have chosen 5 countries and created 2 columns: Country & Ratio.

In the Country column I have added five countries and in my Ratio column I have created the ratio of records/rows I would like between those countries i.e. my distribution. In the flow, name this input "RATIO".

After this step we should just have 2 separate inputs in the flow: "CUSTOMER NAMES" and "RATIO" plus any Select tools used for data cleansing.

Next: Use a Summarize tool to sum up the ratio column for use in calculations further down the flow. In this case our sum of ratios is 14.5.

We then want to attach a Summarize tool to our customer names table and count the unique identifier row (in this case email) as below:

Next use an Append tool to join these 2 numbers together to allow us to use them together in a calculation.

After appending the output should be as follows.

Add a Formula tool after the append tool and divide the Count by the Sum_Ratio before multiplying that formula by 2. Multiplying by 2 allows for a little more variation in our distribution as well as ensuring we have an excess of country data to connect to the customer data, preventing any nulls. Name the new column "Multiplier".

Now you want to Append the result of this formula to your RATIO table as below.

Within the Append tool make sure your Ratio and Multiplier fields are numeric formats.

At this stage your output should look like this.

Add a Formula tool and multiply the Ratio by the Multiplier, name the new column "Total".

Attach a Generate Rows tool and set up the Condition Expression box as follows. This will ensure that for each country a number of rows equal to the number in the Total column will be generated.

Your output should now look like this:

Have a quick scroll to check that each country has an equal number of rows to the number stated in Total, in this case 1040 rows for United Kingdom.

Now drag in a Formula tool and create a new row called "Random Sort" using the formula RAND(). Now sort the data by the newly created Random sort column using a Sort tool:

It doesn't matter if the sort is ascending or descending. The combination of excess rows and the random sort ensures that even if 2 countries are set to a 1:1 ratio they will not have an identical number of records when joined back to the customer data.

Using a Select tool, remove all columns except the Country column.

Join the Country field to the Customer Data table using a Join tool and set the tool to "Join on Record Position" since we have no common fields.

The final step is to add an Output tool and output your data in whatever format you prefer, your final flow should look roughly like this:

I hope this was useful!

Author:
Ross Killington
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab