Having studied machine learning during my data science degree, I was eager to see how it could be implemented in Alteryx. To test the process, I created an example workflow in which I used a gradient boosted model to predict football players' transfer values.
Overall, I was very impressed with how accessible Alteryx makes machine learning, and it is definitely a tool I will use again in the future. For anyone else who is interested in testing out Alteryx's machine learning capabilities, here is a short step-by-step guide on how to set up a model.
Step 1: Creating a training and test set
Before creating a model, it is important to establish a training and test dataset. The training dataset is used directly by the model to help it learn patterns and relationships within the data. The test dataset is then used to evaluate the model's performance and assess how well it generalizes to new, unseen data.
To do this, it is common to split the initial dataset in a 70:30 ratio. This can be achieved in Alteryx using the Create Samples tool, for example by setting the estimation sample to 70% and the validation sample to 30%.
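For anyone who wants to see what a 70:30 split amounts to outside of Alteryx, here is a minimal Python sketch. The function name, the `player_id` field and the seed are my own illustrative choices, not anything Alteryx exposes, and the tool's internal sampling may well differ.

```python
import random

def split_estimation_validation(rows, estimation_share=0.7, seed=1):
    """Shuffle rows reproducibly, then cut them into estimation and validation sets."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run
    cut = round(len(shuffled) * estimation_share)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the player dataset: ten rows with an ID each.
players = [{"player_id": i} for i in range(10)]
estimation, validation = split_estimation_validation(players)
print(len(estimation), len(validation))
```

Fixing the seed matters for the same reason as in Alteryx: a reproducible split lets you compare models fairly on the same test rows.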
Step 2: Set up model
The estimation sample can now be used to train a model of your choice. In this example, I have used the Boosted Model tool, which I selected because it typically performs well across a wide range of machine learning tasks and copes well with mixed data types, including categorical variables.
There are two main features that must be configured:
1) The target field should be set to the variable you are hoping to predict.
2) The predictor fields should be selected as the variables you are using to help train your model.
(Optional: For those with more advanced machine learning knowledge, there is a "model customisation" tab. Here you can tune your model and set up cross-validation.)
To get more information about the model you have built, a Browse tool can be attached to the output labelled R. This provides a basic overview of what you have built and shows the relative importance of each predictor variable. In this example, the year in which a player's contract expires has the highest importance in the model and is therefore the strongest indicator of a player's transfer value.
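To give a feel for what a boosted model and its variable importance scores are doing under the hood, here is a toy sketch in plain Python. It is not the algorithm the Boosted Model tool runs: it uses one-split trees (stumps), squared-error loss, and measures importance as each feature's share of the total error reduction, all of which are simplifying assumptions of mine. The data is made up, with "years left on contract" and "age" as hypothetical features.

```python
def fit_stump(X, residuals):
    """Return the one-split tree that most reduces squared error, or None."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (gain, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, j, t, lm, rm)
    return best

def fit_boosted(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        best = fit_stump(X, residuals)
        if best is None:  # no feature has more than one distinct value
            break
        gain, j, t, lm, rm = best
        stumps.append((j, t, lm, rm))
        importance[j] += gain  # credit the error reduction to the feature used
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]
    total = sum(importance) or 1.0
    return {"base": base, "rate": learning_rate, "stumps": stumps,
            "importance": [g / total for g in importance]}

def predict(model, row):
    """Add up the base value and every stump's scaled contribution."""
    value = model["base"]
    for j, t, lm, rm in model["stumps"]:
        value += model["rate"] * (lm if row[j] <= t else rm)
    return value

# Made-up data: [years left on contract, age] -> value in millions of pounds.
X = [[1, 30], [1, 22], [2, 28], [2, 24], [3, 31], [3, 23], [4, 29], [4, 25]]
y = [5, 6, 10, 11, 15, 16, 20, 21]
model = fit_boosted(X, y)
print("relative importance:", model["importance"])
print("prediction for [4, 25]:", predict(model, [4, 25]))
```

Because the made-up values depend almost entirely on contract length, that feature ends up with the larger importance share, which mirrors the kind of ranking the R output shows.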
Step 3: Assess the machine learning model by predicting values using the test set
To assess final model performance, the Model Comparison tool can be used. To set this up, connect the trained model to the "M" input and the previously created test dataset to the "D" input.
The created workflow should look something like this:
Attaching a Browse tool to the "R" output of the Model Comparison tool generates a report showing how your model has performed.
In this example, the model has an RMSE of nearly 5 million, meaning that predicted values are typically around 5 million pounds away from a player's actual value (RMSE weights large errors more heavily, so it is not a simple average of the errors). The accompanying MPE value indicates that the predicted value is, on average, about 5.2% below the actual value. The model therefore tends to undervalue players.
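To make the two metrics concrete, here is a short sketch of how RMSE and MPE can be computed by hand. The numbers are made up, and the sign convention for MPE (actual minus predicted, so a positive value means the model predicts too low) is an assumption on my part; it is worth checking which convention the Alteryx report uses.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square the errors, average them, take the root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mpe(actual, predicted):
    """Mean percentage error, assuming the convention (actual - predicted) / actual.

    Under this convention a positive result means predictions sit below actuals.
    """
    return 100 * sum((a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Made-up transfer values in millions of pounds.
actual = [10.0, 20.0, 40.0]
predicted = [9.0, 19.0, 42.0]
print("RMSE:", rmse(actual, predicted))
print("MPE (%):", mpe(actual, predicted))
```

Note that positive and negative percentage errors cancel out in MPE, so it measures bias (systematic over- or undervaluing) rather than overall accuracy, which is why it is read alongside RMSE.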
This model was just built as a fun example to demonstrate machine learning in Alteryx, but clearly improvements can be made in the future. This can be done by:
1) Tuning the model, by modifying its underlying settings.
2) Selecting or engineering new variables that are better indicators of transfer value.
3) Trying out different machine learning models.
End of tutorial
I hope this blog has helped you to set up your first machine learning model in Alteryx!
If you have any further questions about machine learning in Alteryx or The Data School in general, please feel free to reach out here: https://www.linkedin.com/in/dan-booth/