I've spent the past week testing out some of the use cases and capabilities of AI within the role of a data analyst. I spent the first half of the week building a crash course in R using AI. My second challenge was to see how well AI can actually write R code, specifically by building a predictive model.
Is AI designed to code?
In general, whilst AI certainly has the ability to write code, it is not yet designed to write complex code. There are lots of AI tools designed to generate code. These work on processes such as natural language processing, where AI interprets human language and converts it to code, or machine learning, where AI is fed more and more data over time and gets better at writing code in response to what it has learnt.
There are plenty of pros and cons when using AI to write code. On one hand, AI is much better equipped to quickly identify errors in code than a human is - if you've ever coded before, you'll understand the pain of searching through lines and lines of code for a missing comma. In this sense, AI has the capacity to help developers be more efficient and productive. However, it shouldn't be relied upon to write code, or at least not yet. AI-generated code often contains errors and bugs, and there are also big security concerns, especially when the data in question contains sensitive information. A further issue with relying on AI to code is limited creativity and innovation. If you consider that AI works by predicting responses based on existing data structures and information, it's easy to see how it would lack the capacity to generate original ideas without human assistance.
Overall, it seems that AI coding is still in its early days, and shouldn't be relied upon. However, when used as a complementary resource, it has the potential to improve accuracy and increase efficiency.
In light of all this, I wanted to see what sort of output I could get from AI, to see how I could use it to assist my own coding ventures.
From finding the data to predictive statistics - using AI to write R
Thinking back to my uni days, I would use R for regression analysis as part of research projects in which I had to create hypotheses, plan my research methods, then interpret my results and draw conclusions. I wanted to see if I could get ChatGPT (an AI tool) to do all of this for me instead.
I started out by asking ChatGPT to find me a good dataset to run a logistic regression on, to which it provided a few different popular datasets. After choosing one on survivors of the Titanic, I then asked ChatGPT to come up with hypotheses about the dataset which I could test, and to write up some research methods to go along with them. I found that ChatGPT did a good job here, coming up with sound hypotheses - although I think this was massively aided by the fact that the dataset I chose is a popular choice for regression analysis, and there are plenty of findings out there.
After coming up with my hypotheses and how I was going to test them, I asked ChatGPT to write me the code to do this in R. I did find that I had to go backwards and forwards a bit to get ChatGPT to write code that actually worked, but I got there in the end. I managed to produce a few chi-squared analyses and logistic regression models without having to write a single bit of code from scratch (just making a few edits here and there). I then fed the results of these analyses back into ChatGPT and asked it to interpret them, which it did very well.
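For reference, the kind of code ChatGPT produced looked something like the sketch below. This is my own minimal reconstruction using base R's built-in `Titanic` table of counts rather than the CSV I actually worked with, so the variable names here are assumptions:

```r
# Base R ships a Titanic contingency table; convert it to a data frame
# of counts broken down by Class, Sex, Age and Survived
titanic_df <- as.data.frame(Titanic)

# Chi-squared test: is passenger class associated with survival?
class_tab <- xtabs(Freq ~ Class + Survived, data = titanic_df)
chisq.test(class_tab)

# Logistic regression: survival predicted by class and sex,
# using the cell counts as frequency weights
model <- glm(Survived ~ Class + Sex, family = binomial,
             data = titanic_df, weights = Freq)
summary(model)
```

The `summary(model)` output is exactly the sort of thing I then pasted back into ChatGPT for interpretation.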
To double check the results I was getting from these models, I actually rebuilt them in Alteryx, and sure enough, ended up getting exactly the same output - so all good here!
I then tested these same steps in Bard (another AI tool), to see what results it would give me. I found that while it gave me the same dataset examples as ChatGPT, it came up with wildly different hypotheses. In fact, the hypotheses it came up with, whilst interesting, were actually not possible to test using the Titanic dataset. I was interested to see what Bard would come up with when asked to write code to test one of these impossible hypotheses, and the results really highlighted the fact that you should not rely on AI to write your code. The hypothesis Bard came up with was ‘Passengers who were able to board lifeboats early were more likely to survive than passengers who were not able to board lifeboats early’, which relied on a variable, 'time to board lifeboat', that simply did not exist. In the code Bard produced, it created this as a new variable by dividing the variable for the fare of a ticket by the variable for a passenger's class. This made absolutely no sense, but really highlighted the difference in ability between Bard and ChatGPT.
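To illustrate, Bard's flawed code boiled down to something like the following hypothetical reconstruction (the column names are assumed from the usual Titanic CSV, and the example rows are made up):

```r
# Hypothetical reconstruction of Bard's mistake, with made-up example rows
titanic <- data.frame(Fare   = c(7.25, 71.28, 8.05),
                      Pclass = c(3, 1, 3))

# The dataset has no 'time to board lifeboat' column, so the code simply
# fabricated one - dividing ticket fare by passenger class, two fields
# with no meaningful relationship to boarding time
titanic$lifeboat_boarding_time <- titanic$Fare / titanic$Pclass
```

Code like this runs without a single error or warning, which is exactly what makes it dangerous: nothing flags that the new column is meaningless.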
I was really impressed by how easily I could produce a pretty coherent academic-style paper based on a regression analysis in R, all by using AI, specifically ChatGPT. I found ChatGPT wrote code fairly well, and it was also pretty good at troubleshooting when I ran into issues. I already knew how to do everything I produced in R, which definitely helped in terms of understanding what ChatGPT was doing, but it meant the process was much quicker than if I had been working alone, and I found that ChatGPT interpreted my results far better than I could have done off the top of my head.
I also found that when building charts in R, ChatGPT was really useful for rewriting existing code if you wanted to make formatting changes without having to rewrite the code yourself.
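As an example of the sort of formatting change I mean, here is a minimal base-R sketch (my own, not ChatGPT's actual output): the same chart rebuilt with colours, titles and a legend, while the underlying data stays untouched:

```r
# Survival counts by class, collapsed from base R's built-in Titanic table
# (margins 1 and 4 are the Class and Survived dimensions)
surv_by_class <- apply(Titanic, c(1, 4), sum)

# The original, bare-bones chart
barplot(t(surv_by_class), beside = TRUE)

# The 'reformatted' version: identical data, different presentation -
# the kind of edit I'd ask ChatGPT to make rather than retyping it myself
barplot(t(surv_by_class), beside = TRUE,
        col = c("grey70", "steelblue"),
        main = "Titanic survival by passenger class",
        ylab = "Number of passengers",
        legend.text = c("Did not survive", "Survived"))
```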
On the other hand, it was interesting to see how Bard handled the same tasks, and the errors it ran into. This got me thinking about the use cases for AI in terms of writing code. I do think it is a valuable tool for assisting those who already have some coding experience, but I worry that it will prevent those yet to learn things like R from doing so from scratch, with them instead relying on AI to code for them without actually understanding the theory behind it.
All of this made me think about what I would have done had I had ChatGPT when I was at university, and it does worry me a little, especially given the results aren’t always right. For students, people without background knowledge of a topic, and potentially people seeking to spread disinformation, AI has the capacity to create misleading or simply wrong output that may then be taken at face value, with negative consequences - especially given how convincing some of that output is.
In conclusion, AI can code, but it makes mistakes. Without prior knowledge of a programming language, it would be difficult to critically analyse code written by AI and judge whether or not it was actually doing what you wanted it to do. In this sense, AI is best used to assist and supplement coding.