Unlocking the Power of Chat GPT: Enhancing Data Analysis with Automated Code Writing

by Kaitlin MacDonnell

Chat GPT, a language model developed by OpenAI, has garnered significant attention for its capabilities in generating human-like text. While it has its limitations when it comes to manipulating data, it can be effectively used as a tool for writing code, enabling data analysts to automate various tasks and expedite their workflows. In this blog post, we explore the use cases of Chat GPT for data analysts and demonstrate how it can be leveraged to write code efficiently.

Current Limitations of Chat GPT in Manipulating Data:

Before delving into the advantages of using Chat GPT for code writing, it's important to acknowledge the limitations it currently faces when working with data. Some of these limitations include:

  1. Processing large datasets: Chat GPT can be slow when handling extensive amounts of data, which can hinder its practicality for data analysis tasks involving sizeable datasets.
  2. Mathematical tasks: As a natural language model trained on vast amounts of text, Chat GPT predicts likely words rather than performing calculations, so it struggles with mathematical tasks. For example, you shouldn't ask it to calculate the average of a dataset. Rather than applying the formula for an average, it relies on the language patterns it has learned and the context of the question, which means it can confidently give you an incorrect answer.
  3. Generating limited rows: The model's response is typically limited to a certain number of rows, which can be a constraint when dealing with datasets that require a larger output. This makes it poor at manipulating a dataset and outputting it as a table. It is also poor at creating mock datasets that need to be over 25 rows.
  4. Reliability of datasets: While Chat GPT can provide recommendations for datasets, it is essential to perform background checks to ensure the reliability and accuracy of the suggested data sources.

This is why, when working with data, you should never rely solely on Chat GPT to manipulate it. Instead, you should leverage it as a tool to write code, which in turn can manipulate large datasets and perform complex statistical analysis quickly and efficiently.
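As a minimal sketch of that division of labour, the average mentioned in the list above is better computed by a short script than requested in the chat window. The column name `rating` and the sample values are hypothetical and only there for illustration; the arithmetic itself is done by Python's standard `statistics` module.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice this would be read from your own file.
SAMPLE_CSV = """title,rating
Book A,4
Book B,5
Book C,3
"""


def column_mean(csv_text: str, column: str) -> float:
    """Return the arithmetic mean of a numeric column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return statistics.mean(values)


average_rating = column_mean(SAMPLE_CSV, "rating")
print(average_rating)
```

Because the script applies the formula directly, the result does not depend on how the question was phrased, and it scales to far more rows than a chat response can return.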

Why Use Chat GPT for Code Writing:

A barrier to using programming languages like Python for data manipulation is the need to learn the language first. The time and effort required may seem to outweigh the benefits, especially given the availability of more user-friendly tools. Chat GPT can help partially overcome this barrier in two ways.

Firstly, Chat GPT can act as a helpful "Study Buddy" to assist in learning Python. It can provide guidance, offer code examples, and explain programming concepts using accessible language. By interacting with Chat GPT, users can gain practical experience and gradually improve their Python skills.

Secondly, Chat GPT can enhance data analysis by simplifying complex tasks and making the data preparation process more efficient. It can generate code snippets to automate repetitive or time-consuming operations, perform intricate analysis, and provide insights on different approaches to data manipulation. This streamlines the workflow and allows analysts to focus on higher-level analysis rather than getting caught up in manual data processing.

However, it is important to stress that while Chat GPT can expedite certain aspects of data analysis, it is still crucial to have a solid understanding of the underlying processes and techniques. Chat GPT should be seen as a valuable tool that complements human expertise and judgement, rather than a substitute for domain knowledge. Maintaining a comprehensive understanding of the data analysis process is essential to ensure accurate results and make informed decisions based on the outputs generated by Chat GPT.

Despite its limitations in data manipulation, Chat GPT can be an invaluable tool for data analysts when it comes to writing code. Here are some reasons why it is an excellent choice:

  1. Multilingual capabilities: Chat GPT can understand and generate code in a variety of programming languages. It excels in languages like SQL and Python, enabling data analysts to work with the languages they are most comfortable with.
  2. Accelerated data cleaning: By utilising Chat GPT, data analysts can expedite the data cleaning process. They can ask the model to generate code snippets that perform common cleaning tasks, such as removing punctuation or converting text to lowercase.
  3. Automation through web scraping: Chat GPT can assist in automating web scraping tasks. By providing the necessary guidance, it can generate Python scripts that scrape data from websites, saving significant time and effort.
  4. Learning coding languages: Interacting with Chat GPT allows data analysts to learn coding languages in the process. As the model generates code, it provides explanations and insights, facilitating the analyst's learning journey.

Example: Using Chat GPT to Write Code:

To illustrate the practical application of Chat GPT for code writing, let's consider an example project. In this project, I set out to analyse sentiment in a dataset containing sentences from the book "The Adventures of Sherlock Holmes". Here are the steps I took:

  • Installation and tutorials: I first asked Chat GPT for recommendations on installing Python and on tutorials to follow. It also provided guidance on best practices and on which Integrated Development Environment (IDE) to use. After downloading Anaconda, I used Spyder.
  • Data preprocessing: With the guidance of Chat GPT, I generated a Python script to remove punctuation and convert all words to lowercase, preparing the dataset for sentiment analysis.
  • Dataset manipulation: Next, I asked Chat GPT to write a script to remove the first 45 lines from the book, which were irrelevant to the task.
  • Research and implementation: Chat GPT provided information on various sentiment analysis packages and their best use cases. I then had Chat GPT write a script that implemented the recommended packages.
  • Exporting data: Finally, I asked Chat GPT to generate a script to export the analysed data as a CSV file, which it did successfully.

Because of Chat GPT, I was able to effectively conduct sentiment analysis on "The Adventures of Sherlock Holmes", subsequently visualising my findings using Tableau. This is just a small example, but it hopefully gives a sense of the kinds of projects Chat GPT can assist with.
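The steps above can be sketched roughly as follows. This is not the script Chat GPT produced for the project, only an illustration of its shape. In particular, the post does not name the sentiment package that was used, so the tiny word lists below are a hypothetical stand-in for one, and the sample lines are invented.

```python
import csv
import io
import string

# Hypothetical stand-in for a real sentiment package: two tiny word lists.
POSITIVE_WORDS = {"good", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "dreadful", "terrible"}


def skip_header(lines: list[str], n_header_lines: int = 45) -> list[str]:
    """Drop the first n_header_lines lines (the book's front matter)."""
    return lines[n_header_lines:]


def clean(line: str) -> str:
    """Lowercase a line and strip ASCII punctuation (curly quotes are kept)."""
    return line.lower().translate(str.maketrans("", "", string.punctuation))


def toy_sentiment(line: str) -> int:
    """Count positive words minus negative words in an already cleaned line."""
    words = line.split()
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def analyse(lines: list[str], n_header_lines: int = 45) -> str:
    """Return CSV text with one row per non-empty cleaned line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["line", "sentiment"])
    for raw in skip_header(lines, n_header_lines):
        cleaned = clean(raw).strip()
        if cleaned:
            writer.writerow([cleaned, toy_sentiment(cleaned)])
    return buffer.getvalue()


# Invented sample: one header line followed by three lines of text.
sample = ["Title page", "It was a good, happy day!", "", "What a dreadful business."]
csv_text = analyse(sample, n_header_lines=1)
print(csv_text)
```

Writing the CSV to a string keeps the sketch self-contained; in a real project the same writer would target a file opened with `open(path, "w", newline="")`, which a tool such as Tableau can then read.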

Key Takeaways when Using Chat GPT as a Study Buddy for Writing Code:

Based on the experience of leveraging Chat GPT for code writing, here are some essential takeaways to consider:

Be specific when asking for a task, providing a sample dataset and relevant information.

 For example instead of asking this:

'Can you write a script to analyse a dataset on booklines?'

 Ask this:

'Here is a sample of my dataset on booklines. Here are the first few lines of my dataset including column names. Can you remove unnecessary punctuation from this dataset?'

Chat GPT tends to write out your dataset as the first step in your Python script. If you have already imported the dataset and assigned it to a variable, you can tell Chat GPT to skip this step and to provide code on the assumption that the dataset is available as [variable name].

If you are struggling to be specific when asking your question, you can even ask Chat GPT for guidance on what you should provide.

Break down queries into individual steps, allowing for clear and focused interactions.

This is particularly useful when trying to learn the language. You are given a well-documented script plus an explanation of what it is doing. If you are unsure what certain parts mean, you can also ask follow-up questions.

In one example, Chat GPT showed me how to remove a column. However, it changed my requested column name 'field_1' to 'Field 1', which would have caused an error in the code if left uncorrected. This shows that you need to remain vigilant at each step, which is why writing a script step by step is the best way to ease troubleshooting.
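One way to catch a renamed column early is to check the name before removing it. The sketch below is a hypothetical helper rather than the code Chat GPT produced: it assumes the dataset is held as a list of dictionaries (one per row) and fails with a message listing the available columns when the requested name does not match.

```python
def drop_column(rows: list[dict], column: str) -> list[dict]:
    """Return copies of the rows without `column`; fail loudly if it is absent."""
    if not rows:
        return []
    available = list(rows[0].keys())
    if column not in available:
        raise KeyError(f"Column {column!r} not found; available columns: {available}")
    return [{key: value for key, value in row.items() if key != column} for row in rows]


# Invented sample rows using the column name from the example above.
rows = [{"field_1": "a", "text": "hello"}, {"field_1": "b", "text": "world"}]

trimmed = drop_column(rows, "field_1")
print(trimmed)

# The altered name 'Field 1' is rejected instead of being silently ignored.
try:
    drop_column(rows, "Field 1")
except KeyError as error:
    print(error)
```

Running each generated step on a small sample like this makes a mismatch such as 'Field 1' surface immediately, before it is buried in a longer script.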

Leverage the model's context window to remember previous details in a thread.

You don't need to repeat details in every query, but avoid relying on extensive context for this use case, as long threads can sometimes confuse Chat GPT. A key difference between GPT-3.5 and GPT-4 is that GPT-4 has a significantly larger context window, so it tends to be better at remembering earlier details in a thread.

Stay organised in your queries and troubleshooting process, starting new chats if necessary to maintain clarity.

It is easy to get tangled up in a long thread of back-and-forth queries, particularly if Chat GPT isn't providing the correct answer. In these cases, it is sometimes best to start a new chat to troubleshoot a specific issue.

Conclusion:

While Chat GPT has limitations when it comes to manipulating data, it can be a valuable asset for data analysts as a code-writing tool. By leveraging its multilingual capabilities, data analysts can automate tasks such as web scraping, accelerate data cleaning, and learn coding languages along the way. However, it is crucial to be aware of its limitations and to use it in a step-wise process, making the most of its ability to generate code snippets and provide insights. As language models continue to advance, the potential for using Chat GPT as a study buddy for data analysts will only grow.