We recently started learning how to use APIs, which is something I’d been looking forward to since joining The Data School. One of my first ideas for a dashboard was one that mapped London music venues based on the types of gigs they put on. I quickly realised I would need an API to implement it. The problem was that I had never properly sat down to learn how APIs actually work. I’d fall into rabbit holes and get completely lost in documentation. So this topic couldn’t have come at a better time.
What I found pretty quickly is that APIs are a lot more approachable than I expected. At the same time, I was introduced to web scraping, which is another way of collecting data from the web that comes with a different set of trade-offs.
In this blog, I’ll walk through what APIs and web scraping are, how they compare, and when you might want to use one over the other.
What's an API?
An API (Application Programming Interface) is a way of communicating with a piece of software or an external service to retrieve data in a structured format, usually as JSON.
Rather than pulling data from what you see on a webpage, you’re requesting it directly from the source. You typically send a request to a specific endpoint, and the API returns organised data that’s ready to use.
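To make the request-and-response pattern concrete, here is a minimal sketch in Python. The endpoint URL and the fields in the response are made up for illustration (there's no real service behind them), and the response is hard-coded so the snippet runs without network access:

```python
import json

# Hypothetical JSON returned by a music-events API endpoint such as
# GET https://api.example.com/venues/123/events
# (URL and field names are invented for this example).
sample_response = """
{
  "venue": "The Hypothetical Arms",
  "events": [
    {"date": "2025-06-01", "genre": "jazz"},
    {"date": "2025-06-08", "genre": "indie"}
  ]
}
"""

# Because the API returns structured data, turning it into
# Python objects is a single call -- no HTML parsing needed.
data = json.loads(sample_response)

# The data is immediately ready to use.
genres = [event["genre"] for event in data["events"]]
print(genres)  # ['jazz', 'indie']
```

With a real API you'd fetch the JSON yourself, e.g. `requests.get(url).json()` using the popular `requests` library, but the shape of the work afterwards is exactly this: structured data in, usable objects out.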
When would you use one?
- You need live or regularly updated data
- There’s an official API available for the service you’re interested in
- You want a reliable and repeatable data pipeline
- You’re building something longer-term, like a dashboard or app
What are the Downsides?
- If the API doesn’t expose a field, you can’t access it
- Many APIs require API keys or tokens
- There’s often a cap on how many requests you can make
- Sometimes APIs charge for access
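On the keys-and-tokens point: most keyed APIs expect the credential on every request, commonly in a header. The exact header name varies by service; a Bearer token is one widespread convention. A hedged sketch (the URL and key are placeholders):

```python
# Placeholder key -- in real code, load this from an environment
# variable or secrets manager, never hard-code it.
api_key = "YOUR_API_KEY"

# A common pattern: pass the token in the Authorization header.
headers = {"Authorization": f"Bearer {api_key}"}

# With the requests library, you'd attach it like this:
#   requests.get("https://api.example.com/events", headers=headers)
print(headers)
```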
What's Web Scraping?
Web scraping is the process of extracting data directly from a website’s HTML. Instead of using a structured endpoint, you’re just “reading” the webpage and pulling out the elements you need. This usually involves inspecting the page, identifying patterns in the HTML, and then writing code to extract specific pieces of information.
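The inspect-then-extract workflow can be sketched with Python’s built-in `html.parser` module. The HTML snippet and the `class="gig"` pattern are invented for the example; in practice you’d fetch the real page first and find the patterns in your browser’s dev tools (and many people reach for a library like BeautifulSoup rather than the stdlib parser):

```python
from html.parser import HTMLParser

# A hypothetical fragment of a venue's listings page.
page = """
<ul>
  <li class="gig">Jazz Night - 1 June</li>
  <li class="gig">Indie Showcase - 8 June</li>
  <li class="ad">Buy tickets here!</li>
</ul>
"""

class GigParser(HTMLParser):
    """Collects the text of every <li class="gig"> element."""
    def __init__(self):
        super().__init__()
        self.in_gig = False
        self.gigs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "li" and ("class", "gig") in attrs:
            self.in_gig = True

    def handle_data(self, data):
        if self.in_gig:
            self.gigs.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_gig = False

parser = GigParser()
parser.feed(page)
print(parser.gigs)  # ['Jazz Night - 1 June', 'Indie Showcase - 8 June']
```

Notice how much of the code is about navigating the page’s structure rather than the data itself. That structure is exactly what breaks when the site changes, which is the fragility described below.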
When would you use it?
- There’s no API available
- The API exists but doesn’t include the data you need
- You’re doing exploratory or one-off analysis
- The data is visible on a webpage but not easily downloadable
What are the Downsides?
- If the website’s structure changes, your scraper can break
- You may need to regularly update your code
- Not all websites allow scraping (there are ethical and legal considerations)
For me, the main takeaway has been that APIs aren’t as intimidating as they first seem. Once you understand the basic request and response pattern, they become very powerful when working with live data. Web scraping is still valuable, especially when APIs aren’t available, but it’s usually not the first choice for anything you plan to maintain over time.
Learning both has made it much easier to approach data collection problems with a bit more confidence and a clearer idea of which tool to go for.
