A Beginner’s Guide to APIs
03 Aug, 2016
3 minutes
Data is everywhere. Every business in the world sits on data, makes decisions on data and ultimately stays in business by using data to generate knowledge.
The problem we face today is just how much data is available, and how we access and make sense of it. Say I gave you a massive data dump of every product a supermarket has ever stocked, with price data, categories and everything else that goes with it. Then say I wanted you to find the price of a loaf of home brand white bread from that huge data set. You’d either go mad trying or wander off to play Pokémon GO instead.
We need to learn, in such circumstances, to query data for just what we need. In the above example, you have already defined the parameters of what you want to search for. Those parameters make the rest of the data irrelevant in this circumstance.
Introducing APIs
An API is a term often bandied about by developers (“we’ll do that through an API”). It stands for Application Programming Interface. Put simply, an API is a set of pre-defined and programmed functions that, in this case, will return a set of data from a set of parameters.
The language that an API is coded in is also irrelevant right now. The API is an interface that a developer can call to get the data they need, provided they have certain criteria to search by.
So, let’s return to the Supermarket Problem and solve it with an API. We’re assuming of course that the supermarket, let’s call it SainsCoSons, is online and has a well-maintained database of their products, like you would see on any other supermarket website.
The database has a list of products, each assigned to a category and subcategory. Each product also has a title, a description and an additional description of nutritional information. Additionally, another database table holds a list of all stores in the company, their opening times, addresses and geographical location.
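To make the rest of the walkthrough concrete, here is a minimal Python sketch of how those two tables might look as data structures. The field names, the sample values and the `on_promotion` flag are all assumptions made for illustration; the real SainsCoSons schema (which is fictional anyway) could look quite different.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """One row of the (hypothetical) products table."""
    product_id: int            # unique ID, never reused
    title: str
    description: str
    nutrition: str             # additional nutritional description
    category: str              # e.g. "Groceries"
    subcategory: str           # e.g. "Fruit"
    on_promotion: bool = False  # assumption: used later to order results


@dataclass
class Store:
    """One row of the (hypothetical) stores table."""
    store_id: int
    name: str
    address: str
    opening_times: str
    latitude: float
    longitude: float


# Made-up sample rows, purely for illustration.
bread = Product(1, "Home Brand White Bread", "Medium sliced white loaf",
                "See pack for details", "Groceries", "Bakery")
camden = Store(1, "SainsCoSons Camden", "1 Example Street, London",
               "07:00-22:00", 51.5390, -0.1426)
```

Every API function discussed below is really just a different way of picking rows out of structures like these.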
An API has been programmed for the website’s developers to allow them to search for just the data they need. The structure of their site allows them to do the following:
- Search by product name
- View a product category (such as Groceries)
- View a product subcategory (such as Fruit)
- View a product information page from any category, subcategory or search results pages
- Search for a supermarket geographically
- View store details
But what functions will I need in my API?
It depends on the functionality of the site. Think about a supermarket’s website. They have a huge menu, categorising items into groceries, toiletries, cleaning, frozen items, fridge items and so on. Then inside a category like Groceries, you might have subcategories of Fruit, Vegetables, Meat, Dairy Items and so on. The list goes on.
To show relevant products on the category pages, you would need to pull back just the products that belong to a certain category and subcategory.
Also, we could add in a function in the API to return all products that match a given keyword. So let’s create a list of methods that we would need to call to get the correct set of data back for the page. Let’s start with the keyword search.
Searching by a Keyword
You’ve done it thousands of times before on Amazon, Sainsbury’s or YouTube. It’s perhaps the quickest way of whittling down a list of anything to just the bits you’re interested in. So, let’s add the first method to our API.
In the code, when a search happens, the site calls the API’s keyword search function. Notice that we have to pass some data to the API to tell it what to look for. We pass the keyword that the user searched for to the API, so that it returns only items that match. Of course, the fields in the data that keywords are matched against are decided beforehand. You may want to only search for products that have that keyword in the product name, or you could search for it in the product name, description and nutrition information.
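As a rough sketch, a keyword search method could look something like the Python below. The function name `search_products`, the `fields` parameter and the sample products are assumptions for illustration, and a real API would query a database rather than an in-memory list.

```python
# Made-up sample data standing in for the products table.
PRODUCTS = [
    {"id": 1, "title": "Home Brand White Bread",
     "description": "Medium sliced white loaf"},
    {"id": 2, "title": "Wholemeal Rolls",
     "description": "Soft bread rolls, pack of six"},
    {"id": 3, "title": "Semi-Skimmed Milk",
     "description": "Two pint bottle"},
]


def search_products(keyword, fields=("title",)):
    """Return products where any of the chosen fields contains the keyword.

    Matching is case-insensitive. `fields` decides which columns are
    searched, which is the "decided beforehand" part of the design.
    """
    needle = keyword.lower()
    return [
        product for product in PRODUCTS
        if any(needle in product[field].lower() for field in fields)
    ]


title_only = search_products("bread")
title_and_description = search_products("bread", fields=("title", "description"))
```

Searching the title alone finds only the white bread, while widening the search to the description also picks up the rolls. That is the trade-off described above: more fields means more matches, but not always more relevant ones.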
Building a Category Landing Page
Landing pages are always important. For a supermarket, these pages are an opportunity to show promotions, new products, ‘2 for…’ offers and so on. Again, a quick API method can be used to retrieve all products in a category.
This time, we have to give the API call two parameters. Firstly, we pass the category for which we want all products returned. Secondly, a Boolean value is passed, indicating whether the list of results should be returned with promotional items first or not. Passing a value of ‘true’ would indicate that you want to see promotional items first.
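Here is one way such a method might be sketched in Python. The name `get_products_by_category`, the `on_promotion` field and the sample rows are assumptions, not part of any real supermarket API.

```python
# Made-up sample data; "on_promotion" is an assumed field.
PRODUCTS = [
    {"id": 1, "title": "White Bread", "category": "Groceries", "on_promotion": False},
    {"id": 2, "title": "Bananas", "category": "Groceries", "on_promotion": True},
    {"id": 3, "title": "Washing-Up Liquid", "category": "Cleaning", "on_promotion": False},
    {"id": 4, "title": "Cheddar", "category": "Groceries", "on_promotion": True},
]


def get_products_by_category(category, promotions_first):
    """Return every product in a category, optionally with promotions on top."""
    matches = [p for p in PRODUCTS if p["category"] == category]
    if promotions_first:
        # False sorts before True, so "not on_promotion" puts promoted
        # items first. sorted() is stable, so the original order is kept
        # within the promoted and non-promoted groups.
        matches = sorted(matches, key=lambda p: not p["on_promotion"])
    return matches


landing_page = get_products_by_category("Groceries", True)
plain_listing = get_products_by_category("Groceries", False)
```

With `True`, the bananas and cheddar move ahead of the bread; with `False`, the products come back in their stored order.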
One step further: Categories and subcategories
On a subcategory landing page, you would want an even more refined set of results. A similar method could be written to do this.
We still pass the parameters that were passed for the category page, but this time we add a ‘subcategory’ parameter, which would be used to search the subcategory field in the database. Again, a ‘true’ or ‘false’ value would be used to determine whether you want to see promotional products first.
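A hedged Python sketch of that subcategory method might look like this. As before, the function name, field names and sample data are invented for illustration.

```python
# Made-up sample data; field names are assumptions.
PRODUCTS = [
    {"id": 1, "title": "Apples", "category": "Groceries",
     "subcategory": "Fruit", "on_promotion": False},
    {"id": 2, "title": "Bananas", "category": "Groceries",
     "subcategory": "Fruit", "on_promotion": True},
    {"id": 3, "title": "Cheddar", "category": "Groceries",
     "subcategory": "Dairy Items", "on_promotion": True},
]


def get_products_by_subcategory(category, subcategory, promotions_first):
    """Return products matching both category and subcategory."""
    matches = [
        p for p in PRODUCTS
        if p["category"] == category and p["subcategory"] == subcategory
    ]
    if promotions_first:
        # Promoted items first; the stable sort keeps the rest in order.
        matches = sorted(matches, key=lambda p: not p["on_promotion"])
    return matches


fruit_page = get_products_by_subcategory("Groceries", "Fruit", True)
```

The extra parameter simply narrows the filter: the cheddar is in Groceries but not in Fruit, so it drops out of the results.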
But I still want to have a product page to display each product’s information…
You can still use your API or web service, but in a slightly different way. The parameters passed to the API to get data aren’t submitted by the user anymore. Each product in the database will have a Product ID and that ID number is tied to this product and only this product—it is never re-used and never appears twice.
Product information pages are nearly always dynamically generated. Looking at the URL of one of these pages will usually reveal either a hashed identification number or a unique identification string of text, which can be used to retrieve the product’s information. This identification number is what is passed to the API to get a product’s information, because it is the only way of absolutely, positively distinguishing this product from any other product.
The ‘id’ here is simply a number and, if the database is set up correctly and is utilising a primary key, you should be able to guarantee only ever one result per ‘id’ number. From here, you can show the product title, description and nutrition information to the user as a dynamic product detail page.
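In Python, a lookup by ID could be sketched as below. A dictionary keyed by product ID stands in for a table with a primary key, since a dictionary key can only appear once. The name `get_product`, the ID values and the sample rows are all assumptions.

```python
# A dict keyed by product ID behaves like a primary key: each ID maps
# to exactly one product. The data is made up for illustration.
PRODUCTS_BY_ID = {
    1001: {"title": "Home Brand White Bread",
           "description": "Medium sliced white loaf",
           "nutrition": "See pack for details"},
    1002: {"title": "Bananas",
           "description": "Loose bananas, sold by weight",
           "nutrition": "See pack for details"},
}


def get_product(product_id):
    """Return the one product with this ID, or None if it does not exist."""
    return PRODUCTS_BY_ID.get(product_id)


product = get_product(1001)
missing = get_product(9999)
```

The product page then only has to read the ID from its URL, call the function and render whatever comes back, or show a "not found" page when the result is `None`.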
How will users find my stores without a store locator?
You can build a function into the API again that finds stores that match a user’s search criteria.
A user could search for ‘London’ or ‘Manchester’ for a store name and choose 5, 10, 20 or more stores. Such a function on the API could find geographically where ‘London’ is and then find the 5, 10 or 20 closest stores from that point. For such a market, it’s better to allow the user to enter a text location, rather than, say, geographical co-ordinates, and then have your API do the work.
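Below is a minimal Python sketch of such a store locator. It assumes a tiny hard-coded table of place names in place of a real geocoding service, uses invented store coordinates, and measures distance with the haversine (great-circle) formula. The names `find_stores`, `PLACES` and `STORES` are hypothetical.

```python
import math

# Assumption: a real API would call a geocoding service here instead.
PLACES = {
    "london": (51.5074, -0.1278),
    "manchester": (53.4808, -2.2426),
}

# Made-up stores with approximate coordinates.
STORES = [
    {"name": "SainsCoSons Camden", "lat": 51.5390, "lon": -0.1426},
    {"name": "SainsCoSons Croydon", "lat": 51.3762, "lon": -0.0982},
    {"name": "SainsCoSons Salford", "lat": 53.4875, "lon": -2.2901},
    {"name": "SainsCoSons Birmingham", "lat": 52.4862, "lon": -1.8904},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def find_stores(location, limit=5):
    """Return up to `limit` stores, closest first, for a text location."""
    origin = PLACES.get(location.strip().lower())
    if origin is None:
        return []  # unknown place name
    by_distance = sorted(
        STORES,
        key=lambda s: haversine_km(origin[0], origin[1], s["lat"], s["lon"]),
    )
    return by_distance[:limit]


nearest_two = find_stores("London", limit=2)
```

The user only ever types "London"; turning that into co-ordinates and ranking the stores is the work the API does on their behalf.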
Taking it a lot further…
Ok, so you’ve got the general idea. An API is a set of pre-defined functions to quickly get a set of data for parameters that you specify. It allows you to just get a list of products that contain a certain word in the title or to only find chocolates and sweets, but how complex can API calls get? Well, taking the store locator example, you could create an advanced store locator search.
The number of possible parameters is endless, but such a function would be a great example of the post-processing that goes on behind an advanced search on a site. However, the more parameters you feed your API, the smaller the returned dataset is likely to be, so it is always worth weighing up whether you need to query all of the parameters in one call.
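To illustrate, here is a rough Python sketch of an advanced store search with optional filters. The facility flags (`has_pharmacy`, `has_parking`, `open_24_hours`), the function name and the sample stores are all invented for this example, and a simple city match stands in for the distance search to keep it short.

```python
# Made-up stores; the facility flags are assumed fields.
STORES = [
    {"name": "Camden", "city": "London",
     "has_pharmacy": True, "has_parking": False, "open_24_hours": True},
    {"name": "Croydon", "city": "London",
     "has_pharmacy": True, "has_parking": True, "open_24_hours": False},
    {"name": "Salford", "city": "Manchester",
     "has_pharmacy": False, "has_parking": True, "open_24_hours": False},
]


def find_stores_advanced(city, has_pharmacy=None, has_parking=None,
                         open_24_hours=None, limit=10):
    """Search stores in a city; a filter left as None is ignored."""
    filters = {
        "has_pharmacy": has_pharmacy,
        "has_parking": has_parking,
        "open_24_hours": open_24_hours,
    }
    results = [s for s in STORES if s["city"].lower() == city.lower()]
    for field, wanted in filters.items():
        if wanted is not None:
            # Each extra parameter can only shrink the result set.
            results = [s for s in results if s[field] == wanted]
    return results[:limit]


all_london = find_stores_advanced("London")
with_parking = find_stores_advanced("London", has_pharmacy=True, has_parking=True)
too_picky = find_stores_advanced("London", has_parking=True, open_24_hours=True)
```

With no filters, both London stores come back; asking for a pharmacy and parking leaves one; asking for parking and 24-hour opening leaves none at all, which is exactly why it pays to think about how many parameters a single call really needs.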
What have we learned?
First of all, if you have a large dataset that you need to ‘surf’, an API or Web Service may be for you.
Secondly, an API or Web Service is a list of common functions for retrieving sets of data.
Finally, parameters (in any number) can be passed to an API function call to filter the data returned and remove irrelevant data.
What next?
There are thousands of APIs out there for getting data. They range from the very obvious, such as news, weather, traffic and stock markets, through to the extremely obscure, such as Ragefac.es, an API for generating the perfect meme. There’s even an API for getting you out of speaking to that person you didn’t really want to bump into in the waiting room: yes, an API for fake calling your phone.
Next time, we will take a look at some of the cooler APIs, and maybe some of the weirder ones too.