Using Python to Scrape the Meet-Up API

#python #sql

We recently posted some ideas for projects you could take on, to add to your resume and help you learn more about programming.
One of those projects involved scraping the Meet-up and Eventbrite APIs to create an aggregate site of events.
This is a great project, and it gives you the chance to work with several concepts. You could use it to build an alerting system in which users supply their own API keys to track local events they care about. You could also develop a site that predicts which live acts will become popular before they do, by tracking metrics over time.

Honestly, the APIs give you a decent amount of data, down to member names (and supposedly emails too, if the member is authenticated). It’s a lot of fun, and you can use this data as the basis of your own site!

To Start:

To start this project, break down the basic pieces you will need to build the backend. More than likely you will need:

An API “Scraper”
Database interface
Operational database
Data-warehouse (optional)
ORM

To start out you will need to develop a scraper class. This class should be agnostic of the specific API call you’re making. That way, you can avoid having to make a specific class or script for each call. In addition, when the API changes, you won’t have to spend as much time going through every script to update every variable.

Instead, you’ll only need to go through and update the configurations.
That being said, we don’t recommend trying to develop a perfectly abstract class right away. Building a class with no hard-coded variables from the very beginning is difficult, and if anything goes wrong it is harder to debug because of the layers of abstraction.

We’ll start by trying to develop pieces that work.

The first decision you need to make is where the scraper will be putting the data. We’re creating a folder structure in which each day has its own folder.
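As a minimal sketch of the one-folder-per-day layout, assuming a local `data` root and folders named `YYYY-MM-DD`, something like the following could work. The helper name `daily_folder` is hypothetical and is not part of the script in this post.

```python
import datetime
import os


def daily_folder(root="data", day=None):
    """Return (and create if needed) the folder for one day's raw files."""
    day = day or datetime.date.today()
    path = os.path.join(root, day.isoformat())  # e.g. data/2019-03-01
    os.makedirs(path, exist_ok=True)  # safe to call again on the same day
    return path
```

Each run of the scraper could then write its files under `daily_folder()`, so a re-run on a later day never overwrites earlier raw data.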

You could use a general folder on a server, S3, or a similar raw file store. Any of these makes it easy to keep the raw data, which we save as JSON files. Delimited formats like CSV and TSV are thrown off by the way the description data is formatted.
Let’s take a look at the basic script. As you read it, think about how you could refactor it and move more of it into configuration.

import json
import os
import time

import requests


class MeetUpScraper:
    """Pages through one Meet-up API call and dumps each page to a JSON file."""

    config_file = "meet_up_config.json"

    def __init__(self, api_call_type):
        self.api_call_type = api_call_type
        # Load the per-call settings (endpoint, columns, ...) from the config file.
        with open(self.config_file) as config:
            config_data = json.load(config)
        self.main(config_data)

    def get_results(self, params, config_data):
        # Look up the endpoint for this call type in the config.
        endpoint = config_data[self.api_call_type]["api_endpoint"]
        response = requests.get(endpoint, params=params)
        return response.json()

    def main(self, config_data):
        cities = [("Seattle", "WA")]
        api_key = "APIKEY"  # hard-coded for testing only
        per_page = 200
        os.makedirs("data", exist_ok=True)  # make sure the output folder exists

        for city, state in cities:
            offset = 0
            results_we_got = per_page
            # A page with fewer than per_page results is the last page.
            while results_we_got == per_page:
                params = {
                    "sign": "true",
                    "country": "US",
                    "city": city,
                    "state": state,
                    "radius": 10,
                    "key": api_key,
                    "page": per_page,
                    "offset": offset,
                }
                response = self.get_results(params, config_data)
                time.sleep(1)  # pause between requests
                offset += 1
                results_we_got = response["meta"]["count"]
                file_name = "data/data_" + self.api_call_type + "_" + str(offset) + ".txt"
                with open(file_name, "w") as export_file:
                    json.dump(response["results"], export_file)


MeetUpScraper("get_event")  # for testing

One place to start right off the bat is the API key. While you’re testing, it’s easy to hard-code your own key. But if your eventual goal is to let multiple users access this data, you will want each user to supply their own API key.
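One hedged way to get the key out of the source code is to read it from an environment variable. This is only a sketch: the variable name `MEETUP_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something the API or the script above defines.

```python
import os


def load_api_key(env_var="MEETUP_API_KEY", environ=None):
    """Read an API key from the environment instead of the source code."""
    # `environ` can be swapped for a plain dict, e.g. one user's stored settings.
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

For a multi-user site, the same function could be given a per-user mapping of settings in place of the process environment.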

The next portion you will want to update is the hard-coded references to the data you are pulling. This hard-coding limits the code to working with only one API call. For example, the endpoint and the list of fields you want to keep from each response can both live in configuration.
For this example, we are just dumping everything to JSON. Perhaps you want to be choosier. In that case, you might configure which columns each field maps to.

For example:

{
  "get_group": {
    "api_endpoint": "http://api.meetup.com/2/groups"
  },
  "get_event": {
    "row_list": ["country", "city", "created", "rating", "description", "rating", "join_mode", "members", "lon", "lat", "id", "state", "urlname"],
    "insert_script": "INSERT into raw_meet_up_3 (country, city, created, rating, description, rating_2,join_mode, members, lon, lat, id, state,urlname) VALUES( %s )",
    "api_endpoint": "http://api.meetup.com/2/open_events"
  }
}

This allows you to create a scraper that is agnostic of which API call you are making. It puts the settings outside the code, which makes them easier to maintain.
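With the columns listed in the config, picking fields out of a result can be a one-liner. The sketch below assumes each result is a flat dictionary. The helper name `pick_fields` and the sample record are made up for illustration.

```python
def pick_fields(record, row_list):
    """Keep only the configured columns, in order; missing keys become None."""
    return [record.get(column) for column in row_list]


row_list = ["country", "city", "id", "urlname"]
record = {"id": 42, "city": "Seattle", "country": "us", "extra": "ignored"}
print(pick_fields(record, row_list))  # ['us', 'Seattle', 42, None]
```

Returning a list rather than a dictionary keeps the values lined up with the column order, even when a column name appears more than once in `row_list`.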

For example, what happens if Meet-up changes the API endpoints or column names? Well, instead of having to go into 10 different code files you can just change the config file.
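The same idea applies to the `insert_script` entry in the config, which ends in a single `%s`. One plausible reading, and it is only an assumption, is that the `%s` gets expanded into one placeholder per column before the statement is handed to a database driver. The helper `build_insert` and the table `demo_table` are hypothetical.

```python
def build_insert(insert_script, row_list, placeholder="%s"):
    """Expand the single %s in the template into one placeholder per column."""
    placeholders = ", ".join([placeholder] * len(row_list))
    return insert_script % placeholders


template = "INSERT into demo_table (country, city, id) VALUES( %s )"
print(build_insert(template, ["country", "city", "id"]))
# INSERT into demo_table (country, city, id) VALUES( %s, %s, %s )
```

Drivers differ in the placeholder they expect (some use `?` instead of `%s`), which is why it is a parameter here.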

The next stage is creating a database and an ETL process to load and store all the data: a system that automatically parses the JSON files into an operational-style database. This database can be used to help track events that you might be interested in. In addition, creating a data warehouse could help track metrics.
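To make the load step concrete, here is a rough sketch that parses one raw JSON page and inserts a few columns into a table. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for whatever operational database you choose. The table name `raw_events`, the helper `load_events`, and the sample page are all assumptions.

```python
import json
import sqlite3


def load_events(conn, json_text, columns):
    """Parse one raw JSON page and insert the chosen columns into SQLite."""
    # Column names come from our own trusted config, so they are interpolated;
    # the values themselves always go through ? placeholders.
    conn.execute(f"CREATE TABLE IF NOT EXISTS raw_events ({', '.join(columns)})")
    placeholders = ", ".join("?" * len(columns))
    rows = [[event.get(c) for c in columns] for event in json.loads(json_text)]
    conn.executemany(f"INSERT INTO raw_events VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a real operational database
page = '[{"id": "1", "city": "Seattle"}, {"id": "2", "city": "Tacoma"}]'
load_events(conn, page, ["id", "city"])
print(conn.execute("SELECT city FROM raw_events ORDER BY id").fetchall())
# [('Seattle',), ('Tacoma',)]
```

In a real pipeline the `page` string would come from one of the files the scraper wrote, and the column list from the config.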

Perhaps you’re interested in the rate at which events have people RSVP, or how quickly events get sold out.

Based on that, you could analyze which types of descriptions or groups quickly run out of slots.
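As a small illustration of that kind of metric, the sketch below computes how full each event is. The field names `yes_rsvp_count` and `rsvp_limit` are assumptions about what an event record contains, and the sample events are invented, so check the fields against the data you actually collect.

```python
def fill_rate(event):
    """Share of available slots already taken, or None if there is no limit."""
    limit = event.get("rsvp_limit")
    if not limit:
        return None
    return event.get("yes_rsvp_count", 0) / limit


# Made-up sample records, for illustration only.
events = [
    {"name": "Python night", "yes_rsvp_count": 48, "rsvp_limit": 50},
    {"name": "Open hack", "yes_rsvp_count": 12},
]
nearly_full = [e["name"] for e in events if (fill_rate(e) or 0) >= 0.9]
print(nearly_full)  # ['Python night']
```

Tracking this number for the same event across daily snapshots would show how quickly it fills up.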

Personally, we think there is a lot of fun analysis you could take on.
Over the next few weeks and months, we’ll be working to continue developing this project. This includes building a database, maybe doing some analysis, and more!

We hope you enjoyed this piece!
