
Chris Power

Posted on • Originally published at browntreelabs.com

Scraping Reddit's API in NodeJS with Snoowrap

I'm still working on my side project, where I'm gathering information from around the web. Eventually I'm going to use this information in a weekly aggregate newsletter for Real Estate Investing and Property Management. If you're curious, The Newsletter is Here. For this part of the project, I'm going to scrape some of Reddit's API to find interesting Real Estate and Landlord posts.

The Tooling

There is only one package you need to scrape the Reddit API in NodeJS: snoowrap.

Snoowrap is a "fully featured javascript wrapper for the Reddit API" (quote taken from the GitHub repo's index page). Snoowrap is really great, and it allows you to query posts, comments, scores, and more.

All of the responses are wrapped in their own little objects as well, and it's all fairly well documented. Also, if you're using an IDE like WebStorm, you can easily auto-complete the functions and classes thanks to the really great type definitions in the project.

Installing snoowrap

Install Snoowrap just like any other npm package in NodeJS:

npm install snoowrap --save

and require it:

var snoowrap = require('snoowrap');

Setting up Snoowrap

Before making any calls to the Reddit API, you have to go through an initial OAuth2 setup to create an app and generate tokens. This is fairly straightforward, but requires a few steps.

  • Go to https://not-an-aardvark.github.io/reddit-oauth-helper/ and note the redirect URL you must use when creating your Reddit app (the thing you use to call the API). As of this writing, the URL is: https://not-an-aardvark.github.io/reddit-oauth-helper/
  • Go to https://www.reddit.com/prefs/apps/ and create a new app. It should generally look like this:

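[Image: New Web App on Reddit. Note the redirect URI.]

With the client ID, client secret, and refresh token from the OAuth setup in hand, you can create a snoowrap client: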

  const r = new snoowrap({
    userAgent: 'A random string.',
    clientId: 'Client ID from oauth setup',
    clientSecret: 'Client Secret from oauth setup',
    refreshToken: 'Token from the oauth setup'
  });
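
If you'd rather not hard-code secrets in your source, one option is to read them from environment variables. This is just a rough sketch: the helper name buildSnoowrapConfig and the variable names (REDDIT_CLIENT_ID and friends) are my own assumptions, not part of snoowrap. Only the four option keys come from the snippet above.

```javascript
// Hypothetical helper: builds the snoowrap options object from environment
// variables. The variable names are assumptions; rename them as you like.
function buildSnoowrapConfig(env = process.env) {
  // Maps each snoowrap option to the (assumed) environment variable holding it.
  const required = {
    clientId: 'REDDIT_CLIENT_ID',
    clientSecret: 'REDDIT_CLIENT_SECRET',
    refreshToken: 'REDDIT_REFRESH_TOKEN'
  };

  const config = {
    // Placeholder default; set REDDIT_USER_AGENT to override it.
    userAgent: env.REDDIT_USER_AGENT || 'newsletter-scraper'
  };

  for (const [option, variable] of Object.entries(required)) {
    if (!env[variable]) {
      // Fail early so a missing secret is obvious before any API call.
      throw new Error(`Missing environment variable: ${variable}`);
    }
    config[option] = env[variable];
  }

  return config;
}

// Usage (needs the snoowrap package installed):
// const r = new snoowrap(buildSnoowrapConfig());
```

Failing early on a missing variable should make setup mistakes easier to spot than a rejected request later on.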

The Script for Querying the RealEstate Subreddit

Now that you're all set up with snoowrap (great job, you smart developer you), you can query Reddit's API in NodeJS with a script similar to the one below:

import snoowrap from 'snoowrap';

export async function scrapeSubreddit() {
  const r = new snoowrap({
    userAgent: 'A random string.',
    clientId: 'Client ID from oauth setup',
    clientSecret: 'Client Secret from oauth setup',
    refreshToken: 'Token from the oauth setup'
  });

  // Fetch the subreddit, then its top 3 posts from the past week.
  const subreddit = await r.getSubreddit('realEstate');
  const topPosts = await subreddit.getTop({time: 'week', limit: 3});

  // Keep only the fields the newsletter needs.
  const data = [];

  topPosts.forEach((post) => {
    data.push({
      link: post.url,
      text: post.title,
      score: post.score
    });
  });

  console.log(data);
}
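
If you want to try out the mapping step without hitting Reddit at all, one option is to pull it into a small standalone function. Again, this is only a sketch: summarizePosts and the fake posts below are made up for illustration, and the only things taken from the script above are the url, title, and score fields.

```javascript
// Hypothetical helper: turns post-like objects into the { link, text, score }
// shape used in the script above. Array.from copies the input into a plain
// array, so any array-like or iterable collection of posts should work.
function summarizePosts(posts) {
  return Array.from(posts, (post) => ({
    link: post.url,
    text: post.title,
    score: post.score
  }));
}

// Made-up sample data standing in for a real API response.
const fakePosts = [
  { url: 'https://example.com/a', title: 'First post', score: 42, author: 'someone' },
  { url: 'https://example.com/b', title: 'Second post', score: 7, author: 'someone-else' }
];

console.log(summarizePosts(fakePosts));
```

In the real script you would pass topPosts instead of fakePosts, which I'd expect to work since the original code already calls forEach on it.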

Conclusion

The ☝️ script above outputs the top 3 posts of the week from Reddit's RealEstate subreddit. Pretty neat, right? I thought this was a fun experience, and I really love how Snoowrap works. Now I can use this data to flesh out the newsletter I'm making. Again, if you're curious, you can check it out here.

Thank you, have a nice day!
