DEV Community

Leon Wei

Posted on • Updated on • Originally published at aisaastemplate.com

Reddit Social Listening with Python

Originally published at Reddit Social Listening with Python

All source code can be found here:
https://github.com/theleonwei/reddit_bot

Introduction:
Reddit is the second-most popular website in the United States, with more than 300 million unique visitors per month.

It's also one of the most trafficked sites on the internet and has become an important part of online marketing strategy for brands across industries.

This article will show you how to programmatically set up a keyword monitor service on Reddit using Python and the PRAW library.

You will also learn how to build this as a web service with Django and run it on your local machine to monitor Reddit and save the leads into a database automatically.

All of the code and step-by-step instructions are based on the assumption that you are on a Mac.

Table of contents:

  • Setting up the Django Project with the CookieCutter template
  • Register a Reddit application and install the PRAW library
  • Keyword monitoring with regular expression
  • Persisting data to a Postgres database
  • Leads report view
  • Schedule a cron job to check Reddit periodically

Setting up the Django Project with the CookieCutter template
Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.

To learn more, you can visit the project's GitHub repository:

https://github.com/cookiecutter/cookiecutter-django

Step 1: install cookiecutter.
Open up your favorite terminal app (mine is iTerm2), and install the latest cookiecutter.

pip install "cookiecutter>=1.7.0"


Step 2: start a Django project.

cookiecutter https://github.com/cookiecutter/cookiecutter-django

Follow the prompts and answer the questions to set up the project. Here are my choices:

project_name [My Awesome Project]: Reddit Bot
project_slug [reddit_bot]:
description [Behold My Awesome Project!]: My awesome Reddit Bot Project
author_name [Daniel Roy Greenfeld]: Leon W
domain_name [example.com]:
email [leon-w@example.com]:
version [0.1.0]:
Select open_source_license:
1 - MIT
2 - BSD
3 - GPLv3
4 - Apache Software License 2.0
5 - Not open source
Choose from 1, 2, 3, 4, 5 [1]: 5
timezone [UTC]: US/Pacific
windows [n]:
use_pycharm [n]: y
use_docker [n]:
Select postgresql_version:
1 - 14
2 - 13
3 - 12
4 - 11
5 - 10
Choose from 1, 2, 3, 4, 5 [1]:
Select cloud_provider:
1 - AWS
2 - GCP
3 - None
Choose from 1, 2, 3 [1]:
Select mail_service:
1 - Mailgun
2 - Amazon SES
3 - Mailjet
4 - Mandrill
5 - Postmark
6 - Sendgrid
7 - SendinBlue
8 - SparkPost
9 - Other SMTP
Choose from 1, 2, 3, 4, 5, 6, 7, 8, 9 [1]:
use_async [n]: n
use_drf [n]: n
Select frontend_pipeline:
1 - None
2 - Django Compressor
3 - Gulp
Choose from 1, 2, 3 [1]:
use_celery [n]: n
use_mailhog [n]: n
use_sentry [n]: n
use_whitenoise [n]: n
use_heroku [n]: y
Select ci_tool:
1 - None
2 - Travis
3 - Gitlab
4 - Github
Choose from 1, 2, 3, 4 [1]:
keep_local_envs_in_vcs [y]: n
debug [n]: n
 [SUCCESS]: Project initialized, keep up the good work!


Step 3: Install dependencies

cd reddit_bot; ls

These are the files and directories that we have so far:

Procfile  locale  reddit_bot  setup.cfg
README.md  manage.py  requirements  utility
config  merge_production_dotenvs_in_dotenv.py requirements.txt
docs  pytest.ini  runtime.txt

Step 3.1: Create a virtual environment.

reddit_bot ➤ python3 -m venv ./venv

Afterward, you should see a newly created venv folder.

Procfile  locale reddit_bot  setup.cfg
README.md  manage.py requirements  utility
config  merge_production_dotenvs_in_dotenv.py requirements.txt  venv
docs  pytest.ini  runtime.txt

Step 3.2: Activate the virtual environment and install all dependencies.

source venv/bin/activate

Notice there is a (venv) prompt, which means you have successfully activated the virtual environment.

Note: if you have not installed a Postgres database on your Mac, you must install it first.

Check this article on Postgres installation on Mac for more details.

Install the dependencies from the local.txt file (it differs slightly from the production requirements because it adds tools for debugging and testing).

(venv) reddit_bot ➤
(venv) reddit_bot ➤ pip install -r requirements/local.txt

Step 3.3: Create a local database reddit_bot

(venv) reddit_bot ➤ createdb reddit_bot

After that, start the Django web server for testing.

(venv) reddit_bot ➤ python manage.py runserver

You will probably see something like the following.

Watching for file changes with StatReloader
INFO 2022-09-17 11:28:04,131 autoreload 17789 4335895936 Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).

You have 28 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): account, admin, auth, contenttypes, sessions, sites, socialaccount, users.
Run 'python manage.py migrate' to apply them.
September 17, 2022 - 11:28:04
Django version 3.2.15, using settings 'config.settings.local'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
[17/Sep/2022 11:28:16] "GET / HTTP/1.1" 200 13541
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/toolbar.css HTTP/1.1" 200 11815
[17/Sep/2022 11:28:16] "GET /static/css/project.css HTTP/1.1" 200 228
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/css/print.css HTTP/1.1" 200 43
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/toolbar.js HTTP/1.1" 200 12528
[17/Sep/2022 11:28:16] "GET /static/js/project.js HTTP/1.1" 200 45
[17/Sep/2022 11:28:16] "GET /static/debug_toolbar/js/utils.js HTTP/1.1" 200 4479
[17/Sep/2022 11:28:16] "GET /static/images/favicons/favicon.ico HTTP/1.1" 200 8348

Since the database is brand new, we need to initialize it with some built-in Django tables.

(venv) reddit_bot ➤ python manage.py migrate

Once the migration is completed, restart the server.

(venv) reddit_bot ➤ python manage.py runserver

Open your favorite browser (I am using the latest Chrome), visit localhost:8000, and make sure the website loads successfully.

Congratulations on finishing setting up your local Django server; next, let's set up our Reddit account and install the Reddit API library: PRAW.

Register a Reddit application and install the PRAW library
We assume you already have a Reddit account. If not, visit https://reddit.com, create one, and then come back.

Before you can use the API, you must register an application of the appropriate type on Reddit.

To do so, visit https://www.reddit.com/prefs/apps/

Note: I sometimes run into page redirect issues when visiting the above page; if that happens to you, try visiting the following instead:

https://old.reddit.com/prefs/apps/

Scroll down and click the create another app... button

[Screenshot: registering a Reddit application]

For the name of your app, anything should be fine.

For the app type, choose script, since we will be building something that runs in the backend.

For the redirect uri, enter http://localhost:8000

Then click the create app button to finish this step.

There are two tokens that we will need for our service to run:

  1. client id: the one beneath personal use script

  2. client secret: the one to the right of secret

Finally, we need to install the PRAW library and try to connect with Reddit using the secret keys from the last step.

(venv) reddit_bot ➤ pip install praw

Next, we append PRAW to the dependencies (otherwise, the service won't work when we deploy to production):

(venv) reddit_bot ➤ pip freeze | grep praw >> requirements/base.txt

It's a common best practice not to save sensitive information such as your app secret tokens in the git repository, so let's create a new file .env at the root of your project so we can access the app secret on the local machine.

(venv) reddit_bot ➤ vim .env

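In the file, define three variables: SECRET and CLIENT_ID, set to your client secret and client id from the last step, and REDDIT_USERAGENT. These are the names the Django settings read later in this section. Since the file is loaded with source, prefix each line with export so that processes started from the shell, such as python manage.py, can see the values.

The user agent is less critical. To find out exactly what your browser's user agent is, search Google for "find my user agent" and copy the result into the .env file.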

Next, make sure you add .env into the list of files that will not be checked into the git repository by editing the .gitignore file.

(venv) reddit_bot ➤ vim .gitignore

Add .env to the file, save it, and then run the following in your terminal to load the environment variables:

(venv) reddit_bot ➤ source .env
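
Keep in mind that a process started from the shell, such as python manage.py, only sees variables that were exported. As a quick sanity check, the sketch below reports which of the three names read by the settings file in the next step (SECRET, CLIENT_ID, REDDIT_USERAGENT) are still missing. The helper name missing_env_vars is hypothetical and not part of the original project.

```python
import os

# Names taken from the settings snippet in this article.
REQUIRED_VARS = ("SECRET", "CLIENT_ID", "REDDIT_USERAGENT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Reddit credentials are set.")
```

If the script lists a name you did define in .env, the most likely cause is that the line is not exported.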

We also need to expose these environment variables as Django settings.

Open up the config/settings/local.py file (I am using Pycharm) and add the following:

# Reddit settings:
REDDIT_SECRET = env('SECRET')
REDDIT_CLIENT_ID = env('CLIENT_ID')
REDDIT_USERAGENT = env('REDDIT_USERAGENT')

Now test your Reddit connection. In the terminal, launch the Django console:

(venv) reddit_bot ➤ python manage.py shell


In [1]: from django.conf import settings

In [2]: import praw

In [3]: reddit = praw.Reddit(
   ...:     client_id=settings.REDDIT_CLIENT_ID,
   ...:     client_secret=settings.REDDIT_SECRET,
   ...:     user_agent=settings.REDDIT_USERAGENT,
   ...: )

If you don't receive any error message, it means you have successfully created a Reddit instance through PRAW.

Next, let's run a simple task to check the connection.

In [4]: for submission in reddit.subreddit("marketing").hot(limit=10):
   ...:     print(submission.title)
   ...:
New Job Listings
Sorry if this isn't allowed, but I recently created a subreddit focused on the business side of art, and would love for people to go there and share their knowledge and experiences.
I read privacy and policies of Tiktok, IG and Other Platforms. Here’s what I learned about Social Media Platforms!
Has anyone actually worked with an impressive agency?
How to bring first customers to shop?
Facebook ad numbers don't ad up
Beginning my career in marketing, looking to go in to an Agency
I hate digital marketing - help me find a new role
Where to start for ecom store?
Making a website

If you see something similar to the above, congratulations: you've successfully connected to Reddit's official API and retrieved the 10 hottest posts from r/marketing!

Keyword monitoring with regular expression
Suppose you run a Facebook ad agency and your target customers are new to Facebook ads. Wouldn't it be nice if you could respond to someone who has questions about Facebook ads on Reddit?

Chiming in and joining a conversation on Reddit will help you:

  1. Establish your reputation as a Facebook ads expert;

  2. Spread the word about your service to the world's largest online community and drive quality traffic to your website;

  3. If you get enough votes, earn a backlink from your response that improves your SEO.

For this article, we will show you how to find posts whose title contains the word 'facebook'.

Of course, you can continue optimizing this matching rule and develop your own solution.

Again, we first open up a Django console.

(venv) reddit_bot ➤ python manage.py shell

Inside of the Django console:

import praw
from django.conf import settings

import re  # new, the Python regular expression library

keyword = "facebook"

reddit = praw.Reddit(
     client_id=settings.REDDIT_CLIENT_ID,
     client_secret=settings.REDDIT_SECRET,
     user_agent=settings.REDDIT_USERAGENT,
)

for submission in reddit.subreddit("marketing").hot(limit=100): # we are searching 100 hottest posts on the marketing subreddit
    if re.search(keyword, submission.title, re.IGNORECASE): # new, notice we are ignoring the case sensitivity
        print(submission.title)

Your results may differ from mine (as we ran this search on September 18th, 2022).

In [2]: for submission in reddit.subreddit("marketing").hot(limit=100):
   ...:     if re.search(keyword, submission.title, re.IGNORECASE): # new, notice we are ignoring the case sensitivity
   ...:         print(submission.title)
   ...:
Facebook ad numbers don't ad up
Facebook & Instagram Ads campaign Setup
Measuring the impact of Facebook
How many interests is too many Interest - Facebook Ads


We've found four discussions about Facebook without too much work. How awesome is that!
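
If you want to refine the matching rule, one option (sketched here under my own assumptions rather than taken from the original project) is to compile a single pattern from several keywords and add word boundaries, so that each keyword only matches as a whole word or phrase. The keyword list and the helper name build_keyword_pattern are hypothetical, and the third sample title is made up for illustration.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole word or phrase."""
    # re.escape keeps characters such as "+" or "." literal.
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# Hypothetical keyword list; adjust it to your own niche.
pattern = build_keyword_pattern(["facebook", "fb ads", "meta ads"])

titles = [
    "Facebook ad numbers don't ad up",          # from the results above
    "Where to start for ecom store?",           # from the earlier hot list
    "Are FB ads worth it for a local bakery?",  # made-up example
]

for title in titles:
    if pattern.search(title):
        print(title)
```

Because of the word boundaries, a title containing only "facebooks" would not match; drop the trailing \b if you want prefixes to count as hits.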

Let's save those results, along with metadata such as the URL, post date, and content, into a database so we don't lose them.

Persisting data to a Postgres database
Let's first create a new app.

# In the root dir of your project
(venv) reddit_bot ➤ django-admin startapp reddit

Note: we also need to move the newly created app into the reddit_bot subdirectory. This step is important because of the way Cookiecutter structures our project.

mv reddit ./reddit_bot

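Next, let's update the apps.py config file.

The default name is "reddit"; we need to update it to "reddit_bot.reddit" since we have moved the app from the root directory to the subdirectory.

And don't forget to include this app in the base.py config file (config/settings/base.py) by adding "reddit_bot.reddit" to the list of installed apps; in the Cookiecutter layout, custom apps usually go in LOCAL_APPS.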

Now let's open up the models.py file and add our first class.

class Lead(models.Model):
    post_id = models.CharField(max_length=10) # Original post id
    title = models.TextField()
    content = models.TextField()
    posted_at = models.DateTimeField()
    url = models.URLField(max_length=500)

In the command line, let's create the migration for the new model and apply it.

(venv) reddit_bot ➤ python manage.py makemigrations
Migrations for 'reddit':
  reddit_bot/reddit/migrations/0001_initial.py
    - Create model Lead

(venv) reddit_bot ➤ python manage.py migrate
Operations to perform:
  Apply all migrations: account, admin, auth, contenttypes, reddit, sessions, sites, socialaccount, users
Running migrations:
  Applying reddit.0001_initial... OK

Next, let's create a command line script to execute the keyword matching and save the results into the Lead model.

We will call this script lead_finder.py and put it in the reddit/management/commands folder.

First, we need to create the two folders:

# Move to the reddit app directory
(venv) reddit_bot ➤ cd reddit_bot/reddit
(venv) reddit ➤ mkdir management
(venv) reddit ➤ mkdir management/commands

# Inside reddit_bot/reddit_bot/reddit/management/commands/lead_finder.py file
import datetime as DT
import re

import praw
from django.conf import settings
from django.core.management.base import BaseCommand
from django.utils import timezone
from django.utils.timezone import make_aware

from reddit_bot.reddit.models import Lead

KEYWORD = "facebook"
SUBREDDIT = 'marketing'

reddit = praw.Reddit(
    client_id=settings.REDDIT_CLIENT_ID,
    client_secret=settings.REDDIT_SECRET,
    user_agent=settings.REDDIT_USERAGENT,
)

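def convert_to_ts(unix_time):
    try:
        # fromtimestamp() without tz returns naive local time;
        # make_aware() then attaches Django's current timezone.
        ts = make_aware(DT.datetime.fromtimestamp(unix_time))
        return ts
    except Exception:  # avoid a bare except, which would also swallow Ctrl-C
        print(f"Converting utc failed for {unix_time}")
        return None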


def populate_lead(keyword, subreddit):
    for submission in reddit.subreddit(subreddit).hot(limit=100):
        if re.search(keyword, submission.title, re.IGNORECASE):
            if not Lead.objects.filter(post_id=submission.id).exists():
                Lead.objects.create(post_id=submission.id,
                                    title=submission.title,
                                    url=submission.permalink,
                                    content=submission.selftext,
                                    posted_at=convert_to_ts(submission.created_utc))


class Command(BaseCommand):
    help = 'Populating leads'

    def handle(self, *args, **kwargs):
        current_time = timezone.now()
        self.stdout.write(f'Populating leads at {current_time}')
        try:
            populate_lead(KEYWORD, SUBREDDIT)
        except Exception as e:
            current_time = timezone.now().strftime('%X')
            self.stdout.write(self.style.ERROR(f'Populating leads failed at {current_time} because {e}'))
            return  # do not report success after a failure

        current_time = timezone.now()
        self.stdout.write(self.style.SUCCESS(f'Successfully populated new leads at {current_time}'))

Some explanation:

This script has three parts. The convert_to_ts function converts a Unix timestamp into a timezone-aware datetime. Reddit stores the time a post was created as a Unix timestamp, that is, the number of seconds since January 1, 1970 (UTC).
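
Since created_utc is measured in UTC, one way to do this conversion without depending on the machine's local timezone is to pass an explicit timezone to fromtimestamp. This is a standard-library sketch, not the code used in this article, and the function name utc_from_unix is hypothetical.

```python
from datetime import datetime, timezone


def utc_from_unix(unix_time):
    """Convert seconds since the Unix epoch into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


# 1663449264 is an illustrative value, not read from the Reddit API.
print(utc_from_unix(1663449264).isoformat())  # prints 2022-09-17T21:14:24+00:00
```

With time zone support enabled in Django, an aware datetime like this can be assigned to a DateTimeField directly.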

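The populate_lead function uses the same logic as in our last section and saves each new lead, but only if no row with the same post_id exists yet. Note that the Lead model as defined above does not make post_id a primary key, so this check in the script is what prevents duplicates; adding unique=True to the field would enforce it at the database level as well.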

Lastly, we created a Command class so that we can run populate_lead from the command line. There are other ways to run a script from the command line, but this one is more in the Django style, in my opinion.

Finally, we can try to execute the script and populate some leads.

(venv) reddit_bot ➤ python manage.py lead_finder
Populating leads at 2022-09-18 17:50:35.045981+00:00
Successfully populated new leads at 2022-09-18 17:50:36.558052+00:00

Let's open up the Django console to verify the results are saved successfully.

from reddit_bot.reddit.models import Lead

for lead in Lead.objects.all():
    print(f'''title: {lead.title}\nposted_at:{lead.posted_at}\nurl: {lead.url}\n''')



title: Facebook ad numbers don't ad up
posted_at:2022-09-17 21:14:24+00:00
url: /r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/

title: Facebook & Instagram Ads campaign Setup
posted_at:2022-09-17 08:01:38+00:00
url: /r/marketing/comments/xgghl9/facebook_instagram_ads_campaign_setup/

title: Measuring the impact of Facebook
posted_at:2022-09-16 20:49:14+00:00
url: /r/marketing/comments/xg2kbi/measuring_the_impact_of_facebook/

title: How many interests is too many Interest - Facebook Ads
posted_at:2022-09-16 01:32:28+00:00
url: /r/marketing/comments/xfdwsg/how_many_interests_is_too_many_interest_facebook/

Here we go. All four leads were persisted successfully!

Leads report view
It's cool we can see the data in the console, but it will be easier if we can view the leads in a table from a browser.

Inside the views.py file, let's create a ListView.

#inside reddit_bot/reddit_bot/reddit/views.py

from django.views.generic import ListView

from .models import Lead

# Create your views here.
class LeadView(ListView):
    model = Lead
    template_name = 'lead_list.html'

lead_view = LeadView.as_view()

We also need to create an HTML file, lead_list.html, inside a new directory called templates under the reddit app.

{% extends 'base.html' %}

    {% block content %}
        <table class="table table-striped">
        <thead>
        <tr>
        <th scope="col">ID</th>
        <th scope="col">Title</th>
        <th scope="col">Posted At</th>
        <th scope="col">Content</th>
        </tr>
        </thead>
        <tbody>

        {% for lead in object_list %}
            <tr>
            <th scope="row">{{ lead.post_id }}</th>
            <td><a href="https://reddit.com{{ lead.url }}"> {{ lead.title }}</a></td>
            <td>{{ lead.posted_at }}</td>
            <td>{{ lead.content }}</td>
            </tr>
        {% endfor %}

        </tbody>
        </table>

    {% endblock %}

Next, we need to add a URL path to access this view.

Create a new file: urls.py under the reddit app.

# inside reddit_bot/reddit_bot/reddit/urls.py

from django.urls import path

from reddit_bot.reddit.views import lead_view

app_name = "reddit"

urlpatterns = [

    path("leads/", view=lead_view, name="leads"),

]

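Finally, we must include the reddit app's URLs file at the project level. In config/urls.py, add an entry such as path("reddit/", include("reddit_bot.reddit.urls", namespace="reddit")) to urlpatterns; the "reddit/" prefix is what produces the URL used in the next step.

Final step: open your browser and visit http://localhost:8000/reddit/leads/ to check the page.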

And if you click on the title, you will be taken to the Reddit post page, where you can engage with your target customers. How cool is that!
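
The link works because the template prefixes https://reddit.com to the stored permalink, which is a site-relative path. If you would rather build the absolute URL in Python (for instance, before saving it), a standard-library sketch could look like this; the helper name absolute_reddit_url is hypothetical.

```python
from urllib.parse import urljoin

REDDIT_BASE = "https://reddit.com"


def absolute_reddit_url(permalink):
    """Join a site-relative Reddit permalink onto the base URL."""
    return urljoin(REDDIT_BASE, permalink)


# Permalink copied from the first lead shown earlier in this article.
print(absolute_reddit_url("/r/marketing/comments/xgxv9c/facebook_ad_numbers_dont_ad_up/"))
```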

Schedule a cron job to check Reddit periodically
We are almost done. If you are like me and like to automate your tasks, how about we schedule the job to run automatically?

That's super easy.

In the terminal, type crontab -e and press Enter.

Add the following line (you will need to edit the path to the reddit_bot Django project):

1 * * * * cd ~/reddit_bot; source venv/bin/activate; source .env; python manage.py lead_finder >/tmp/stdout.log 2>/tmp/stderr.log

It will run every hour at one minute past the hour. For example, if it is now 11:35 am, the job will next run at 12:01 pm, then at 1:01 pm, and so on.
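
To make the schedule concrete, here is a small standard-library sketch that computes the next time a job scheduled at minute 1 of every hour would fire. It only models this one expression (1 * * * *), not cron syntax in general, and the function name next_run is hypothetical.

```python
from datetime import datetime, timedelta


def next_run(now, minute=1):
    """Return the next datetime strictly after `now` whose minute equals `minute`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so move to the next hour.
        candidate += timedelta(hours=1)
    return candidate


print(next_run(datetime(2022, 9, 18, 11, 35)))  # 2022-09-18 12:01:00
print(next_run(datetime(2022, 9, 18, 12, 0)))   # 2022-09-18 12:01:00
```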

Of course, you can change the schedule to whatever works best for you. The following site can help you customize your cron expression:

https://crontab.guru/

Conclusion

We've gone through how to set up your own Reddit keyword monitoring with Python.

