DEV Community

AIRabbit

Summarize Reddit Comments with Apify

A core component of effective marketing is data intelligence. However, the sheer volume of data, sourced from APIs, social media platforms like Facebook, and online forums such as Reddit, presents significant challenges.

Many large companies use sophisticated technologies, such as big data and MapReduce systems, to handle this influx of information. However, implementing these technologies can be prohibitively expensive for small and medium-sized companies or startups that need periodic yet comprehensive marketing research.

To address this, we have developed SummaryXL for Apify and APIs (currently in closed beta).

What is SummaryXL?

Essentially, this API is designed to analyze large volumes of data from virtually any API, website, or file, and to generate a summary based on specific user input. For instance, if you want to identify negative sentiment expressed in Reddit posts about your brand, you could run a Reddit scraper on Apify that searches for your brand and then integrate SummaryXL with just a few additional clicks.

SummaryXL then automatically analyzes the scraped data against your chosen topic within minutes, a task that could take hours, if not days, of manual effort.
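SummaryXL's internals are not described in this post. One common way to summarize more text than a language model accepts in a single request is map-reduce summarization: split the items into batches, summarize each batch with the focus topic in mind (map), then summarize the partial summaries (reduce). The sketch below is a generic illustration of that pattern, not SummaryXL's implementation; `summarize` is a hypothetical stand-in for a model call.

```python
from typing import Callable, List

# Hypothetical stand-in for a language-model call: takes a chunk of text
# plus the focus topic and returns a summary string.
Summarizer = Callable[[str, str], str]


def chunk_texts(texts: List[str], max_chars: int) -> List[List[str]]:
    """Group texts into batches whose total length (separators ignored) stays
    within max_chars. A text longer than max_chars gets a batch of its own."""
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


def map_reduce_summarize(texts: List[str], focus: str, summarize: Summarizer,
                         max_chars: int = 4000) -> str:
    """Summarize each batch (map), then summarize the partial summaries
    (reduce) until a single summary remains."""
    partials = [summarize("\n".join(b), focus) for b in chunk_texts(texts, max_chars)]
    while len(partials) > 1:
        batches = chunk_texts(partials, max_chars)
        if len(batches) == len(partials):
            # Partials are too long to group; merge them in one final call
            # so the loop is guaranteed to finish.
            batches = [partials]
        partials = [summarize("\n".join(b), focus) for b in batches]
    return partials[0] if partials else ""
```

To try it without a model, pass any function with the `(text, focus) -> str` shape, for example `lambda text, focus: f"{focus}: {len(text)} chars"`.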

The easiest way to try out SummaryXL is by using the playground.

Tutorial

In this example, we will conduct a basic trend analysis of Reddit posts and comments using the keyword "GPT".

The resulting summary focuses exclusively on relevant trends and omits extraneous information. This targeted output supports informed business decisions: it removes the need to read every comment manually or to rely on a small sample, which risks missing feedback or insights with a material impact.

Sounds interesting?

Let's see how it works on Apify.

If you're new to Apify: it is one of the largest web scraping platforms, with a large ecosystem of ready-made scrapers (called actors) for sites such as Reddit, Facebook, Twitter, and many more.

And here is how the example works in a nutshell:

  1. Rent a Reddit scraper actor.
  2. Add an integration with the summarizing actor (SummaryXL) and set its Focus Topic and summary length parameters.
  3. Start the Reddit actor.
  4. Evaluate the results.
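Steps 1 and 3 can also be scripted against the Apify REST API (v2) instead of clicking through the console. A minimal sketch using only the Python standard library, assuming your Apify API token is in an `APIFY_TOKEN` environment variable (our naming). It only starts the run; the SummaryXL integration from step 2 is still configured in the console, and the trimmed input relies on the actor's defaults for every other field.

```python
import json
import os
import urllib.request

APIFY_API = "https://api.apify.com/v2"


def build_start_run_request(actor_id: str, run_input: dict, token: str) -> urllib.request.Request:
    """Build (without sending) a POST request that starts an actor run.
    The JSON body becomes the actor's input."""
    return urllib.request.Request(
        f"{APIFY_API}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")  # assumption: token stored in this env var
    if token:
        # Actor ID from the console URL in Step 1; input trimmed for brevity.
        req = build_start_run_request(
            "oAuCIx3ItNrs2okjQ", {"searches": ["GPT"], "searchPosts": True}, token
        )
        with urllib.request.urlopen(req) as resp:
            run = json.load(resp)["data"]
        print(run["id"], run["status"], run["defaultDatasetId"])
```

Building the request separately from sending it keeps the helper easy to inspect before anything is charged to your account.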

Step 1: Configure the Reddit Scraper

https://console.apify.com/actors/oAuCIx3ItNrs2okjQ/input

```json
{
    "debugMode": false,
    "includeNSFW": true,
    "maxComments": 10,
    "maxCommunitiesCount": 2,
    "maxItems": 30,
    "maxPostCount": 30,
    "maxUserCount": 2,
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    },
    "scrollTimeout": 40,
    "searchComments": false,
    "searchCommunities": false,
    "searchPosts": true,
    "searchUsers": false,
    "searches": [
        "gamma.app"
    ],
    "skipComments": false,
    "skipCommunity": true,
    "skipUserPosts": false,
    "sort": "new"
}
```
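The example input searches for "gamma.app", while the tutorial's keyword is "GPT". A small helper that fills in the keyword and item limits makes that switch explicit. This is a minimal Python sketch: the field names and remaining values are copied from the JSON input, while keeping `maxPostCount` equal to `maxItems` is our own choice that mirrors the 30/30 values in the example.

```python
import json


def reddit_scraper_input(keyword: str, max_items: int = 30, max_comments: int = 10) -> dict:
    """Return the Step 1 input with the search keyword and size limits as
    parameters; all other values match the example JSON."""
    if not keyword.strip():
        raise ValueError("keyword must not be empty")
    return {
        "debugMode": False,
        "includeNSFW": True,
        "maxComments": max_comments,
        "maxCommunitiesCount": 2,
        "maxItems": max_items,
        "maxPostCount": max_items,  # our choice: kept equal to maxItems
        "maxUserCount": 2,
        "proxy": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
        "scrollTimeout": 40,
        "searchComments": False,
        "searchCommunities": False,
        "searchPosts": True,
        "searchUsers": False,
        "searches": [keyword.strip()],
        "skipComments": False,
        "skipCommunity": True,
        "skipUserPosts": False,
        "sort": "new",
    }


if __name__ == "__main__":
    # Paste the printed JSON into the actor's JSON input editor.
    print(json.dumps(reddit_scraper_input("GPT"), indent=4))
```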

Step 2: Add the SummaryXL Integration

https://console.apify.com/actors/oAuCIx3ItNrs2okjQ/integrations/options

Enter SummaryXL in the search and choose the SummaryXL actor.

Enter the required information.

Notably, this includes your OpenAI API key, which you can create in your OpenAI account.

To set the focus of the summary, describe what you are looking for in the Focus Topic field, for example a particular sentiment or a specific theme within the bulk of the data to be summarized.

Leave the dataset ID at its default value.
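If you keep these settings in a script or config file, avoid hard-coding the OpenAI key. The sketch below assembles the integration parameters with the key read from an `OPENAI_API_KEY` environment variable. Everything here is an assumption: the field names (`openaiApiKey`, `focusTopic`, `summaryLength`, `datasetId`) are hypothetical and may not match SummaryXL's actual input schema, the unit of the summary length is not specified in this post, and `{{resource.defaultDatasetId}}` is the template variable Apify integrations commonly use to pass on the triggering run's dataset, which we assume is what the default dataset ID resolves to.

```python
import os
from typing import Mapping

# Assumption: Apify integrations substitute this template with the
# default dataset ID of the run that triggered the integration.
DEFAULT_DATASET_PLACEHOLDER = "{{resource.defaultDatasetId}}"


def summaryxl_input(focus_topic: str, summary_length: int,
                    env: Mapping[str, str] = os.environ) -> dict:
    """Assemble a hypothetical SummaryXL input; field names are illustrative."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY instead of hard-coding the key")
    if summary_length <= 0:
        raise ValueError("summary_length must be positive")
    return {
        "openaiApiKey": api_key,           # hypothetical field name
        "focusTopic": focus_topic.strip(),  # hypothetical field name
        "summaryLength": summary_length,    # hypothetical field name and unit
        # Left at its default so the Reddit run's dataset is summarized.
        "datasetId": DEFAULT_DATASET_PLACEHOLDER,
    }
```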

If you would like to receive the summary by email, enter the email address of your Apify account (it must be identical for security reasons).

Please note that this is an experimental feature and might not always work. If you encounter any issues, please email us. Thanks.

Step 3: Run the actor

Now run the Reddit actor. The SummaryXL integration starts automatically once the Reddit run has finished.
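If you start the Reddit run from a script rather than the console, you may want to wait until the run reaches a final status before looking for the summary. A minimal standard-library sketch, assuming Apify's `GET /v2/actor-runs/{runId}` endpoint and its `waitForFinish` parameter (the server holds the request for up to 60 seconds) as described in Apify's API reference; verify the status names against the current docs. The status fetcher is injected so the polling logic can be tested without network access.

```python
import json
import time
import urllib.request
from typing import Callable

APIFY_API = "https://api.apify.com/v2"
# Final run statuses; READY, RUNNING, TIMING-OUT and ABORTING are transitional.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def run_status_url(run_id: str, wait_secs: int = 60) -> str:
    """URL for one run's details; waitForFinish lets the server wait up to 60 s."""
    return f"{APIFY_API}/actor-runs/{run_id}?waitForFinish={wait_secs}"


def wait_for_run(run_id: str, fetch_status: Callable[[str], str],
                 max_polls: int = 30, delay: float = 0.0) -> str:
    """Poll until the run reaches a terminal status and return that status."""
    for _ in range(max_polls):
        status = fetch_status(run_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(delay)  # extra pause between polls, if desired
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")


def http_fetch_status(token: str) -> Callable[[str], str]:
    """Return a fetcher that reads data.status from the Apify API."""
    def fetch(run_id: str) -> str:
        req = urllib.request.Request(
            run_status_url(run_id), headers={"Authorization": f"Bearer {token}"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["data"]["status"]
    return fetch
```

With the HTTP fetcher, `wait_for_run(run_id, http_fetch_status(token))` can block for up to about 30 minutes, because each poll may wait up to 60 seconds on the server.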

Step 4: Evaluate the Results

You should then see only one dataset from the SummaryXL actor: the summary.

The results are saved in two locations: as a dataset item (JSON) and as a PDF export in the key-value store.

https://console.apify.com/storage/datasets

Usually, the most recent dataset contains your summary.
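Both result locations can also be read through the Apify REST API. A minimal standard-library sketch, assuming the `GET /v2/datasets/{datasetId}/items` endpoint (with `desc=1` returning the newest items first) and `GET /v2/key-value-stores/{storeId}/records/{key}`; the environment variable names are our own, and the record key under which SummaryXL stores the PDF is not given in this post, so check the key-value store in the console for the actual name.

```python
import json
import os
import urllib.parse
import urllib.request

APIFY_API = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, limit: int = 1) -> str:
    """URL for the newest dataset items as a plain JSON array."""
    query = urllib.parse.urlencode({"format": "json", "desc": 1, "limit": limit})
    return f"{APIFY_API}/datasets/{dataset_id}/items?{query}"


def record_url(store_id: str, key: str) -> str:
    """URL for one key-value store record, such as the PDF export."""
    return f"{APIFY_API}/key-value-stores/{store_id}/records/{urllib.parse.quote(key)}"


def fetch_bytes(url: str, token: str) -> bytes:
    """Download raw bytes; parse JSON items or save binary records yourself."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    dataset_id = os.environ.get("SUMMARY_DATASET_ID")  # copy from the datasets page
    if token and dataset_id:
        for item in json.loads(fetch_bytes(dataset_items_url(dataset_id), token)):
            print(json.dumps(item, indent=2))
```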

Wrap-Up

In summary, using SummaryXL with Apify enables you to efficiently analyze and summarize large volumes of social media data. By integrating SummaryXL with Apify's Reddit scraper (or any other actor), you can automatically generate concise summaries focused on your specific topics of interest.

This approach helps you make informed business decisions without manually sifting through countless comments or risking overlooking critical insights.
