DEV Community

AIRabbit


Analyzing Reddit Comments with Apify and NotebookLM

This tutorial explains how to export data from any Apify Actor, like Reddit Scraper Lite, to Google Drive for analysis with NotebookLM. This integration lets you seamlessly transfer your scraped Reddit data to NotebookLM for advanced analysis, such as generating briefs, asking questions, and creating podcasts from your data.

In this short guide, we’ll export data from Reddit Scraper to Google Drive and use it for analysis with NotebookLM. For simplicity, we’ll use a keyword search to gather opinions, pros and cons, and pain points about a product called Camtasia. This serves as a placeholder for any other product or market research you might want to conduct.

Here’s how it works in a nutshell:

  • The Reddit Scraper (or any other actor) runs.
  • After it finishes, it triggers the NotebookLM actor, which exports all the data (please note the current limit of 1MB) to Google Drive.
  • You point NotebookLM to the newly exported data and start interacting with your scraped data (in this case, Reddit comments).
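The flow above can be sketched programmatically as well. Below is a minimal, illustrative sketch of two pieces of it: building the standard Apify dataset-download URL for a finished run, and checking a payload against the 1 MB export limit mentioned above (the dataset ID and sample data are placeholders):

```python
import json

APIFY_BASE = "https://api.apify.com/v2"  # public Apify REST API base

def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the URL from which a run's dataset items can be downloaded."""
    return f"{APIFY_BASE}/datasets/{dataset_id}/items?clean=true&format={fmt}"

def within_export_limit(items: list, limit_bytes: int = 1_000_000) -> bool:
    """Check the serialized dataset size against the ~1 MB export limit."""
    return len(json.dumps(items).encode("utf-8")) <= limit_bytes

# A tiny scraped-comment payload easily fits under the limit.
sample = [{"body": "Camtasia's editor is great", "subreddit": "r/videoediting"}]
print(dataset_items_url("abc123"))
print(within_export_limit(sample))
```

The size check mirrors what the export actor has to respect; if your scrape produces more than ~1 MB of data, expect the export to be truncated or to fail.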

Step 1: Start the Reddit Scraper

Choose the Reddit Scraper Lite.

For simplicity, remove the default startUrls and enter the keyword “camtasia” as shown below.

You can also set limits, e.g., 100.


{
    "debugMode": false,
    "includeNSFW": true,
    "maxComments": 10,
    "maxCommunitiesCount": 2,
    "maxItems": 100,
    "maxPostCount": 100,
    "maxUserCount": 2,
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    },
    "scrollTimeout": 40,
    "searchComments": true,
    "searchCommunities": false,
    "searchPosts": true,
    "searchUsers": false,
    "searches": [
        "camtasia"
    ],
    "skipComments": false,
    "skipCommunity": true,
    "skipUserPosts": false,
    "sort": "new"
}
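If you prefer to start the run from code rather than the Apify console, the same input can be posted to Apify's run-actor endpoint. A minimal sketch, assuming the Lite scraper's public actor ID is `trudax~reddit-scraper-lite` (verify the ID in your console) and using a placeholder token:

```python
import json
import urllib.request

def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that starts an actor run with the given input."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?token={token}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# A trimmed version of the input shown above.
run_input = {
    "searches": ["camtasia"],
    "maxItems": 100,
    "searchComments": True,
    "proxy": {"useApifyProxy": True},
}
req = build_run_request("trudax~reddit-scraper-lite", "<YOUR_API_TOKEN>", run_input)
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req) -- omitted here since it needs a real token.
```

Note the `~` in the actor ID: the Apify REST API uses `username~actorname` rather than a slash.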

Step 2: Integrate with the NotebookLM Actor

As shown in the picture, click on “Integrations” and choose “Connect actor or task”.

Then click “Apify To NotebookLM” and enter the dataset ID exactly as shown below:

{{resource.defaultDatasetId}}
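`{{resource.defaultDatasetId}}` is an Apify template variable: when the scraper run finishes, the integration substitutes the finished run's dataset ID before the NotebookLM actor starts. A rough sketch of that substitution (the payload shape here is an assumption for illustration, not Apify's exact internal format):

```python
import re

def render_template(template: str, resource: dict) -> str:
    """Replace {{resource.<field>}} placeholders with values from the finished run."""
    return re.sub(
        r"\{\{resource\.(\w+)\}\}",
        lambda m: str(resource.get(m.group(1), m.group(0))),
        template,
    )

# When the scraper run completes, the run object is passed as `resource`.
finished_run = {"defaultDatasetId": "xKqT3example"}
print(render_template("{{resource.defaultDatasetId}}", finished_run))  # xKqT3example
```

This is why the field must be entered verbatim, braces included: it is resolved at trigger time, not when you type it.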

Step 3: Start the Scraping Job

Now, start the scraping job as you normally would and wait until it finishes. You can see current jobs in the left-hand menu under the “Runs” tab.

Step 4: Access the Latest Run and Copy the Data URL

Open the latest result for the NotebookLM actor. If everything went well, you should see a single result — the URL to a Google Doc which you’ll use in NotebookLM.
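That single result is just a dataset item containing the Doc link, so you can also pull it out in code. A sketch, assuming the item exposes the link under a key like `url` (the exact field name may differ in the actor's output):

```python
import json

def extract_doc_url(dataset_items_json: str, key: str = "url") -> str:
    """Pull the exported Google Doc URL out of the actor's (single-item) dataset."""
    items = json.loads(dataset_items_json)
    if not items:
        raise ValueError("NotebookLM actor produced no items - check the run log")
    return items[0][key]

# Example payload with a placeholder document ID.
raw = '[{"url": "https://docs.google.com/document/d/EXAMPLE_ID/edit"}]'
print(extract_doc_url(raw))  # https://docs.google.com/document/d/EXAMPLE_ID/edit
```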


Open the result and follow the URL in your browser.

You’ll be asked to copy the document to your Google Drive. Click “Make a copy”.

Now you’re ready to import the scraped and exported data into a new or existing NotebookLM instance.

Step 5: Import Data into NotebookLM

Open notebooklm.google.com and select Google Docs as the source type. From the list of documents, select the latest one exported by the NotebookLM Actor.

Now you can perform all sorts of analyses on your data. In our example, the objective was to understand:

  1. The main complaints from using Camtasia.
  2. The competitor landscape that users are discussing.

Note that the latter is particularly useful if you don’t want to rely solely on product comparisons found online, which may be biased.

Summary

By following these steps, you’ve successfully integrated Reddit Scraper Lite with NotebookLM, enabling powerful data analysis capabilities. This integration streamlines the process of collecting, transferring, and analyzing Reddit data, allowing you to derive meaningful insights and create valuable content effortlessly.
