Alex Retana

Build an AI Research Archivist with n8n: Stop Researching the Same Topics Twice


The $15K Problem You Didn't Know You Had

Picture this: It's Tuesday morning, and you're diving into researching authentication patterns for your new microservices architecture. You spend two hours reading articles, comparing approaches, and documenting your findings in a scattered collection of browser tabs and sticky notes.

Fast forward three months. A colleague asks about authentication strategies. You vaguely remember researching this, but where did you save those findings? What were the key takeaways? You end up starting from scratch.

By some estimates, knowledge workers waste nearly 6 hours per week duplicating research efforts. For a developer making $80K annually, 6 hours out of a 40-hour week is 15% of their time: about $12,000 in salary alone, and roughly $15,000 if you assume benefits and overhead on top. Multiply that across a team, and the numbers become staggering.

The solution isn't another note-taking app—it's an intelligent system that actively prevents duplicate research by checking what you've already investigated before conducting new searches.

What We're Building

In this tutorial, you'll build a Research Archivist Agent using n8n that:

  • Checks your existing research archive before conducting new searches
  • Uses Perplexity AI for high-quality research synthesis
  • Automatically stores findings in Google Sheets with proper citations
  • Maintains searchable keywords for easy retrieval
  • Guides users through a structured research workflow

Tech Stack:

  • n8n (workflow automation)
  • Anthropic Claude Sonnet 4.5 (agent orchestration)
  • Perplexity AI (research tool)
  • Google Sheets (knowledge archive)

Prerequisites

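You'll need:

  • An n8n instance
  • An Anthropic API key
  • A Perplexity API key
  • A Google account with access to Google Sheets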

Cost estimate: ~$5-10/month for API usage with moderate research volume.

Step 1: Set Up Your Knowledge Archive

Create a new Google Sheet with these columns:

Document Name | Document Content | Reference Link | Research Date | Keywords

Why this structure?

  • Document Name: Human-readable identifier for quick scanning
  • Document Content: Summary of findings (not full articles)
  • Reference Link: Source URL for verification
  • Research Date: Helps identify outdated research
  • Keywords: Gives the agent searchable terms to match against new requests

Save the Sheet URL—you'll need it for the n8n workflow.
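To make the column layout concrete, here is a minimal JavaScript sketch of how one archive row could be assembled before it is appended to the Sheet. The helper name buildArchiveRow and its input fields are hypothetical; only the five column names come from the Sheet above.

```javascript
// Hypothetical helper: shapes one research entry to match the Sheet's five columns.
function buildArchiveRow({ name, content, link, keywords, date = new Date() }) {
  if (!name || !content) {
    throw new Error("An archive row needs at least a name and content");
  }
  return {
    "Document Name": name.trim(),
    "Document Content": content.trim(),
    "Reference Link": link || "",
    // ISO date (YYYY-MM-DD, in UTC) sorts correctly as text and is easy to compare later.
    "Research Date": date.toISOString().slice(0, 10),
    // Stored as one comma-separated cell, lowercased for consistent matching.
    "Keywords": (keywords || []).map((k) => k.trim().toLowerCase()).join(", "),
  };
}
```

Storing the date in ISO form is a design choice of this sketch, not something the template requires; it just makes later date filtering less error-prone.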

Step 2: Import the n8n Template

  1. Download the template from the GitHub repository
  2. In n8n, go to Workflows → Import from File
  3. Select Archivist Agent Template.json

You'll see seven nodes connected:

Chat Trigger → Archivist Agent → Claude Model
                        ↓
              [Simple Memory]
                        ↓
        ┌───────────────┴───────────────┐
        ↓                               ↓
  Perplexity Tool            Google Sheets Tools (x2)

Step 3: Configure Credentials

Anthropic API

  1. Click Anthropic Chat Model node
  2. Create credential → Enter your API key
  3. Ensure model is claude-sonnet-4-5-20250929

Perplexity API

  1. Click Message a model in Perplexity node
  2. Create credential → Enter your API key
  3. Keep model as sonar-pro for best research quality

Google Sheets

  1. Click either Google Sheets node
  2. Create credential → Select OAuth2
  3. Follow Google's authorization flow
  4. Paste your Sheet URL in both nodes:
    • Get row(s) in sheet
    • Append or update row
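If you want to sanity-check the Sheet URL before pasting it into both nodes, a small sketch like the one below can help. It assumes the usual Google Sheets URL shape (https://docs.google.com/spreadsheets/d/<ID>/edit#gid=<tab>); the function name parseSheetUrl is hypothetical and is not part of n8n.

```javascript
// Extracts the spreadsheet ID and tab id (gid) from a Google Sheets URL.
// Returns null when the URL does not look like a Sheets link at all.
function parseSheetUrl(url) {
  const idMatch = url.match(/\/spreadsheets\/d\/([a-zA-Z0-9_-]+)/);
  if (!idMatch) return null;
  // The gid may appear after "#", "?" or "&" depending on how the link was copied.
  const gidMatch = url.match(/[#?&]gid=(\d+)/);
  return { spreadsheetId: idMatch[1], gid: gidMatch ? gidMatch[1] : null };
}
```

A null gid means the link does not say which tab to use, so both nodes should be checked to confirm they point at the same tab.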

Step 4: Understanding the Agent System Prompt

The core intelligence comes from the system prompt in the Archivist Agent node. Here's what makes it work:

## Workflow Process

### Phase 1: Initial Check
When a user requests research:
1. Search existing archive using "Get row(s) in sheet"
2. If found, present existing research
3. Confirm if user wants updated information

### Phase 2: New Research
If no existing research found:
1. Conduct research using Perplexity AI
2. Summarize findings
3. Store in archive
4. Provide summary to user

### Phase 3: Archive Management
- Search and retrieve specific topics
- Update entries when needed
- Organize content
- Remove duplicates

This three-phase approach steers the agent away from researching the same topic twice unless you explicitly ask for updated information.
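The control flow behind Phases 1 and 2 can be sketched in plain JavaScript. In the real workflow Claude makes this decision from the system prompt, so treat this only as an illustration: the function name, the stop-word list, and the naive keyword match are all assumptions of the sketch, and research is a stand-in for the Perplexity call.

```javascript
// Words too common to be useful when matching a request against the archive.
const STOP_WORDS = new Set(["the", "for", "and", "what", "have", "with", "our", "research"]);

// Sketch of the "check the archive first" rule. `archive` is an array of row
// objects; `research` is a stand-in for the Perplexity call (hypothetical).
function handleResearchRequest(topic, archive, research, forceRefresh = false) {
  const terms = topic
    .toLowerCase()
    .split(/\W+/)
    .filter((t) => t.length > 2 && !STOP_WORDS.has(t));

  const matches = archive.filter((row) => {
    const haystack = `${row["Document Name"]} ${row["Keywords"]}`.toLowerCase();
    return terms.some((t) => haystack.includes(t));
  });

  // Phase 1: existing research wins unless the user asks for an update.
  if (matches.length > 0 && !forceRefresh) {
    return { source: "archive", entries: matches };
  }

  // Phase 2: nothing on file (or a refresh was requested), so research and store.
  const entry = research(topic);
  archive.push(entry);
  return { source: "new", entries: [entry] };
}
```

The forceRefresh flag mirrors step 3 of Phase 1: existing findings are shown first, and new research only runs when the user confirms they want updated information.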

Step 5: Test Your Agent

  1. Click Save and Activate the workflow
  2. Click the Chat button (webhook icon on the trigger node)
  3. Try these test queries:

First research request:

Research the benefits of edge computing for web applications

The agent will:

  1. Check the archive (empty for first run)
  2. Conduct Perplexity research
  3. Store findings in your Sheet
  4. Return a summary

Duplicate check:

What do we have on edge computing?

The agent will:

  1. Find your previous research
  2. Present existing findings
  3. Ask if you want updated research

Step 6: Advanced Configuration

Adjust Memory Window

The Simple Memory node stores conversation context. Default is 15 messages. Increase for longer research sessions:

contextWindowLength: 30  // stores last 30 messages
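To see what a larger window changes, here is a minimal sliding-window buffer in JavaScript. It illustrates the concept only and is not n8n's internal implementation; the class name WindowMemory is made up for this sketch.

```javascript
// Minimal sliding-window buffer: keeps only the most recent `limit` messages.
class WindowMemory {
  constructor(limit = 15) {
    this.limit = limit;
    this.messages = [];
  }

  add(role, text) {
    this.messages.push({ role, text });
    // Drop the oldest messages once the window is full.
    if (this.messages.length > this.limit) {
      this.messages = this.messages.slice(-this.limit);
    }
  }

  // Returns a copy so callers cannot mutate the stored history.
  context() {
    return [...this.messages];
  }
}
```

A bigger window lets the agent refer back to earlier turns in a long session, at the cost of sending more tokens to the model on every message.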

Customize Research Depth

In the Perplexity node, adjust for different research needs:

// Quick facts
model: "sonar"

// Deep research (recommended)
model: "sonar-pro"
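For context, the node is a wrapper around an HTTP request. The sketch below builds (but does not send) such a request, assuming Perplexity's OpenAI-style chat completions endpoint; check the current Perplexity API reference before relying on it. The "quick" and "deep" labels, the system message, and the function name are this sketch's own inventions.

```javascript
// Builds (but does not send) a request for Perplexity's chat completions endpoint.
// The "quick"/"deep" labels are this sketch's own naming, not Perplexity's.
function buildPerplexityRequest(query, depth = "deep", apiKey = "YOUR_API_KEY") {
  const model = depth === "quick" ? "sonar" : "sonar-pro";
  return {
    url: "https://api.perplexity.ai/chat/completions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: "Be precise and cite sources." },
        { role: "user", content: query },
      ],
    }),
  };
}
```

Inside n8n you never write this by hand, since the Perplexity node and its credential handle the call; the sketch is only meant to show where the model setting ends up.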

Add Search Filters

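To keep stale entries out of the results, filter the archive by date. Treat the line below as pseudocode for the intent rather than a literal node setting: depending on your n8n version, the Google Sheets node's filter may only match exact column values, in which case add a Filter or Code node after the lookup.

// Only keep research from the last 6 months (the cutoff date here is an example)
filter: "Research Date >= DATE(2024, 4, 1)"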
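One way to apply a rolling date cutoff is a few lines of JavaScript, for example in a Code node placed after the Sheets lookup. This is a sketch under assumptions: the function name recentRows is hypothetical, and it expects the Research Date column to hold values that JavaScript's Date can parse (such as 2024-06-01).

```javascript
// Returns rows whose "Research Date" falls within the last `months` months.
function recentRows(rows, months = 6, now = new Date()) {
  const cutoff = new Date(now);
  // UTC month arithmetic keeps the result independent of the server's timezone.
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);
  return rows.filter((row) => {
    const date = new Date(row["Research Date"]);
    // Skip rows with a missing or unparseable date instead of failing.
    return !Number.isNaN(date.getTime()) && date >= cutoff;
  });
}
```

Computing the cutoff from the current date avoids the hardcoded value going stale, which is easy to forget in a workflow that runs for months.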

Real-World Usage Patterns

Daily Standup Research

"What research do we have on our current sprint topics?"

Technical Decision Making

"Compare our previous research on GraphQL vs REST APIs"

Onboarding New Developers

"Find all research related to our authentication architecture"

Knowledge Transfer

"What did we learn about database sharding last quarter?"

Troubleshooting Common Issues

Problem: Agent researches instead of checking archive first

Solution: Verify your Google Sheets credentials and confirm that both nodes point at the right tab. A Sheet URL identifies the tab by the gid value at its end, not by the tab's name.


Problem: Perplexity returns generic results

Solution: Craft more specific queries. Bad: "web security". Good: "OWASP Top 10 mitigation strategies for Node.js REST APIs".


Problem: Duplicate entries appearing

Solution: Use consistent naming conventions. Create a naming guide:

  • ✅ "JWT Authentication Best Practices"
  • ❌ "jwt auth", "JWT stuff", "authentication research"
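A naming guide can be backed by a small check before each new row is written. The sketch below only catches differences in case, spacing, and punctuation; it will not recognize "jwt auth" as the same topic as "JWT Authentication Best Practices", which is why the convention still matters. Both function names are hypothetical.

```javascript
// Normalizes a document name so trivially different spellings compare equal.
function normalizeName(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Returns the existing row whose normalized name matches, or null if none does.
function findDuplicate(rows, candidateName) {
  const key = normalizeName(candidateName);
  return rows.find((row) => normalizeName(row["Document Name"]) === key) || null;
}
```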

Scaling Your Archive

As your knowledge base grows, consider these enhancements:

1. Add Tagging System
Add a "Tags" column with comma-separated values:

Tags: authentication, security, nodejs, jwt
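Comma-separated cells are easy to get subtly wrong (stray spaces, mixed case, trailing commas), so here is a minimal sketch of parsing and filtering on the Tags column. The helper names are hypothetical, and requiring every requested tag to be present is a choice made for this example.

```javascript
// Splits a comma-separated Tags cell into a clean list of lowercase tags.
function parseTags(cell) {
  return (cell || "")
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter(Boolean);
}

// Keeps rows that carry every requested tag.
function filterByTags(rows, wanted) {
  const need = wanted.map((t) => t.trim().toLowerCase());
  return rows.filter((row) => {
    const tags = new Set(parseTags(row["Tags"]));
    return need.every((t) => tags.has(t));
  });
}
```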

2. Create Research Templates
Define standard research formats for common topics:

  • Technical Comparisons: Pros, Cons, Performance, Cost
  • Tool Evaluations: Features, Integration, Community, Pricing
  • Best Practices: Pattern, When to Use, Common Pitfalls

3. Implement Version Control
Track research updates by adding columns:

Version | Last Updated By | Change Summary
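As one possible way to keep those columns consistent, the sketch below produces an updated copy of a row each time research is refreshed. Simple integer versions are an assumption of this example, as is the function name applyUpdate; any scheme works as long as it is applied the same way every time.

```javascript
// Returns an updated copy of a row with the version-tracking columns filled in.
function applyUpdate(row, newContent, updatedBy, changeSummary) {
  const current = Number.parseInt(row["Version"], 10);
  return {
    ...row,
    "Document Content": newContent,
    // Rows created before the Version column existed count as version 1.
    "Version": (Number.isNaN(current) ? 1 : current) + 1,
    "Last Updated By": updatedBy,
    "Change Summary": changeSummary,
  };
}
```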

Extension Challenge: Build a Weekly Digest

Ready to level up? Here's your challenge: Create an automated weekly research digest that emails you a summary of all research conducted in the past week.

Hints:

  1. Add a Schedule Trigger node that runs weekly
  2. Query Google Sheets for entries from the last 7 days
  3. Use Claude to generate a formatted summary
  4. Send via Gmail or SendGrid node

Bonus points:

  • Include most-searched keywords
  • Highlight research gaps (topics with old data)
  • Add "Related research suggestions" using Claude
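If you want a starting point for the data-gathering step (hint 2), here is a minimal sketch that could sit between the Sheets query and the Claude summary. It is not a full solution: the function name is hypothetical, and it counts the keywords stored on the week's entries, which is only a proxy for "most-searched" since the Sheet does not record searches.

```javascript
// Collects the past week's entries and counts how often each keyword appears.
function buildWeeklyDigest(rows, now = new Date()) {
  const weekAgo = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
  const recent = rows.filter((row) => {
    const date = new Date(row["Research Date"]);
    return !Number.isNaN(date.getTime()) && date >= weekAgo && date <= now;
  });

  const counts = new Map();
  for (const row of recent) {
    for (const kw of (row["Keywords"] || "").split(",")) {
      const key = kw.trim().toLowerCase();
      if (key) counts.set(key, (counts.get(key) || 0) + 1);
    }
  }

  // Most frequent first; ties are broken alphabetically for stable output.
  const topKeywords = [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([keyword, count]) => ({ keyword, count }));

  return { titles: recent.map((r) => r["Document Name"]), topKeywords };
}
```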

Share your solution! Post your workflow to the n8n community or tweet it with #n8n and tag me—I'd love to see what you build.

Why This Matters

Personal Knowledge Management isn't just productivity theater—it's a competitive advantage. When you can instantly recall research insights from six months ago, you make faster decisions. When your team shares a searchable knowledge archive, you eliminate duplicate work and accelerate onboarding.

The Research Archivist Agent isn't just a tool—it's a mindset shift from "search and forget" to "research once, reference forever."

Next steps:

  1. Clone the repository
  2. Set up your workflow today
  3. Research your first topic
  4. Watch your knowledge compound

Three months from now, you'll have a valuable archive of research that would have otherwise been lost to browser history and forgotten bookmarks.

What will you research first?


Found this helpful? Drop a ❤️ and share it with your team. Have questions or improvements? Drop them in the comments below—I read and respond to every one.
