Evan Lin

Originally published at evanlin.com

[n8n][Gemini] Building an AI-Powered RSS Summary System with Daily LINE Notifications

Background

As an information-anxious engineer, I track multiple tech blogs and Hacker News every day. Browsing them manually takes too much time, so I used n8n to build an automated system: when an RSS feed updates, it fetches the full page content, generates a summary with Gemini, and saves it to Google Sheets. Every day at 6 AM, it pushes selected articles to LINE.

This project integrates multiple services:

  • 📡 RSS Feed: Subscribe to multiple information sources
  • 🕷️ Firecrawl: Fetch complete webpage content
  • 🤖 Gemini 2.5 Flash: Automatic AI summarization
  • 📊 Google Sheets: Store article data
  • 📱 LINE Messaging API: Flex Message push notifications

It sounds simple, but I ran into several pitfalls during implementation. This article records those problems and their solutions.

System Architecture

The system consists of two independent n8n workflows:

Workflow 1: RSS Real-time Processing

RSS trigger → Format data → Firecrawl fetch webpage → Content preprocessing → Gemini summary → Write to Google Sheets

Workflow 2: Daily Scheduled Sending

Trigger at 6:00 AM daily → Read Google Sheets → Filter unsent → Take 10 items → Build Flex Message → LINE push → Update status
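
The "Filter unsent" and "Take 10 items" steps can be sketched as a plain function. This is a minimal sketch rather than the workflow's actual node configuration: `pickUnsent` and the sample rows are hypothetical, and it assumes the `sent` column arrives either as a boolean or as a string such as `"TRUE"`.

```javascript
// Hypothetical helper: keep rows that are not yet sent, at most `limit` of them.
function pickUnsent(rows, limit = 10) {
  // Treat boolean true and any casing of the string "TRUE" as already sent.
  const isSent = (value) => String(value).trim().toUpperCase() === 'TRUE';
  return rows.filter((row) => !isSent(row.sent)).slice(0, limit);
}

// Sample rows; only the `sent` column matters for this step.
const rows = [
  { title: 'A', sent: 'TRUE' },
  { title: 'B', sent: 'FALSE' },
  { title: 'C', sent: false },
  { title: 'D', sent: '' },
];

console.log(pickUnsent(rows, 2).map((row) => row.title)); // [ 'B', 'C' ]
```

Treating anything other than an explicit TRUE as unsent means a blank cell still gets delivered, which seems the safer default for a digest.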

Problems Encountered During Development

Problem 1: n8n Code Node Syntax Error

I initially used ES Module syntax in the Code Node:

// ❌ Incorrect approach
export default async function () {
  const items = this.getInputData();
  // ...
}

As a result, n8n kept reporting errors and the node would not execute.

Solution: Write the Code node the way n8n expects and read the input directly with $input.all():

// ✅ Correct approach
const items = $input.all();

const newItems = items.map(item => {
  // Processing logic
  return {
    json: {
      ...item.json,
      // Add fields
    }
  };
});

return newItems;
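
To check Code node logic before pasting it into n8n, one option is to move the mapping into a plain function and feed it a stand-in for `$input`. This is only a sketch: `fakeInput` and `addWordCount` are hypothetical names, and the stub mimics just the `all()` method used above, not the rest of the n8n runtime.

```javascript
// Hypothetical stand-in for n8n's $input; only all() is mimicked.
const fakeInput = {
  all: () => [
    { json: { title: 'Hello n8n', link: 'https://example.com/a' } },
    { json: { title: '', link: 'https://example.com/b' } },
  ],
};

// Same shape as the Code node body: map items and return [{ json }].
function addWordCount(items) {
  return items.map((item) => ({
    json: {
      ...item.json,
      // Count whitespace-separated words; an empty title yields 0.
      titleWords: item.json.title.split(/\s+/).filter(Boolean).length,
    },
  }));
}

const result = addWordCount(fakeInput.all());
console.log(result.map((item) => item.json.titleWords)); // [ 2, 0 ]
```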

Problem 2: Gemini API Returns MAX_TOKENS Error

After sending the request, Gemini returned this result:

{
  "candidates": [
    {
      "content": { "role": "model" },
      "finishReason": "MAX_TOKENS",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 568,
    "totalTokenCount": 867,
    "thoughtsTokenCount": 299
  }
}

At first I thought the input was too long, but promptTokenCount was only 568. The real problem was the output token limit.

It turns out that Gemini 2.5 Flash has a thinking feature, and the tokens it spends on internal thinking count against the output token limit. I had set maxOutputTokens to 300; thinking used 299 of them, leaving only 1 token for the actual answer, so the response came back with no text at all.

Solution: Increase maxOutputTokens or disable thinking:

// Solution 1: Increase output token limit
{
  "generationConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 1024 // Increase from 300 to 1024
  }
}

// Solution 2: Disable Thinking function
{
  "generationConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 512,
    "thinkingConfig": {
      "thinkingBudget": 0 // Disable thinking
    }
  }
}
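
A small check on the response can catch this case before an empty summary reaches the sheet. The sketch below is an assumption about how one might do it: `diagnoseGeminiResponse` is a hypothetical helper, and it relies only on the fields visible in the response above plus `content.parts[].text`, where the generated text normally appears.

```javascript
// Hypothetical helper: explain why a Gemini response has no usable text.
function diagnoseGeminiResponse(response) {
  const candidate = (response.candidates || [])[0] || {};
  const parts = (candidate.content && candidate.content.parts) || [];
  const text = parts.map((part) => part.text || '').join('').trim();
  const thoughts = (response.usageMetadata || {}).thoughtsTokenCount || 0;

  // A complete answer: some text and no truncation.
  if (text && candidate.finishReason !== 'MAX_TOKENS') {
    return { ok: true, text };
  }
  // Truncated while thinking tokens were spent: the case from this article.
  if (candidate.finishReason === 'MAX_TOKENS' && thoughts > 0) {
    return { ok: false, text, reason: `thinking used ${thoughts} output tokens` };
  }
  return { ok: false, text, reason: `finishReason: ${candidate.finishReason}` };
}

// The response from this article, reduced to the fields that matter.
const truncated = {
  candidates: [{ content: { role: 'model' }, finishReason: 'MAX_TOKENS', index: 0 }],
  usageMetadata: { promptTokenCount: 568, totalTokenCount: 867, thoughtsTokenCount: 299 },
};

console.log(diagnoseGeminiResponse(truncated).reason);
// thinking used 299 output tokens
```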

Problem 3: Firecrawl Fetched Too Much Content

Firecrawl fetches the entire webpage, including navigation bars, sidebars, comment sections, and other noise. Sending that directly to Gemini wastes tokens and lowers the quality of the summary.

Solution: Clean up the content in a Code node before sending it to Gemini:

const items = $input.all();
const maxLen = 1500; // Maximum number of characters to keep

const newItems = items.map(item => {
  const title = item.json.title || '';
  const raw = item.json.content || '';

  // 1. Remove noise
  let text = raw
    .replace(/```[\s\S]*?```/g, '') // Remove code blocks
    .replace(/`[^`]+`/g, '') // Remove inline code
    .replace(/!\[[^\]]*\]\([^)]*\)/g, '') // Remove markdown images
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1') // Keep link text (must run before URLs are stripped)
    .replace(/<[^>]+>/g, '') // Remove HTML tags
    .replace(/https?:\/\/\S+/g, '') // Remove remaining bare URLs
    .replace(/[#>*`|_~]/g, '') // Remove markdown symbols
    .replace(/[ \t]{2,}/g, ' ') // Compress spaces and tabs, keeping line breaks
    .replace(/\n{3,}/g, '\n\n') // Compress line breaks
    .trim();

  // 2. Cut off irrelevant content
  const cutPatterns = [
    'Leave a Reply', 'Recent Comments', 'Related Posts',
    'Share this', 'Subscribe', 'Newsletter', 'Copyright',
    '關於作者', '延伸閱讀', '相關文章', '留言'
  ];

  for (const pattern of cutPatterns) {
    const idx = text.indexOf(pattern);
    if (idx > 200) {
      text = text.slice(0, idx);
    }
  }

  // 3. Limit length, keep complete sentences
  text = text.slice(0, maxLen);
  if (text.length === maxLen) {
    const lastPeriod = Math.max(
      text.lastIndexOf('。'),
      text.lastIndexOf('!'),
      text.lastIndexOf('?'),
      text.lastIndexOf('. ')
    );
    if (lastPeriod > maxLen * 0.5) {
      text = text.slice(0, lastPeriod + 1);
    }
  }

  // 4. Compose a concise prompt
  const prompt = `Write a summary of less than 100 words in Traditional Chinese, only outputting the summary text:

Title: ${title}

Content:
${text}`;

  return {
    json: {
      ...item.json,
      prompt: prompt
    }
  };
});

return newItems;

Problem 4: LINE Flex Message Error "message is invalid"

The LINE push message request returned an error:

A message (messages[0]) in the request body is invalid

After checking the Flex Message JSON, I found that the title field of some articles was empty, which produced "text": undefined. The LINE API does not accept empty text fields.

Root cause: The value read from Google Sheets was keyed as col_1 rather than title, because the header row was not set up correctly.

Solution: Add fallbacks when building the Flex Message:

const items = $input.first().json.data || [];

const bubbles = items.map((item) => {
  // Fix: check several possible field names and fall back to default values
  const title = item.title || item.col_1 || item.link || 'No Title';
  const summary = item.summary || 'No summary content';
  const link = item.link || 'https://example.com';
  const source = item.source || 'Unknown';

  return {
    "type": "bubble",
    "size": "kilo",
    "body": {
      "type": "box",
      "layout": "vertical",
      "contents": [
        {
          "type": "text",
          "text": title, // Ensure there is always a value
          "weight": "bold",
          "wrap": true
        },
        {
          "type": "text",
          "text": summary, // Ensure there is always a value
          "size": "sm",
          "wrap": true
        }
      ]
    },
    // ...
  };
});
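
Since one empty text field is enough for LINE to reject the whole message, it can help to scan each bubble before pushing. The sketch below is one possible approach, not part of the original workflow: `findEmptyTextNodes` is a hypothetical helper that walks the bubble and reports every text component whose text is missing or blank.

```javascript
// Hypothetical validator: list the paths of Flex text components whose
// `text` is missing, not a string, or blank.
function findEmptyTextNodes(node, path = 'bubble') {
  if (Array.isArray(node)) {
    return node.flatMap((child, i) => findEmptyTextNodes(child, `${path}[${i}]`));
  }
  if (node === null || typeof node !== 'object') return [];

  const problems = [];
  if (node.type === 'text' && (typeof node.text !== 'string' || node.text.trim() === '')) {
    problems.push(path);
  }
  // Recurse into nested boxes and their contents.
  for (const [key, value] of Object.entries(node)) {
    problems.push(...findEmptyTextNodes(value, `${path}.${key}`));
  }
  return problems;
}

// A bubble with the same defect described above: title resolved to undefined.
const badBubble = {
  type: 'bubble',
  body: {
    type: 'box',
    layout: 'vertical',
    contents: [
      { type: 'text', text: undefined, weight: 'bold' },
      { type: 'text', text: 'A short summary', size: 'sm' },
    ],
  },
};

console.log(findEmptyTextNodes(badBubble)); // [ 'bubble.body.contents[0]' ]
```

An empty result means every text component has a value; anything else points at the exact component to fix.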

API Credential Settings

Firecrawl API Key

Select Header Auth in n8n:

Field | Value
----- | -----
Name  | Authorization
Value | Bearer fc-your-api-key

Gemini API Key

Select Header Auth in n8n:

Field | Value
----- | -----
Name  | x-goog-api-key
Value | your-gemini-api-key

⚠️ Note: Gemini uses the x-goog-api-key header, not a Bearer token!
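
For testing the same call outside n8n, the header can be set like this. This is a sketch under assumptions: `buildGeminiRequest` is a hypothetical helper, and the URL is the public `generateContent` endpoint for `gemini-2.5-flash`, which may differ from what your HTTP Request node uses.

```javascript
// Assumed endpoint for the Gemini API generateContent method.
const GEMINI_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent';

// Hypothetical helper: build the arguments for fetch() without sending anything.
function buildGeminiRequest(apiKey, prompt) {
  return {
    url: GEMINI_URL,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': apiKey, // API key header, not "Authorization: Bearer ..."
      },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        generationConfig: { temperature: 0.7, maxOutputTokens: 1024 },
      }),
    },
  };
}

const request = buildGeminiRequest('your-gemini-api-key', 'Summarize: ...');
console.log(Object.keys(request.options.headers)); // [ 'Content-Type', 'x-goog-api-key' ]
// To send it: const res = await fetch(request.url, request.options);
```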

LINE Channel Access Token

Select Header Auth in n8n:

Field | Value
----- | -----
Name  | Authorization
Value | Bearer your-channel-access-token
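
Outside n8n, the same Bearer header goes on a POST to the Messaging API push endpoint. The sketch below only builds the request: `buildLinePush` and the alt text wording are hypothetical, and the bubbles are assumed to be already validated.

```javascript
// Push message endpoint of the LINE Messaging API.
const LINE_PUSH_URL = 'https://api.line.me/v2/bot/message/push';

// Hypothetical helper: wrap bubbles in a Flex carousel and build fetch() arguments.
function buildLinePush(token, userId, bubbles) {
  if (bubbles.length === 0) throw new Error('no bubbles to send');
  return {
    url: LINE_PUSH_URL,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        to: userId,
        messages: [
          {
            type: 'flex',
            altText: `Daily digest: ${bubbles.length} articles`, // shown in notifications
            contents: { type: 'carousel', contents: bubbles },
          },
        ],
      }),
    },
  };
}

const push = buildLinePush('your-channel-access-token', 'U-your-user-id', [{ type: 'bubble' }]);
console.log(push.options.headers.Authorization); // Bearer your-channel-access-token
// To send it: const res = await fetch(push.url, push.options);
```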

Google Sheets Field Design

title | link | summary | source | created_at | sent
----- | ---- | ------- | ------ | ---------- | ----
Article Title | URL | AI Summary | Source | Publication Time | FALSE

⚠️ Important: Make sure the header row is set correctly; otherwise the keys n8n reads will look like col_1, col_2!
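
A quick guard at the start of the sending workflow can surface this problem early. The following is a sketch, not the author's implementation: `checkSheetRow` is a hypothetical helper that flags `col_N` style keys and lists which of the six expected columns are missing.

```javascript
// Columns from the sheet design above.
const EXPECTED_COLUMNS = ['title', 'link', 'summary', 'source', 'created_at', 'sent'];

// Hypothetical check: detect placeholder keys and missing columns in one row.
function checkSheetRow(row) {
  const keys = Object.keys(row);
  return {
    placeholderKeys: keys.filter((key) => /^col_\d+$/.test(key)),
    missingColumns: EXPECTED_COLUMNS.filter((name) => !keys.includes(name)),
  };
}

// A row as it might look when the header row is not recognized.
const brokenRow = { col_1: 'Some title', link: 'https://example.com', sent: 'FALSE' };

console.log(checkSheetRow(brokenRow));
// {
//   placeholderKeys: [ 'col_1' ],
//   missingColumns: [ 'title', 'summary', 'source', 'created_at' ]
// }
```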

LINE Flex Message Effect

The final Flex Message is a carousel, with one card per article:

┌──────────────────────────────┐
│ 📝 DK                        │ ← Source tag + emoji
├──────────────────────────────┤
│ Article Title                │ ← Bold title
│                              │
│ Summary content summary      │ ← 100-word summary
│ content summary content...   │
├──────────────────────────────┤
│ [Read Original]              │ ← Button link
└──────────────────────────────┘

Each source has its own color and emoji:

  • 📝 DK (Blue #4A90A4)
  • 🔥 HN (Orange #FF6600)
  • 🎮 Steam (Dark Blue #1B2838)
  • 🇯🇵 LY Blog (Green #00C300)
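
The mapping above can be kept in a lookup table with a fallback for unknown sources. This is a sketch under assumptions: `buildHeader`, the default style, and the exact header box layout are hypothetical, since the article does not show that part of the bubble.

```javascript
// Colors and emojis from the list above.
const SOURCE_STYLES = new Map([
  ['DK', { emoji: '📝', color: '#4A90A4' }],
  ['HN', { emoji: '🔥', color: '#FF6600' }],
  ['Steam', { emoji: '🎮', color: '#1B2838' }],
  ['LY Blog', { emoji: '🇯🇵', color: '#00C300' }],
]);

// Assumption: a neutral style for sources that are not in the table.
const DEFAULT_STYLE = { emoji: '📰', color: '#666666' };

// Hypothetical helper: build a Flex header box for one source.
function buildHeader(source) {
  const style = SOURCE_STYLES.get(source) || DEFAULT_STYLE;
  return {
    type: 'box',
    layout: 'vertical',
    backgroundColor: style.color,
    contents: [
      {
        type: 'text',
        text: `${style.emoji} ${source || 'Unknown'}`, // never empty
        color: '#FFFFFF',
        weight: 'bold',
        size: 'sm',
      },
    ],
  };
}

console.log(buildHeader('HN').backgroundColor); // #FF6600
console.log(buildHeader('').contents[0].text); // 📰 Unknown
```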

Pitfalls Summary

Problem | Cause | Solution
------- | ----- | --------
Code Node execution failed | ES Module syntax is not supported | Use $input.all() directly
Gemini MAX_TOKENS | Thinking consumes output tokens | Increase maxOutputTokens to 1024, or set thinkingBudget to 0
Summary quality is poor | Too much webpage noise | Preprocess to remove irrelevant content
LINE message invalid | Flex Message has empty values | Add fallback default values
Google Sheets field name error | Header row not set correctly | Ensure the first row has the correct field names

Lessons Learned

This project taught me several important lessons:

  1. Gemini 2.5's thinking feature consumes output tokens: If your output is truncated, check thoughtsTokenCount first. You may need to increase maxOutputTokens or disable thinking.

  2. Write Code node scripts the way n8n expects: Avoid export default and this.getInputData(). Using $input.all() is the most reliable.

  3. Always handle missing values: Data returned by an API may be missing fields, so add fallbacks when assembling the output.

  4. Preprocessing matters: The cleaner the content you send to the model, the better the summary and the fewer tokens you spend.

  5. Google Sheets field names depend on the header row: If the keys come back as col_1, the header row is not set up correctly.

This system now automatically pushes 10 selected articles to my LINE at 6 AM every day, so I can finally catch up on technology trends during my commute! 🎉
