Background
As an information-anxious engineer, I follow several tech blogs and Hacker News every day, but browsing them manually takes too much time. So I used n8n to build an automated system: when an RSS feed updates, it fetches the full page content, summarizes it with Gemini AI, saves the result to Google Sheets, and then pushes selected articles to LINE at 6 AM every day.
This project integrates multiple services:
- 📡 RSS Feed: Subscribe to multiple information sources
- 🕷️ Firecrawl: Fetch complete webpage content
- 🤖 Gemini 2.5 Flash: Automatic AI summarization
- 📊 Google Sheets: Store article data
- 📱 LINE Messaging API: Flex Message push notifications
It sounds great, but I ran into many pitfalls during implementation. This article records the problems I hit and how I solved them.
System Architecture
The entire system is divided into two independent n8n Workflows:
Workflow 1: RSS Real-time Processing
RSS trigger → Format data → Fetch page with Firecrawl → Preprocess content → Summarize with Gemini → Write to Google Sheets
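The "Format data" step is not shown in detail. Here is a rough sketch of what it could do: map each feed item onto the sheet columns used later (`title`, `link`, `summary`, `source`, `created_at`, `sent`). It assumes the trigger output includes `title`, `link`, and `isoDate` or `pubDate`, so check your own trigger output. The `formatFeedItem` name and the per-feed `sourceName` label are hypothetical.

```javascript
// Hypothetical "Format data" step: map one RSS item to the sheet's columns.
// Assumes the trigger exposes title, link and isoDate/pubDate; adjust to your feed.
function formatFeedItem(rssItem, sourceName) {
  // Prefer the ISO timestamp if present, otherwise try to parse pubDate.
  const rawDate = rssItem.isoDate || rssItem.pubDate || '';
  const parsed = rawDate ? new Date(rawDate) : null;
  const createdAt =
    parsed && !Number.isNaN(parsed.getTime()) ? parsed.toISOString() : '';

  return {
    title: (rssItem.title || '').trim(),
    link: rssItem.link || '',
    summary: '',          // Filled in later by the Gemini step
    source: sourceName,   // e.g. 'HN'; how you label each feed is up to you
    created_at: createdAt,
    sent: 'FALSE',        // Mirrors the FALSE default in the sheet
  };
}

// In an n8n Code node ("Run Once for All Items") this might be used as:
//   return $input.all().map(item => ({ json: formatFeedItem(item.json, 'HN') }));
```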
Workflow 2: Daily Scheduled Sending
Trigger at 6:00 AM daily → Read Google Sheets → Filter unsent → Take 10 items → Build Flex Message → Push to LINE → Update status
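The "Filter unsent → Take 10 items" steps could also live in a single Code node. Here is a minimal sketch. Depending on how the sheet is read, the `sent` column may arrive either as a boolean or as a string such as `"FALSE"`/`"TRUE"`, so the sketch accepts both. The helper names `isSent` and `pickUnsent` are hypothetical.

```javascript
// Treat a sheet value as "sent" only if it is boolean true or the string "TRUE".
// Assumption: the sheet may return booleans or strings, depending on the read options.
function isSent(value) {
  if (typeof value === 'boolean') return value;
  return String(value ?? '').trim().toUpperCase() === 'TRUE';
}

// Keep rows that have not been sent yet, in sheet order, and take the first `limit`.
function pickUnsent(rows, limit = 10) {
  return rows.filter(row => !isSent(row.sent)).slice(0, limit);
}

// In an n8n Code node this might look like:
//   const rows = $input.all().map(item => item.json);
//   return pickUnsent(rows).map(row => ({ json: row }));
```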
Problems Encountered During Development
Problem 1: n8n Code Node Syntax Error
I initially used ES Module syntax in the Code Node:
```javascript
// ❌ Incorrect approach
export default async function () {
  const items = this.getInputData();
  // ...
}
```
As a result, n8n kept reporting errors and the node never ran.
Solution: Use the standard n8n Code node style and read the incoming items directly with `$input.all()`:
```javascript
// ✅ Correct approach
const items = $input.all();

const newItems = items.map(item => {
  // Processing logic
  return {
    json: {
      ...item.json,
      // Add fields
    }
  };
});

return newItems;
```
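`$input` only exists inside n8n, so it can help to try Code node logic locally first. The sketch below stubs a tiny, hypothetical `$input` object with `all()` and `first()`, the only two methods used in this post, not n8n's full API. It then runs the same map-and-return pattern, which makes the required `{ json: ... }` item shape easy to see.

```javascript
// Hypothetical local stub of n8n's $input, covering only all() and first().
function makeInput(jsonObjects) {
  const items = jsonObjects.map(json => ({ json }));
  return {
    all: () => items,
    first: () => items[0],
  };
}

// The body of a "Run Once for All Items" Code node, wrapped in a function
// so the same logic can be exercised outside n8n.
function codeNode($input) {
  const items = $input.all();
  return items.map(item => ({
    json: {
      ...item.json,
      // Example added field; replace with your own processing logic.
      titleLength: (item.json.title || '').length,
    },
  }));
}

const output = codeNode(makeInput([{ title: 'Hello n8n' }, { title: '' }]));
console.log(JSON.stringify(output));
```

Each returned element keeps the original fields under `json` and adds the new one, which is the shape the next n8n node expects.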
Problem 2: Gemini API Returns MAX_TOKENS Error
After sending the request, Gemini returned this result:
```json
{
  "candidates": [
    {
      "content": { "role": "model" },
      "finishReason": "MAX_TOKENS",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 568,
    "totalTokenCount": 867,
    "thoughtsTokenCount": 299
  }
}
```
At first I assumed the input was too long, but on closer inspection `promptTokenCount` was only 568. The problem was the output token limit!
It turns out that Gemini 2.5 Flash has a thinking feature that spends part of the output token budget on internal reasoning. I had set `maxOutputTokens: 300`, and thinking used 299 of them, leaving at most 1 token for the answer. In fact the response contained no text at all: `content` has no `parts`, and 568 prompt tokens + 299 thinking tokens already account for the full `totalTokenCount` of 867.
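To catch this case automatically, a check like the one below could run right after the Gemini call. It is a sketch that reads `finishReason`, `content.parts`, and `usageMetadata` from a `generateContent` response shaped like the example above. The `diagnoseGemini` helper and its return shape are made up for illustration.

```javascript
// Hypothetical helper: explain why a Gemini generateContent response has little or no text.
function diagnoseGemini(response) {
  const candidate = (response.candidates || [])[0] || {};
  const parts = (candidate.content && candidate.content.parts) || [];
  // Ignore thought-summary parts (marked thought: true) if any were requested.
  const text = parts.filter(p => !p.thought).map(p => p.text || '').join('');
  const usage = response.usageMetadata || {};
  const thoughts = usage.thoughtsTokenCount || 0;
  const answer = usage.candidatesTokenCount || 0;

  let hint = 'ok';
  if (candidate.finishReason === 'MAX_TOKENS') {
    hint = thoughts > answer
      ? 'limit hit mostly by thinking: raise maxOutputTokens or set thinkingBudget to 0'
      : 'limit hit by the answer itself: raise maxOutputTokens';
  } else if (text === '') {
    hint = 'no text returned';
  }
  return { text, finishReason: candidate.finishReason, thoughts, answer, hint };
}

// The response from this section: no parts, 299 thinking tokens, no answer tokens.
const example = diagnoseGemini({
  candidates: [{ content: { role: 'model' }, finishReason: 'MAX_TOKENS', index: 0 }],
  usageMetadata: { promptTokenCount: 568, totalTokenCount: 867, thoughtsTokenCount: 299 },
});
console.log(example.hint); // → limit hit mostly by thinking: raise maxOutputTokens or set thinkingBudget to 0
```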
Solution: Increase `maxOutputTokens`, or disable thinking:
Solution 1: Increase the output token limit

```jsonc
{
  "generationConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 1024 // Increased from 300 to 1024
  }
}
```

Solution 2: Disable thinking

```jsonc
{
  "generationConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 512,
    "thinkingConfig": {
      "thinkingBudget": 0 // 0 disables thinking
    }
  }
}
```
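Putting these settings together, the call itself might be sketched as below, for example when testing outside n8n. This sketch makes a few assumptions:

- It targets the public REST endpoint `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`.
- It authenticates with the `x-goog-api-key` header described in the credentials section.
- It relies on a runtime with a global `fetch`, such as Node 18+.

`buildGeminiBody` and `summarize` are illustrative names, not part of any SDK. In n8n, the same body would go into an HTTP Request node.

```javascript
// Illustrative request builder for Gemini generateContent (REST, v1beta).
const GEMINI_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent';

function buildGeminiBody(prompt, { maxOutputTokens = 1024, disableThinking = false } = {}) {
  const generationConfig = { temperature: 0.7, maxOutputTokens };
  if (disableThinking) {
    // thinkingBudget 0 turns thinking off on Gemini 2.5 Flash.
    generationConfig.thinkingConfig = { thinkingBudget: 0 };
  }
  return {
    contents: [{ role: 'user', parts: [{ text: prompt }] }],
    generationConfig,
  };
}

// Sends the request and returns the joined text parts (empty string if none).
async function summarize(prompt, apiKey, options) {
  const res = await fetch(GEMINI_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
    body: JSON.stringify(buildGeminiBody(prompt, options)),
  });
  if (!res.ok) throw new Error(`Gemini HTTP ${res.status}: ${await res.text()}`);
  const data = await res.json();
  const parts = data.candidates?.[0]?.content?.parts || [];
  return parts.map(p => p.text || '').join('').trim();
}
```

Either fix from above maps to one option here: `{ maxOutputTokens: 1024 }` or `{ maxOutputTokens: 512, disableThinking: true }`.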
Problem 3: Firecrawl Fetched Too Much Content
Firecrawl fetches the entire webpage, including navigation bars, sidebars, comment sections, and other noise. Sending all of that to Gemini wastes tokens and degrades the summary.
Solution: Clean up the content in a Code node before sending it to Gemini:
```javascript
const items = $input.all();
const maxLen = 1500; // Maximum number of characters sent to Gemini

const newItems = items.map(item => {
  const title = item.json.title || '';
  const raw = item.json.content || '';

  // 1. Remove noise
  let text = raw
    .replace(/`{3}[\s\S]*?`{3}/g, '')              // Remove fenced code blocks
    .replace(/`[^`]+`/g, '')                       // Remove inline code
    .replace(/!\[[^\]]*\]\([^)]*\)/g, '')          // Remove markdown images
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')       // Keep link text (must run before URL removal)
    .replace(/<[^>]+>/g, '')                       // Remove HTML tags
    .replace(/https?:\/\/\S+/g, '')                // Remove bare URLs
    .replace(/[#>*`|_~]/g, '')                     // Remove markdown symbols
    .replace(/\n{3,}/g, '\n\n')                    // Compress line breaks
    .replace(/[ \t]{2,}/g, ' ')                    // Compress spaces (keeps paragraph breaks)
    .trim();

  // 2. Cut off trailing boilerplate (comments, related posts, footers)
  const cutPatterns = [
    'Leave a Reply', 'Recent Comments', 'Related Posts',
    'Share this', 'Subscribe', 'Newsletter', 'Copyright',
    // Chinese markers: "About the author", "Further reading", "Related articles", "Comments"
    '關於作者', '延伸閱讀', '相關文章', '留言'
  ];
  for (const pattern of cutPatterns) {
    const idx = text.indexOf(pattern);
    if (idx > 200) { // Only cut when the marker is not near the top of the article
      text = text.slice(0, idx);
    }
  }

  // 3. Limit length, ending on a complete sentence where possible
  text = text.slice(0, maxLen);
  if (text.length === maxLen) {
    const lastPeriod = Math.max(
      text.lastIndexOf('。'),
      text.lastIndexOf('！'), // Full-width and ASCII sentence endings
      text.lastIndexOf('？'),
      text.lastIndexOf('!'),
      text.lastIndexOf('?'),
      text.lastIndexOf('. ')
    );
    if (lastPeriod > maxLen * 0.5) {
      text = text.slice(0, lastPeriod + 1);
    }
  }

  // 4. Compose a concise prompt
  const prompt = `Write a summary of less than 100 words in Traditional Chinese, only outputting the summary text:
Title: ${title}
Content:
${text}`;

  return {
    json: {
      ...item.json,
      prompt: prompt
    }
  };
});

return newItems;
```
Problem 4: LINE Flex Message Error "message is invalid"
The LINE push message API returned an error:

```text
A message (messages[0]) in the request body is invalid
```

After checking the Flex Message JSON, I found that the title of some articles was empty, so the text component ended up as `"text": undefined`. The LINE API does not accept a missing or empty `text` value.
Root cause: The key read from Google Sheets was not `title` but `col_1`, because the header row was not set up correctly.
Solution: Add fallbacks when building the Flex Message:
```javascript
const items = $input.first().json.data || [];

const bubbles = items.map((item) => {
  // Fix: check several possible field names and fall back to default values
  const title = item.title || item.col_1 || item.link || 'No Title';
  const summary = item.summary || 'No summary content';
  const link = item.link || 'https://example.com';
  const source = item.source || 'Unknown';

  return {
    "type": "bubble",
    "size": "kilo",
    "body": {
      "type": "box",
      "layout": "vertical",
      "contents": [
        {
          "type": "text",
          "text": title, // Always has a value
          "weight": "bold",
          "wrap": true
        },
        {
          "type": "text",
          "text": summary, // Always has a value
          "size": "sm",
          "wrap": true
        }
      ]
    },
    // ...
  };
});
```
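The snippet above stops at the `bubbles` array. Here is one possible way to finish the node, sketched under these assumptions:

- A hypothetical `firstNonBlank` helper makes whitespace-only strings fall back too, which plain `||` does not.
- The carousel is capped at 12 bubbles, LINE's carousel limit.
- The `altText` wording is made up.

The returned object is a flex message that can go into the `messages` array of a push request.

```javascript
// Return the first candidate that is a non-blank string (trimmed), or '' if none.
function firstNonBlank(...candidates) {
  for (const value of candidates) {
    if (typeof value === 'string' && value.trim() !== '') return value.trim();
  }
  return '';
}

// Minimal bubble for one sheet row: title, summary, and a "Read Original" button.
function toBubble(row) {
  return {
    type: 'bubble',
    size: 'kilo',
    body: {
      type: 'box',
      layout: 'vertical',
      contents: [
        { type: 'text', text: firstNonBlank(row.title, row.col_1, row.link, 'No Title'), weight: 'bold', wrap: true },
        { type: 'text', text: firstNonBlank(row.summary, 'No summary content'), size: 'sm', wrap: true },
      ],
    },
    footer: {
      type: 'box',
      layout: 'vertical',
      contents: [{
        type: 'button',
        action: { type: 'uri', label: 'Read Original', uri: firstNonBlank(row.link, 'https://example.com') },
      }],
    },
  };
}

// Wrap rows in a carousel flex message; LINE carousels hold at most 12 bubbles.
function toCarouselMessage(rows) {
  const picked = rows.slice(0, 12);
  if (picked.length === 0) return null; // Nothing to send today
  return {
    type: 'flex',
    altText: `Daily tech digest (${picked.length})`, // Hypothetical wording
    contents: { type: 'carousel', contents: picked.map(toBubble) },
  };
}
```

Inside n8n, the node could end with something like `return [{ json: { to: 'YOUR_USER_ID', messages: [toCarouselMessage(items)] } }];`, where the user ID is a placeholder.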
API Credential Settings
Firecrawl API Key
Select Header Auth in n8n:
| Field | Value |
|---|---|
| Name | Authorization |
| Value | Bearer fc-your-api-key |
Gemini API Key
Select Header Auth in n8n:
| Field | Value |
|---|---|
| Name | x-goog-api-key |
| Value | your-gemini-api-key |
⚠️ Note: Gemini uses the x-goog-api-key header, not a Bearer token!
LINE Channel Access Token
Select Header Auth in n8n:
| Field | Value |
|---|---|
| Name | Authorization |
| Value | Bearer your-channel-access-token |
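The three credentials differ only in header name and value format. You may ever call these APIs from a Code node or a script instead of through n8n's Header Auth. In that case, a small lookup like the one below keeps the Gemini exception explicit. The sketch also builds a LINE push request for the `https://api.line.me/v2/bot/message/push` endpoint without sending it. `authHeader` and `linePushRequest` are hypothetical names.

```javascript
// Hypothetical helper mirroring the three Header Auth credentials above.
function authHeader(service, secret) {
  switch (service) {
    case 'firecrawl':
    case 'line':
      return { Authorization: `Bearer ${secret}` };
    case 'gemini':
      return { 'x-goog-api-key': secret }; // Gemini: API key header, no "Bearer"
    default:
      throw new Error(`Unknown service: ${service}`);
  }
}

// Build (but do not send) a LINE push request: URL plus fetch() options.
function linePushRequest(userId, messages, channelAccessToken) {
  return {
    url: 'https://api.line.me/v2/bot/message/push',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', ...authHeader('line', channelAccessToken) },
      body: JSON.stringify({ to: userId, messages }),
    },
  };
}
```

With a global `fetch` (e.g. Node 18+), sending would be `const { url, init } = linePushRequest(userId, messages, token); await fetch(url, init);`.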
Google Sheets Field Design
| title | link | summary | source | created_at | sent |
|---|---|---|---|---|---|
| Article Title | URL | AI Summary | Source | Publication Time | FALSE |
⚠️ Important: Make sure the first row contains the correct headers; otherwise n8n will read keys like col_1, col_2!
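If you can't fix the sheet right away, one defensive option is to remap positional keys back to the expected names. This sketch assumes `col_1` is the first column (`title`, as in the Problem 4 case) and that the columns follow the order in the table above. `HEADERS` and `normalizeRow` are hypothetical names, and fixing the header row itself remains the cleaner solution.

```javascript
// Expected column order, taken from the sheet design above.
const HEADERS = ['title', 'link', 'summary', 'source', 'created_at', 'sent'];

// Map keys like col_1, col_2 ... back to header names; other keys pass through.
function normalizeRow(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    const match = /^col_(\d+)$/.exec(key);
    const index = match ? Number(match[1]) - 1 : -1;
    const name = index >= 0 && index < HEADERS.length ? HEADERS[index] : key;
    const alreadySet = Object.prototype.hasOwnProperty.call(out, name);
    // A properly named field always wins over a positional one.
    if (!match || !alreadySet) out[name] = value;
  }
  return out;
}
```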
LINE Flex Message Result
The final Flex Message is in Carousel format, with one card per article:
```text
┌──────────────────────────────┐
│ 📝 DK                        │ ← Source tag + emoji
├──────────────────────────────┤
│ Article Title                │ ← Bold title
│                              │
│ Summary content summary      │ ← Summary (under 100 words)
│ content summary content...   │
├──────────────────────────────┤
│ [Read Original]              │ ← Button link
└──────────────────────────────┘
```
Different sources have different colors and emojis:
- 📝 DK (Blue #4A90A4)
- 🔥 HN (Orange #FF6600)
- 🎮 Steam (Dark Blue #1B2838)
- 🇯🇵 LY Blog (Green #00C300)
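The per-source colors and emojis fit naturally in a lookup table. The sketch below uses the values listed above with a neutral fallback for unknown sources. It assumes each color is used as the background of the bubble's header box. That detail, the fallback color, and the `headerBox` name are assumptions, not taken from the original workflow.

```javascript
// Source styles from the list above; keys match the `source` column values.
const SOURCE_STYLES = {
  DK: { emoji: '📝', color: '#4A90A4' },
  HN: { emoji: '🔥', color: '#FF6600' },
  Steam: { emoji: '🎮', color: '#1B2838' },
  'LY Blog': { emoji: '🇯🇵', color: '#00C300' },
};
const DEFAULT_STYLE = { emoji: '📰', color: '#666666' }; // Hypothetical fallback

// Build the colored header box that shows "emoji + source" at the top of a bubble.
function headerBox(source) {
  const style = SOURCE_STYLES[source] || DEFAULT_STYLE;
  return {
    type: 'box',
    layout: 'vertical',
    backgroundColor: style.color,
    contents: [
      { type: 'text', text: `${style.emoji} ${source || 'Unknown'}`, color: '#FFFFFF', weight: 'bold', size: 'sm' },
    ],
  };
}
```

A bubble would then include `header: headerBox(row.source)` next to its `body`.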
Pitfalls Summary
| Problem | Cause | Solution |
|---|---|---|
| Code Node execution failed | ES Module syntax incompatibility | Use the standard `$input.all()` style |
| Gemini MAX_TOKENS | Thinking consumes output tokens | Increase maxOutputTokens to 1024 or set thinkingBudget to 0 |
| Summary quality is poor | Too much webpage noise | Preprocess to remove irrelevant content |
| LINE message invalid | Flex Message has empty values | Add fallback default values |
| Google Sheets field name error | Header row not set correctly | Ensure the first row has the correct field names |
Lessons Learned
This project taught me several important lessons:
- Gemini 2.5's thinking consumes output tokens: if your output is truncated, check `thoughtsTokenCount` first. You may need to increase `maxOutputTokens` or disable thinking.
- Use the standard n8n Code node style: avoid `export default` and `this.getInputData()`; `$input.all()` is the most reliable.
- Always handle missing values: API responses may lack fields, so add fallbacks when building output.
- Preprocessing matters: the cleaner the content you send to the AI, the better the summary and the more tokens you save.
- Google Sheets field names come from the header row: if the keys you read look like `col_1`, the header row is misconfigured.
This system now automatically pushes 10 selected articles to my LINE at 6 AM every day, so I can finally catch up on tech trends during my commute! 🎉