Background
While maintaining the linebot-arxiv project, I've always wanted to provide users with more in-depth paper analysis features. The existing "Translate Abstract" function can only process the Abstract section of a paper, but users often need to understand more: research methods, experimental results, chart analysis, and other complete content.
In the past, to achieve this function, you had to:
- Download the PDF to the local server
- Upload to Google Cloud Storage (GCS)
- Use the Gemini API to analyze files on GCS
This process is not only complex (requires managing storage space), but also involves additional costs (GCS storage fees + traffic fees), which is too heavy for a simple LINE Bot.
But in January 2025, Google announced a major update in the official blog: Gemini API now supports reading files directly from Public URLs! This means we can skip GCS and directly let Gemini analyze PDF papers on arXiv.
This change brings not only technical simplification, but also huge advantages in terms of cost and maintenance.
Screen Display
Old Interface
- "Know More" - Displays detailed paper information
- "Translate Abstract (Longer)" - Only translates the Abstract
- "Save Article" - Add to Favorites
New Interface
- "📄 Detailed Information" - Clearer labels
- "🤖 AI Analyze PDF" - New Feature! Deep analysis of the full PDF
- "💾 Save Article" - Visual improvements
Analysis Result Display
Complete structured analysis, including paper overview, research methods, key findings, and application value.
Main Repo https://github.com/kkdai/linebot-arxiv
Problems Encountered During Development
Problem 1: Limitations of the Old Implementation
Before implementing the paper analysis function, the existing approach was as follows:
// ❌ Old version - can only process Abstract text
func actionGPTTranslate(event *linebot.Event, values url.Values) {
	url := values.Get("url")
	result := getArticleByURL(url)
	// Can only get the text content of the Abstract
	sumResult, err := GeminiChat(
		fmt.Sprintf(PROMPT_Summarization, result[0].Summary.Body),
	)
	// Return the translation result...
}
This method has obvious limitations:
❌ Shallow content - can only analyze the Abstract, unable to delve into the main body of the paper
User: "What experimental methods did this paper use?"
Bot: [Cannot answer, because the Abstract usually does not detail the method specifics]
❌ Unable to understand charts - the Abstract is plain text, so the charts and data in the paper cannot be analyzed
❌ Lack of complete context - unable to understand the experimental design, result analysis, and discussion
Problem 2: Past Solutions Were Too Heavy
To have Gemini analyze the full PDF, the only method in the past was:
# ❌ Old method - requires GCS

# 1. Download the PDF locally
pdf_data = requests.get(pdf_url).content

# 2. Upload to GCS
blob = bucket.blob(f"papers/{paper_id}.pdf")
blob.upload_from_string(pdf_data)

# 3. Use Gemini to analyze the file on GCS
file = genai.upload_file(f"gs://bucket/papers/{paper_id}.pdf")
response = model.generate_content([file, prompt])
The problems with this method:
💰 Increased Costs
- GCS storage fees: $0.020 per GB/month
- GCS outbound traffic: $0.12 per GB
- Need to manage file lifecycle (when to delete?)
🔧 High Complexity
- Need to set up a GCS bucket
- Need to manage permissions (Service Account)
- Need to handle upload failures, clean up old files, and other logic
⏱️ Increased Delay
- Download PDF: 1-3 seconds
- Upload to GCS: 1-2 seconds
- Gemini analysis: 10-20 seconds
- Total: 12-25 seconds
For a LINE Bot, such an architecture is really too heavy!
Problem 3: Confusion in Using the New API
When I saw the announcement on the Google official blog, I immediately wanted to integrate this function. But when I looked at the documentation, I found some confusion:
JSON format of the official documentation:
{
"parts": [
{
"fileData": {
"mimeType": "application/pdf",
"fileUri": "https://arxiv.org/pdf/2301.12345.pdf"
}
}
]
}
But how to write it in the Go SDK?
First attempt (based on JSON speculation):
// ❌ Compilation error
parts := []genai.Part{
	genai.FileData{
		MIMEType: "application/pdf",
		FileURI:  pdfURL, // Error!
	},
}
The result is an error:
./gemini.go:80:4: unknown field FileURI in struct literal of type genai.FileData
It turns out that the field name in the Go SDK is not FileURI, but URI! This is a small difference between the SDK design and the API documentation.
Correct Solution
1. Google's New Feature: File API with Public URLs
According to the Google official announcement, Gemini 2.5 Flash and newer models now support:
| Feature | Old Version (GCS only) | New Version (Public URL) |
|---|---|---|
| File Source | Must upload to GCS | ✅ Supports Public URLs |
| File Size | Maximum 20MB | ✅ Maximum 1GB |
| File Type | PDF, Image, Video | ✅ Same support |
| Additional Cost | GCS storage + traffic | ✅ No additional cost |
| Configuration Complexity | Requires GCS bucket | ✅ No additional configuration required |
| Processing Delay | Download + Upload + Analysis | ✅ Direct analysis |
Supported formats:
- ✅ PDF (maximum 1GB)
- ✅ Images (PNG, JPEG, WebP)
- ✅ Videos (MP4, MOV, AVI)
- ✅ Audio (MP3, WAV, FLAC)
Restrictions:
- ⚠️ Must be a publicly accessible URL
- ⚠️ Does not support URLs that require authentication
- ⚠️ Does not support localhost or intranet URLs
For arXiv papers, this feature is simply perfect! All paper PDFs are public URLs, without any additional processing.
2. Cost Comparison Analysis
Let's calculate the actual cost difference:
Old Method (Using GCS)
Assuming 1000 papers are analyzed per month, and each PDF is an average of 5MB:
Storage cost:
- Total storage: 1000 papers Γ 5MB = 5GB
- Monthly fee: 5GB Γ $0.020 = $0.10
Traffic cost:
- Gemini reading: 1000 papers Γ 5MB = 5GB
- Monthly fee: 5GB Γ $0.12 = $0.60
Management cost:
- Development time for automatic cleanup scripts
- Monitoring storage space
- Handling upload failure logic
Total: $0.70 + development and maintenance costs
New Method (Using Public URL)
Additional cost: $0.00
Management cost: $0.00
Total: $0.00
Save 100% of the additional cost! And there is no need to manage GCS at all.
3. Implementing a PDF URL Conversion Tool
First, I added a URL processing function in arxiv.go:
// ConvertToPDFURL converts an arXiv abs URL to a PDF URL
func ConvertToPDFURL(absURL string) (string, error) {
	// First normalize the URL to ensure it is a valid arXiv URL
	normalizedURL, err := NormalizeArxivURL(absURL)
	if err != nil {
		return "", err
	}

	// Extract the paper ID from the normalized URL
	paperID := getIDfromURL(normalizedURL)
	if paperID == "" {
		return "", errors.New("failed to extract paper ID from URL")
	}

	// Return the PDF URL
	return "https://arxiv.org/pdf/" + paperID + ".pdf", nil
}

// IsArxivPDFURL checks whether the link is a valid arXiv PDF URL
func IsArxivPDFURL(link string) bool {
	parsedURL, err := url.Parse(link)
	if err != nil {
		return false
	}

	// Check that the host is arxiv.org
	if parsedURL.Host != "arxiv.org" {
		return false
	}

	// Check that the path matches the /pdf/xxxx.xxxxx.pdf format
	re := regexp.MustCompile(`/pdf/\d{4}\.\d{4,5}(v\d+)?\.pdf`)
	return re.MatchString(parsedURL.Path)
}
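To sanity-check the conversion, here is a self-contained sketch of the same idea. It inlines a simplified ID extraction instead of the project's `NormalizeArxivURL`/`getIDfromURL` helpers, so treat `simplifiedPDFURL` and its regex as illustrative stand-ins rather than the real code:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// absIDRe extracts a modern arXiv ID (e.g. 1706.03762 or 2301.12345v2)
// from an abs URL. This is a simplified stand-in for what
// NormalizeArxivURL + getIDfromURL do in the project.
var absIDRe = regexp.MustCompile(`arxiv\.org/abs/(\d{4}\.\d{4,5}(v\d+)?)`)

func simplifiedPDFURL(absURL string) (string, error) {
	m := absIDRe.FindStringSubmatch(absURL)
	if m == nil {
		return "", errors.New("failed to extract paper ID from URL")
	}
	return "https://arxiv.org/pdf/" + m[1] + ".pdf", nil
}

func main() {
	for _, in := range []string{
		"https://arxiv.org/abs/1706.03762",
		"https://arxiv.org/abs/2301.12345v2",
		"https://example.com/not-arxiv",
	} {
		out, err := simplifiedPDFURL(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Versioned IDs like `2301.12345v2` pass through unchanged, matching the `(v\d+)?` group in the project's own `IsArxivPDFURL` regex.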
Why restrict support to arXiv only?
Security considerations:
- ✅ arXiv URLs are all trusted academic sources
- ✅ Avoids misuse for analyzing arbitrary PDFs
- ✅ Keeps the focus on the project's core functions
Cost control:
- ✅ Limited scope of use
- ✅ Predictable API usage
4. Implementing the Gemini PDF Analysis Function
Add a new function in gemini.go:
// GeminiPDF analyzes a PDF from a Public URL and returns the result
func GeminiPDF(pdfURL string, prompt string) (string, error) {
	ctx := context.Background()
	client, err := genai.NewClient(ctx, option.WithAPIKey(geminiKey))
	if err != nil {
		log.Println("Failed to create Gemini client:", err)
		return "", err
	}
	defer client.Close()

	// Use the gemini-2.5-flash model
	model := client.GenerativeModel("gemini-2.5-flash")
	value := float32(ChatTemperture)
	model.Temperature = &value

	// ✅ Key: use the FileData struct to handle Public URLs
	parts := []genai.Part{
		genai.FileData{
			MIMEType: "application/pdf",
			URI:      pdfURL, // Note: it's URI, not FileURI!
		},
		genai.Text(prompt),
	}

	log.Println("Begin processing PDF from URL:", pdfURL)
	resp, err := model.GenerateContent(ctx, parts...)
	if err != nil {
		log.Println("Failed to generate content from PDF:", err)
		return "", err
	}
	log.Println("Finished processing PDF...")
	return printResponse(resp), nil
}
Implementation Highlights:
- Correct field name
  // ❌ Wrong
  FileURI: pdfURL

  // ✅ Correct
  URI: pdfURL
- MIMEType setting
  MIMEType: "application/pdf" // Clearly tell Gemini this is a PDF
- Temperature parameter
  value := float32(ChatTemperture) // 0.2, to keep analysis results stable
  model.Temperature = &value
5. Designing Professional Analysis Prompts
In order to make the analysis results more professional and structured, I designed this Prompt:
const PROMPT_PDFAnalysis = `Please analyze this arXiv paper in Traditional Chinese, including:

📋 **Paper Overview**
- Research topic and purpose

🔬 **Research Methods**
- Technologies and methods used

💡 **Key Findings**
- Key results and contributions

🎯 **Application Value**
- Practical applications and impact

Please present it in a clear and professional manner, and respond in Traditional Chinese.`
Prompt Design Principles:
- ✅ Structured - clear sections tell the AI the expected output format
- ✅ Emoji - improve readability, well suited to the LINE interface
- ✅ Explicit language - specify Traditional Chinese to avoid Simplified Chinese or English output
- ✅ Focused - cover the 4 aspects users care about most
6. Integrating into the Bot Processing Flow
Add new Actions and processing functions in bot.go:
// Add PostBack Action
const (
	ActionOpenDetail      string = "DetailArticle"
	ActionTransArticle    string = "TransArticle"
	ActionBookmarkArticle string = "BookmarkArticle"
	ActionAnalyzePDF      string = "AnalyzePDF" // ✅ Added
	ActionHelp            string = "Menu"
	ActonShowFav          string = "MyFavs"
	ActionNewest          string = "Newest"
	ActionRandom          string = "Random"
)
// actionAnalyzePDF analyzes the PDF from its Public URL using Gemini
func actionAnalyzePDF(event *linebot.Event, values url.Values) {
	arxivURL := values.Get("url")
	log.Println("actionAnalyzePDF: url=", arxivURL)

	// Step 1: Convert the arXiv URL to a PDF URL
	pdfURL, err := ConvertToPDFURL(arxivURL)
	if err != nil {
		log.Println("Error converting to PDF URL:", err)
		errString := fmt.Sprintf("❌ Failed to convert PDF URL: %s", err)
		bot.ReplyMessage(event.ReplyToken, linebot.NewTextMessage(errString)).Do()
		return
	}
	log.Println("Analyzing PDF:", pdfURL)

	// Step 2: Reply with a processing message first (reduces user anxiety)
	processingMsg := "🔍 Analyzing PDF paper, please wait..."
	bot.ReplyMessage(event.ReplyToken, linebot.NewTextMessage(processingMsg)).Do()

	// Step 3: Use Gemini to analyze the PDF
	analysisResult, err := GeminiPDF(pdfURL, PROMPT_PDFAnalysis)
	if err != nil {
		log.Println("Error analyzing PDF:", err)
		errString := fmt.Sprintf(`❌ PDF analysis failed: %s

This may be because:
• The PDF file is too large
• The API quota is insufficient
• A network connection problem

Please try again later or use the "Abstract Translation" function.`, err)
		bot.PushMessage(event.Source.UserID, linebot.NewTextMessage(errString)).Do()
		return
	}

	// Step 4: Format and send the result
	analysisResult = AddLineBreaksAroundURLs(analysisResult)
	resultMsg := fmt.Sprintf("📄 **PDF Paper Analysis Result**\n\n%s\n\n🔗 Paper Link:\n%s",
		analysisResult, arxivURL)
	if _, err := bot.PushMessage(event.Source.UserID, linebot.NewTextMessage(resultMsg)).Do(); err != nil {
		log.Println("Error sending analysis result:", err)
	}
}
Flow Design Highlights:
- URL Conversion: Convert from abs URL to PDF URL
- Instant Feedback: Immediately reply with a "Processing" message
- Error Handling: Detailed error messages + alternative solution suggestions
- Result Formatting: Clear structure + original text link
7. UI Optimization - More Professional Interface
Update the button design of the Carousel Template:
// ✅ New version - professional and clear
columnList := []*linebot.CarouselColumn{}
for _, result := range records {
	var saveTogle string
	if exist, _ := InArray(result.ID, checkList); !exist {
		saveTogle = "💾 Save Article"
	} else {
		saveTogle = "🗑️ Remove Save"
	}

	detailData := fmt.Sprintf("action=%s&url=%s&user_id=%s",
		ActionOpenDetail, result.ID, userId)
	pdfData := fmt.Sprintf("action=%s&url=%s&user_id=%s",
		ActionAnalyzePDF, result.ID, userId) // ✅ Added
	SaveData := fmt.Sprintf("action=%s&url=%s&user_id=%s",
		ActionBookmarkArticle, result.ID, userId)

	tmpColumn := linebot.NewCarouselColumn(
		Image_URL,
		truncateString(result.Title, 35)+"..",
		truncateString(result.Summary.Body, 55)+"..",
		linebot.NewPostbackAction("📄 Detailed Information", detailData, "", "", "", ""),
		linebot.NewPostbackAction("🤖 AI Analyze PDF", pdfData, "", "", "", ""), // ✅ New feature
		linebot.NewPostbackAction(saveTogle, SaveData, "", "", "", ""),
	)
	columnList = append(columnList, tmpColumn)
}
UI Improvement Comparison:
| Item | Old Version | New Version | Improvement |
|---|---|---|---|
| Detailed Information | "Know More" | "📄 Detailed Information" | ✅ Clearer |
| Summary Function | "Translate Abstract (Longer)" | "🤖 AI Analyze PDF" | ✅ More professional |
| Save Button | "Save Article" | "💾 Save Article" | ✅ Visual cue |
| Remove Button | "Remove Save" | "🗑️ Remove Save" | ✅ Intuitive |
Practical Application Examples
Example 1: Basic PDF Deep Analysis
User: [Search "transformer attention mechanism"]
Bot: [Displays paper list Carousel]
User: [Click "🤖 AI Analyze PDF"]
Bot: 🔍 Analyzing PDF paper, please wait...
Bot: 📄 PDF Paper Analysis Result
📋 Paper Overview
This paper proposes the Transformer architecture, a new neural network model entirely based on the attention mechanism.
The main purpose is to solve the limitations of RNN and LSTM when processing long sequences...
🔬 Research Methods
The paper uses the following technologies:
1. Multi-Head Attention Mechanism
2. Positional Encoding
3. Feedforward Neural Network Layer
...
💡 Key Findings
- Achieved SOTA performance on machine translation tasks
- Training speed is 10 times faster than RNN
- Can be processed in parallel, improving efficiency
...
🎯 Application Value
The Transformer architecture later became the foundation of models such as BERT and GPT
Widely used in NLP, computer vision, speech recognition, and other fields
...
🔗 Paper Link:
https://arxiv.org/abs/1706.03762
Example 2: Comparing Abstract Translation vs. PDF Analysis
Scenario: The same paper, using different functions
Using "Translate Abstract" (Old Function)
User: [Click "Translate Abstract (Longer)"]
Bot: This paper proposes a new attention mechanism model called Transformer.
The model is entirely based on the attention mechanism and does not use recurrent or convolutional structures.
It performs well on machine translation tasks and trains faster.
(About 50-100 words, only from the Abstract)
Using "AI Analyze PDF" (New Function)
User: [Click "🤖 AI Analyze PDF"]
Bot: 📄 PDF Paper Analysis Result
📋 Paper Overview
- Research background: Limitations of RNN and LSTM
- Research purpose: Design a parallelizable sequence model
- Core innovation: Architecture entirely based on attention
🔬 Research Methods
1. Encoder-Decoder architecture
- 6-layer encoder stack
- 6-layer decoder stack
2. Multi-Head Attention Mechanism
- Uses 8 attention heads
- Allows the model to focus on information from different positions
3. Positional Encoding
- Uses sin/cos functions
- Preserves sequence order information
💡 Key Findings
1. Performance breakthrough
- WMT 2014 English-German translation: 28.4 BLEU
- WMT 2014 English-French translation: 41.8 BLEU
- Surpasses all previous models
2. Training efficiency
- Trained for 3.5 days using 8 GPUs
- More than 10 times faster than RNN models
3. Interpretability
- Attention weights can be visualized
- Understand how the model makes decisions
🎯 Application Value
1. Industry impact
- Adopted by Google Translate
- Became the standard architecture for NLP
2. Subsequent development
- BERT (2018)
- GPT series (2018-2024)
- Vision Transformer (2020)
3. Cross-domain applications
- Natural Language Processing
- Computer Vision
- Speech Recognition
- Bioinformatics
🔗 Paper Link:
https://arxiv.org/abs/1706.03762
(About 300-500 words, covering the full paper content)
Comparison Summary:
| Feature | Translate Abstract | AI Analyze PDF | Difference |
|---|---|---|---|
| Content Source | Abstract only | Full PDF | ⬆️ 10x deeper |
| Word Count | 50-100 words | 300-500 words | ⬆️ 5x more detailed |
| Method Details | ❌ None | ✅ Detailed explanation | Significantly improved |
| Experimental Results | ❌ Brief | ✅ Complete data | Quantifiable |
| Chart Understanding | ❌ Unable | ✅ Supported | Qualitative leap |
| Conclusion Discussion | ❌ None | ✅ Included | More complete |
Example 3: Error Handling Demonstration
Scenario 1: PDF cannot be accessed
User: [Click "🤖 AI Analyze PDF" on an invalid paper]
Bot: ❌ Failed to convert PDF URL: URL does not contain a valid arXiv ID
Scenario 2: API error
User: [Click "🤖 AI Analyze PDF"]
Bot: 🔍 Analyzing PDF paper, please wait...
Bot: ❌ PDF analysis failed: API quota exceeded

This may be because:
• The PDF file is too large
• The API quota is insufficient
• A network connection problem

Please try again later or use the "Abstract Translation" function.
Technical Specifications
Gemini File API Structure
According to the official documentation, the request format for the File API:
{
"contents": [
{
"role": "user",
"parts": [
{
"fileData": {
"mimeType": "application/pdf",
"fileUri": "https://arxiv.org/pdf/1706.03762.pdf"
}
},
{
"text": "Please analyze this paper"
}
]
}
]
}
Go SDK Corresponding Writing
In the google-generative-ai-go SDK:
// ✅ Correct writing
parts := []genai.Part{
	genai.FileData{
		MIMEType: "application/pdf",
		URI:      "https://arxiv.org/pdf/1706.03762.pdf",
	},
	genai.Text("Please analyze this paper"),
}

response, err := model.GenerateContent(ctx, parts...)
Field Name Correspondence:
| JSON API | Go SDK | Description |
|---|---|---|
| `fileUri` | `URI` | ⚠️ Note the naming difference |
| `mimeType` | `MIMEType` | Standardized naming |
Supported File Formats and Limitations
According to the Google official announcement:
Supported Formats:
- ✅ PDF: maximum 1GB
- ✅ Images: PNG, JPEG, WebP, GIF
- ✅ Videos: MP4, MOV, AVI, MPEG
- ✅ Audio: MP3, WAV, FLAC, AIFF
URL Restrictions:
- ✅ Must be an HTTPS URL
- ✅ Must be publicly accessible
- ❌ Does not support URLs that require authentication
- ❌ Does not support localhost or intranet IPs
- ❌ Does not support redirected URLs (in some cases)
For arXiv:
- ✅ All PDFs are public URLs
- ✅ Directly usable, no additional processing required
- ✅ File sizes are usually 5-50MB, far below the 1GB limit
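Some of these restrictions can be pre-checked before spending an API call, failing fast with a clearer error. Here is a hypothetical pre-flight validator (`checkPublicHTTPS` is not part of the repo, and it cannot verify actual public reachability; it only screens the scheme and obviously local hosts):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// checkPublicHTTPS is a hypothetical pre-flight check mirroring the
// documented URL restrictions: HTTPS only, no localhost or intranet hosts.
// Actual public reachability can only be confirmed by Gemini's own fetch.
func checkPublicHTTPS(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return fmt.Errorf("scheme %q: only https is supported", u.Scheme)
	}
	host := u.Hostname()
	if host == "localhost" {
		return fmt.Errorf("host %q is local", host)
	}
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsPrivate()) {
		return fmt.Errorf("host %q is a loopback or private IP", host)
	}
	return nil
}

func main() {
	for _, u := range []string{
		"https://arxiv.org/pdf/1706.03762.pdf", // OK
		"http://arxiv.org/pdf/1706.03762.pdf",  // wrong scheme
		"https://192.168.1.10/paper.pdf",       // intranet IP
	} {
		fmt.Println(u, "->", checkPublicHTTPS(u))
	}
}
```

For this bot the arXiv-only whitelist already guarantees all of this, so such a check only matters if the scope is ever widened.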
Supported Models
| Model | Public URL Support | File Size Limit | Recommended Use |
|---|---|---|---|
| gemini-2.5-flash | ✅ | 1GB | Recommended (fast and cheap) |
| gemini-2.5-pro | ✅ | 1GB | Complex analysis |
| gemini-3.0-pro | ✅ | 1GB | Highest quality |
| gemini-2.0-flash-lite | ❌ | - | Does not support the File API |
Environment Setup
Required Environment Variables
# Gemini API Key (required)
export GOOGLE_GEMINI_API_KEY="your-api-key-here"
# LINE Bot Settings
export ChannelSecret="your-channel-secret"
export ChannelAccessToken="your-channel-access-token"
# Optional: GitHub Issue as a database
export GITHUB_URL="your-github-repo-url"
Settings No Longer Needed
Due to the change to Public URL, the following settings are no longer needed:
# ❌ No longer needed: GCS settings
# GOOGLE_CLOUD_PROJECT=...
# GOOGLE_APPLICATION_CREDENTIALS=...
# GCS_BUCKET_NAME=...

# ❌ No longer needed: local storage settings
# PDF_STORAGE_PATH=...
# MAX_STORAGE_SIZE=...
This greatly simplifies the deployment process!
Local Testing
# 1. Clone the project
git clone https://github.com/kkdai/linebot-arxiv.git
cd linebot-arxiv
# 2. Set environment variables
export GOOGLE_GEMINI_API_KEY="your-api-key"
export ChannelSecret="your-channel-secret"
export ChannelAccessToken="your-channel-access-token"
# 3. Compile
go build
# 4. Execute
./linebot-arxiv
Performance and Cost Analysis
Processing Depth Comparison
| Metric | Translate Abstract | AI Analyze PDF | Improvement |
|---|---|---|---|
| Content Source | Abstract | Full PDF | ⬆️ 10x |
| Word Count | 50-100 | 300-500 | ⬆️ 5x |
| Method Details | ❌ | ✅ Detailed | Qualitative leap |
| Experimental Data | ❌ | ✅ Complete | Quantifiable |
| Chart Analysis | ❌ | ✅ Supported | New capability |
| Conclusion Discussion | ❌ | ✅ Included | More complete |
Speed Comparison
| Stage | Translate Abstract | AI Analyze PDF | Description |
|---|---|---|---|
| API Call | 1 time | 1 time | Same |
| Processing Time | 3-5 seconds | 15-30 seconds | More PDF content |
| User Waiting | Short | Medium | Has processing prompt |
Optimization Strategies:
- ✅ Show a "Processing" prompt to reduce anxiety
- ✅ Use PushMessage for asynchronous replies
- ✅ Let users choose between fast and deep analysis
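The "reply immediately, push later" pattern that PushMessage enables can be sketched like this, with the LINE SDK and Gemini calls stubbed out as plain functions. `reply`, `push`, and `analyze` are stand-ins, not the real SDK signatures, and the `done` channel exists only so the demo can wait for the goroutine:

```go
package main

import "fmt"

// handleAnalyze acknowledges instantly via the reply token, then runs the
// slow analysis in a goroutine and delivers the result with a push message.
// reply, push, and analyze are stand-ins for the real SDK/Gemini calls.
func handleAnalyze(reply, push func(string), analyze func() (string, error), done chan<- struct{}) {
	reply("🔍 Analyzing PDF paper, please wait...")
	go func() {
		defer close(done)
		result, err := analyze()
		if err != nil {
			push("❌ PDF analysis failed: " + err.Error())
			return
		}
		push("📄 " + result)
	}()
}

func main() {
	done := make(chan struct{})
	handleAnalyze(
		func(m string) { fmt.Println("reply:", m) },
		func(m string) { fmt.Println("push:", m) },
		func() (string, error) { return "analysis result", nil }, // pretend Gemini call
		done,
	)
	<-done // wait for the background work in this demo
}
```

In the current actionAnalyzePDF the Gemini call actually runs synchronously after the reply; moving it into a goroutine like this would free the webhook handler to return immediately.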
Cost Comparison: GCS vs. Public URL
Plan A: Using GCS (Old Method)
Assuming 1000 PDF analyses per month, averaging 5MB/paper:
GCS storage cost:
- Assuming storage for 7 days
- Average storage: 1000 Γ 5MB Γ 7/30 = 1.17GB
- Monthly fee: 1.17GB Γ $0.020 = $0.023
GCS outbound traffic:
- Each Gemini read: 1000 Γ 5MB = 5GB
- Monthly fee: 5GB Γ $0.12 = $0.60
Maintenance cost:
- Development of upload/delete logic: 8 hours
- Monitoring storage space: 2 hours per month
- Handling errors: 1 hour per month
- At an engineer rate of $50/hour: 11 hours in the first month = $550; 3 hours/month afterward = $150
Total for the first month: $0.62 + $550 = $550.62
Total for subsequent months: $0.62 + $150 = $150.62
Plan B: Using Public URL (New Method)
Additional cost: $0.00
Maintenance cost: $0.00
Development time: 2 hours (only URL conversion logic)
Total for the first month: $0.00 + $100 = $100
Total for subsequent months: $0.00
**Save in the first month: $450+** / **Save each month: $150+**
Gemini API Cost
(The cost is the same as using GCS, because it's all calling the Gemini API)
Using gemini-2.5-flash:
- Input: $0.075 / 1M tokens
- Output: $0.30 / 1M tokens
Assuming each PDF analysis:
- Input tokens: ~50,000 (PDF content + prompt)
- Output tokens: ~1,500 (analysis result)
Cost per call:
- Input: 50,000 Γ $0.075 / 1M = $0.00375
- Output: 1,500 Γ $0.30 / 1M = $0.00045
- Total: $0.0042
Monthly cost (1000 times):
- 1000 Γ $0.0042 = $4.20
Total Cost Comparison:
| Item | GCS Plan | Public URL Plan | Savings |
|---|---|---|---|
| Gemini API | $4.20 | $4.20 | $0 |
| GCS Fees | $0.62 | $0.00 | $0.62 |
| Development & Maintenance | $150+ | $0 | $150+ |
| Total | $154.82 | $4.20 | $150.62 (97%) |
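The per-call arithmetic above can be reproduced in a few lines. The token counts and per-token prices are the assumptions stated earlier in this estimate, not authoritative figures:

```go
package main

import "fmt"

func main() {
	// Assumed gemini-2.5-flash rates from the estimate above (USD per token).
	const inRate = 0.075 / 1e6
	const outRate = 0.30 / 1e6

	// Assumed token counts per PDF analysis.
	inTok, outTok := 50000.0, 1500.0

	perCall := inTok*inRate + outTok*outRate
	fmt.Printf("per call:  $%.4f\n", perCall)
	fmt.Printf("per month: $%.2f (1000 calls)\n", 1000*perCall)
}
```

This reproduces the $0.0042 per call and $4.20 per month figures used in the comparison table.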
Development Experience
1. Google's New Feature Completely Changes the Game
When I saw Google's announcement supporting Public URLs, my first reaction was: "How much trouble will this save!"
Pain points in the past:
// ❌ What needed to be done in the past

// 1. Download the PDF
resp, _ := http.Get(pdfURL)
data, _ := ioutil.ReadAll(resp.Body)

// 2. Store it locally
ioutil.WriteFile("/tmp/paper.pdf", data, 0644)

// 3. Upload to GCS
bucket.Upload(context, "papers/xxx.pdf", data)

// 4. Generate the GCS URL
gsURL := "gs://bucket/papers/xxx.pdf"

// 5. Use Gemini to analyze
file := genai.upload_file(gsURL)
response := model.generate_content([file, prompt])

// 6. Remember to clean up:
// regularly delete old files, otherwise GCS fees accumulate
Now only need:
// ✅ Now just one call
response := GeminiPDF(pdfURL, prompt)
From 6 steps to 1 step, from needing to manage storage to not needing to manage at all, the value brought by this change far exceeds the technology itself.
2. Differences between SDK Documentation and Actual Implementation
The FileURI vs URI issue I encountered this time taught me: Never just look at the API documentation, but check the actual definition of the SDK.
// My incorrect reasoning process:
// 1. Look at the JSON API documentation: { "fileUri": "..." }
// 2. Guess the Go struct: FileURI string
// 3. Compilation failed!
// Correct approach:
// 1. Look at the JSON API documentation
// 2. View the SDK source code or examples
// 3. Confirm the actual field name
Lessons:
- ✅ The JSON API's `fileUri` → the Go SDK's `URI`
- ✅ Naming conventions do not necessarily correspond exactly
- ✅ Look at the official examples first, then start writing
This small problem only took 10 minutes to debug, but the lesson is valuable.
3. Thinking from "Function Implementation" to "User Experience"
When I first implemented it, I only considered the function: "It's good to be able to analyze PDFs". But when I actually used it, I found:
Problem: Poor user experience
User: [Click the button]
... (Wait 20 seconds) ...
User: "Is it broken?" (Start to worry)
Bot: [Finally replies]
Improvement: Add instant feedback
User: [Click the button]
Bot: 🔍 Analyzing PDF paper, please wait...
User: "Oh, it's processing" (Relieved)
... (Wait) ...
Bot: 📄 PDF Paper Analysis Result...
This small change greatly improved the user experience. The key is not how strong the function is, but how the user feels.
4. Limiting the Scope is a Kind of Wisdom
Initially, I thought: "Since Public URL is supported, why not support all PDFs?"
But after careful consideration, I decided to only support arXiv:
Technical reasons:
- ✅ arXiv's URL format is standardized and easy to verify
- ✅ arXiv PDFs are all academic papers, so the content is predictable
- ✅ Avoids misuse for analyzing arbitrary PDFs
Cost reasons:
- ✅ Limited scope of use = predictable API costs
- ✅ Avoids large numbers of non-academic PDFs consuming the quota
Product positioning:
- ✅ Focus on an academic paper assistant, not a general PDF analyzer
- ✅ Clear value proposition
Lesson: Not all possibilities need to be implemented, and limitations are part of the design.