DEV Community

Evan Lin for Google Developer Experts

Posted on • Originally published at evanlin.com on

Gemini 2.5 Flash: Analyzing PDF Papers Directly from Public URLs with the Gemini File API

image-20260114214635026

Background

While maintaining the linebot-arxiv project, I've always wanted to provide users with more in-depth paper analysis features. The existing "Translate Abstract" function can only process the Abstract section of a paper, but users often need to understand more: research methods, experimental results, chart analysis, and other complete content.

In the past, to achieve this function, you had to:

  1. Download the PDF to the local server
  2. Upload to Google Cloud Storage (GCS)
  3. Use the Gemini API to analyze files on GCS

This process is not only complex (requires managing storage space), but also involves additional costs (GCS storage fees + traffic fees), which is too heavy for a simple LINE Bot.

But in January 2025, Google announced a major update in the official blog: Gemini API now supports reading files directly from Public URLs! This means we can skip GCS and directly let Gemini analyze PDF papers on arXiv.

This change brings not only technical simplification, but also huge advantages in terms of cost and maintenance.

Screen Display

Old Interface

image-20260114214724380

  • "Know More" - Displays detailed paper information
  • "Translate Abstract (Longer)" - Only translates the Abstract
  • "Save Article" - Add to Favorites

New Interface

image-20260114214745189

  • "πŸ“‹ Detailed Information" - Clearer labels
  • "πŸ“‘ AI Analyze PDF" - New Feature! Deep analysis of the full PDF
  • "πŸ’Ύ Save Article" - Visual improvements

Analysis Result Display

image-20260114214833728

Complete structured analysis, including paper overview, research methods, key findings, and application value.

Main Repo https://github.com/kkdai/linebot-arxiv

Problems Encountered During Development

Problem 1: Limitations of the Old Implementation

Before implementing the paper analysis function, the existing approach was as follows:

// ❌ Old version - can only process Abstract text
func actionGPTTranslate(event *linebot.Event, values url.Values) {
    url := values.Get("url")
    result := getArticleByURL(url)

    // Can only get the text content of the Abstract
    sumResult, err := GeminiChat(
        fmt.Sprintf(PROMPT_Summarization, result[0].Summary.Body)
    )

    // Return the translation result...
}

Enter fullscreen mode Exit fullscreen mode

This method has obvious limitations:

❌ Shallow Content - Can only analyze the Abstract, unable to delve into the main body of the paper

User: "What experimental methods did this paper use?"
Bot: [Cannot answer, because the Abstract usually does not detail the method specifics]

Enter fullscreen mode Exit fullscreen mode

❌ Unable to understand charts - The Abstract is plain text, unable to analyze the charts and data in the paper

❌ Lack of complete context - Unable to understand the experimental design, result analysis, and discussion content

Problem 2: Past Solutions Were Too Heavy

To have Gemini analyze the full PDF, the only method in the past was:

# ❌ Old method - requires GCS
1. Download PDF to local
   pdf_data = requests.get(pdf_url).content

2. Upload to GCS
   blob = bucket.blob(f"papers/{paper_id}.pdf")
   blob.upload_from_string(pdf_data)

3. Use Gemini to analyze files on GCS
   file = genai.upload_file(f"gs://bucket/papers/{paper_id}.pdf")
   response = model.generate_content([file, prompt])

Enter fullscreen mode Exit fullscreen mode

The problems with this method:

πŸ’° Increased Costs

  • GCS storage fees: $0.020 per GB/month
  • GCS outbound traffic: $0.12 per GB
  • Need to manage file lifecycle (when to delete?)

πŸ”§ High Complexity

  • Need to set up a GCS bucket
  • Need to manage permissions (Service Account)
  • Need to handle upload failures, clean up old files, and other logic

⏱️ Increased Delay

  • Download PDF: 1-3 seconds
  • Upload to GCS: 1-2 seconds
  • Gemini analysis: 10-20 seconds
  • Total: 12-25 seconds

For a LINE Bot, such an architecture is really too heavy!

Problem 3: Confusion in Using the New API

When I saw the announcement on the Google official blog, I immediately wanted to integrate this function. But when I looked at the documentation, I found some confusion:

JSON format of the official documentation:

{
  "parts": [
    {
      "fileData": {
        "mimeType": "application/pdf",
        "fileUri": "https://arxiv.org/pdf/2301.12345.pdf"
      }
    }
  ]
}

Enter fullscreen mode Exit fullscreen mode

But how to write it in the Go SDK?

First attempt (based on JSON speculation):

// ❌ Compilation error
parts := []genai.Part{
    genai.FileData{
        MIMEType: "application/pdf",
        FileURI: pdfURL, // Error!
    },
}

Enter fullscreen mode Exit fullscreen mode

The result is an error:

./gemini.go:80:4: unknown field FileURI in struct literal of type genai.FileData

Enter fullscreen mode Exit fullscreen mode

It turns out that the field name in the Go SDK is not FileURI, but URI! This is a small difference between the SDK design and the API documentation.

Correct Solution

1. Google's New Feature: File API with Public URLs

According to the Google official announcement, Gemini 2.5 Flash and newer models now support:

Feature Old Version (GCS only) New Version (Public URL)
File Source Must upload to GCS βœ… Supports Public URL
File Size Maximum 20MB βœ… Maximum 1GB
File Type PDF, Image, Video βœ… Same support
Additional Cost GCS storage + traffic βœ… No additional cost
Configuration Complexity Requires GCS bucket βœ… No additional configuration required
Processing Delay Download + Upload + Analysis βœ… Direct analysis

Supported formats:

  • βœ… PDF (maximum 1GB)
  • βœ… Images (PNG, JPEG, WebP)
  • βœ… Videos (MP4, MOV, AVI)
  • βœ… Audio (MP3, WAV, FLAC)

Restrictions:

  • ⚠️ Must be a publicly accessible URL
  • ⚠️ Does not support URLs that require authentication
  • ⚠️ Does not support localhost or intranet URLs

For arXiv papers, this feature is simply perfect! All paper PDFs are public URLs, without any additional processing.

2. Cost Comparison Analysis

Let's calculate the actual cost difference:

Old Method (Using GCS)

Assuming 1000 papers are analyzed per month, and each PDF is an average of 5MB:

Storage cost:
- Total storage: 1000 papers Γ— 5MB = 5GB
- Monthly fee: 5GB Γ— $0.020 = $0.10

Traffic cost:
- Gemini reading: 1000 papers Γ— 5MB = 5GB
- Monthly fee: 5GB Γ— $0.12 = $0.60

Management cost:
- Development time for automatic cleanup scripts
- Monitoring storage space
- Handling upload failure logic

Total: $0.70 + development and maintenance costs

Enter fullscreen mode Exit fullscreen mode

New Method (Using Public URL)

Additional cost: $0.00
Management cost: $0.00

Total: $0.00

Enter fullscreen mode Exit fullscreen mode

Save 100% of the additional cost! And there is no need to manage GCS at all.

3. Implementing a PDF URL Conversion Tool

First, I added a URL processing function in arxiv.go:

// ConvertToPDFURL converts the arXiv abs URL to a PDF URL
func ConvertToPDFURL(absURL string) (string, error) {
    // First normalize the URL to ensure it is a valid arXiv URL
    normalizedURL, err := NormalizeArxivURL(absURL)
    if err != nil {
        return "", err
    }

    // Extract the paper ID from the normalized URL
    paperID := getIDfromURL(normalizedURL)
    if paperID == "" {
        return "", errors.New("failed to extract paper ID from URL")
    }

    // Return the PDF URL
    return "https://arxiv.org/pdf/" + paperID + ".pdf", nil
}

// IsArxivPDFURL checks if it is a valid arXiv PDF URL
func IsArxivPDFURL(link string) bool {
    parsedURL, err := url.Parse(link)
    if err != nil {
        return false
    }

    // Check if it is arxiv.org and the path contains /pdf/
    if parsedURL.Host != "arxiv.org" {
        return false
    }

    // Check if the path matches the /pdf/xxxx.xxxx.pdf format
    re := regexp.MustCompile(`/pdf/\d{4}\.\d{4,5}(v\d+)?\.pdf`)
    return re.MatchString(parsedURL.Path)
}

Enter fullscreen mode Exit fullscreen mode

Why restrict support to arXiv only?

Security considerations:

  • βœ… arXiv URLs are all trusted academic sources
  • βœ… Avoid being misused to analyze arbitrary PDFs
  • βœ… Focus on the core functions of the project

Cost control:

  • βœ… Limit the scope of use
  • βœ… Predictable API usage

4. Implementing the Gemini PDF Analysis Function

Add a new function in gemini.go:

// GeminiPDF: Analyze PDF from Public URL and return the result
func GeminiPDF(pdfURL string, prompt string) (string, error) {
    ctx := context.Background()
    client, err := genai.NewClient(ctx, option.WithAPIKey(geminiKey))
    if err != nil {
        log.Println("Failed to create Gemini client:", err)
        return "", err
    }
    defer client.Close()

    // Use the gemini-2.5-flash model
    model := client.GenerativeModel("gemini-2.5-flash")
    value := float32(ChatTemperture)
    model.Temperature = &value

    // βœ… Key: Use the FileData structure to handle Public URLs
    parts := []genai.Part{
        genai.FileData{
            MIMEType: "application/pdf",
            URI: pdfURL, // Note: It's URI not FileURI!
        },
        genai.Text(prompt),
    }

    log.Println("Begin processing PDF from URL:", pdfURL)
    resp, err := model.GenerateContent(ctx, parts...)
    if err != nil {
        log.Println("Failed to generate content from PDF:", err)
        return "", err
    }
    log.Println("Finished processing PDF...")

    return printResponse(resp), nil
}

Enter fullscreen mode Exit fullscreen mode

Implementation Highlights:

  1. Correct field name
// ❌ Error
FileURI: pdfURL

// βœ… Correct
URI: pdfURL

Enter fullscreen mode Exit fullscreen mode
  1. MIMEType setting
MIMEType: "application/pdf" // Clearly tell Gemini this is a PDF

Enter fullscreen mode Exit fullscreen mode
  1. Temperature parameter
value := float32(ChatTemperture) // 0.2 to ensure stable analysis results
model.Temperature = &value

Enter fullscreen mode Exit fullscreen mode

5. Designing Professional Analysis Prompts

In order to make the analysis results more professional and structured, I designed this Prompt:

const PROMPT_PDFAnalysis = `Please analyze this arXiv paper in Traditional Chinese, including:

πŸ“Œ **Paper Overview**
- Research topic and purpose

πŸ”¬ **Research Methods**
- Used technologies and methods

πŸ’‘ **Key Findings**
- Key results and contributions

🎯 **Application Value**
- Practical applications and impact

Please present it in a clear and professional manner, and respond in Traditional Chinese.`

Enter fullscreen mode Exit fullscreen mode

Prompt Design Principles:

βœ… Structured - Use clear sections to let the AI know the output format βœ… Use emoji - Improve readability, suitable for the LINE interface βœ… Clear language - Specify Traditional Chinese, avoid Simplified Chinese or English output βœ… Focus on key points - Focus on the 4 aspects that users care about most

6. Integrating into the Bot Processing Flow

Add new Actions and processing functions in bot.go:

// Add PostBack Action
const (
    ActionOpenDetail string = "DetailArticle"
    ActionTransArticle string = "TransArticle"
    ActionBookmarkArticle string = "BookmarkArticle"
    ActionAnalyzePDF string = "AnalyzePDF" // βœ… Added
    ActionHelp string = "Menu"
    ActonShowFav string = "MyFavs"
    ActionNewest string = "Newest"
    ActionRandom string = "Random"
)

// actionAnalyzePDF: Analyze PDF from Public URL using Gemini
func actionAnalyzePDF(event *linebot.Event, values url.Values) {
    arxivURL := values.Get("url")
    log.Println("actionAnalyzePDF: url=", arxivURL)

    // Step 1: Convert the arXiv URL to a PDF URL
    pdfURL, err := ConvertToPDFURL(arxivURL)
    if err != nil {
        log.Println("Error converting to PDF URL:", err)
        errString := fmt.Sprintf("❌ Failed to convert PDF URL: %s", err)
        bot.ReplyMessage(event.ReplyToken, linebot.NewTextMessage(errString)).Do()
        return
    }

    log.Println("Analyzing PDF:", pdfURL)

    // Step 2: Reply with a processing message first (reduce user anxiety)
    processingMsg := "πŸ” Analyzing PDF paper, please wait..."
    bot.ReplyMessage(event.ReplyToken, linebot.NewTextMessage(processingMsg)).Do()

    // Step 3: Use Gemini to analyze the PDF
    analysisResult, err := GeminiPDF(pdfURL, PROMPT_PDFAnalysis)
    if err != nil {
        log.Println("Error analyzing PDF:", err)
        errString := fmt.Sprintf(`❌ PDF analysis failed: %s

This may be because:
β€’ PDF file is too large
β€’ API quota is insufficient
β€’ Network connection problem

Please try again later or use the "Abstract Translation" function.`, err)
        bot.PushMessage(event.Source.UserID, linebot.NewTextMessage(errString)).Do()
        return
    }

    // Step 4: Format and send the result
    analysisResult = AddLineBreaksAroundURLs(analysisResult)
    resultMsg := fmt.Sprintf("πŸ“„ **PDF Paper Analysis Result** \n\n%s\n\nπŸ“Ž Paper Link:\n%s",
        analysisResult, arxivURL)

    if _, err := bot.PushMessage(event.Source.UserID, linebot.NewTextMessage(resultMsg)).Do(); err != nil {
        log.Println("Error sending analysis result:", err)
    }
}

Enter fullscreen mode Exit fullscreen mode

Flow Design Highlights:

  1. URL Conversion: Convert from abs URL to PDF URL
  2. Instant Feedback: Immediately reply with a "Processing" message
  3. Error Handling: Detailed error messages + alternative solution suggestions
  4. Result Formatting: Clear structure + original text link

7. UI Optimization - More Professional Interface

Update the button design of the Carousel Template:

// βœ… New version - professional and clear
columnList := []*linebot.CarouselColumn{}
for _, result := range records {
    var saveTogle string
    if exist, _ := InArray(result.ID, checkList); !exist {
        saveTogle = "πŸ’Ύ Save Article"
    } else {
        saveTogle = "πŸ—‘οΈ Remove Save"
    }

    detailData := fmt.Sprintf("action=%s&url=%s&user_id=%s",
        ActionOpenDetail, result.ID, userId)
    pdfData := fmt.Sprintf("action=%s&url=%s&user_id=%s",
        ActionAnalyzePDF, result.ID, userId) // βœ… Added
    SaveData := fmt.Sprintf("action=%s&url=%s&user_id=%s",
        ActionBookmarkArticle, result.ID, userId)

    tmpColumn := linebot.NewCarouselColumn(
        Image_URL,
        truncateString(result.Title, 35)+"..",
        truncateString(result.Summary.Body, 55)+"..",
        linebot.NewPostbackAction("πŸ“‹ Detailed Information", detailData, "", "", "", ""),
        linebot.NewPostbackAction("πŸ“‘ AI Analyze PDF", pdfData, "", "", "", ""), // βœ… New feature
        linebot.NewPostbackAction(saveTogle, SaveData, "", "", "", ""),
    )
    columnList = append(columnList, tmpColumn)
}

Enter fullscreen mode Exit fullscreen mode

UI Improvement Comparison:

Item Old Version New Version Improvement
Detailed Information β€œKnow More” β€œπŸ“‹ Detailed Information” βœ… Clearer
Summary Function β€œTranslate Abstract (Longer)” β€œπŸ“‘ AI Analyze PDF” βœ… More professional
Save Button β€œSave Article” β€œπŸ’Ύ Save Article” βœ… Visualization
Remove Button β€œRemove Save” β€œπŸ—‘οΈ Remove Save” βœ… Intuitive

Practical Application Examples

Example 1: Basic PDF Deep Analysis

User: [Search "transformer attention mechanism"]
Bot: [Displays paper list Carousel]

User: [Click "πŸ“‘ AI Analyze PDF"]
Bot: πŸ” Analyzing PDF paper, please wait...

Bot: πŸ“„ PDF Paper Analysis Result

πŸ“Œ Paper Overview
This paper proposes the Transformer architecture, a new neural network model entirely based on the attention mechanism.
The main purpose is to solve the limitations of RNN and LSTM when processing long sequences...

πŸ”¬ Research Methods
The paper uses the following technologies:
1. Multi-Head Attention Mechanism
2. Positional Encoding
3. Feedforward Neural Network Layer
...

πŸ’‘ Key Findings
- Achieved SOTA performance on machine translation tasks
- Training speed is 10 times faster than RNN
- Can be processed in parallel, improving efficiency
...

🎯 Application Value
The Transformer architecture later became the foundation of models such as BERT and GPT
Widely used in NLP, computer vision, speech recognition, and other fields
...

πŸ“Ž Paper Link:
https://arxiv.org/abs/1706.03762

Enter fullscreen mode Exit fullscreen mode

Example 2: Comparing Abstract Translation vs. PDF Analysis

Scenario: The same paper, using different functions

Using "Translate Abstract" (Old Function)

User: [Click "Translate Abstract (Longer)"]
Bot: This paper proposes a new attention mechanism model called Transformer.
The model is entirely based on the attention mechanism and does not use recurrent or convolutional structures.
It performs well on machine translation tasks and trains faster.

(About 50-100 words, only from the Abstract)

Enter fullscreen mode Exit fullscreen mode

Using "AI Analyze PDF" (New Function)

User: [Click "πŸ“‘ AI Analyze PDF"]
Bot: πŸ“„ PDF Paper Analysis Result

πŸ“Œ Paper Overview
- Research background: Limitations of RNN and LSTM
- Research purpose: Design a parallelizable sequence model
- Core innovation: Architecture entirely based on attention

πŸ”¬ Research Methods
1. Encoder-Decoder architecture
   - 6-layer encoder stack
   - 6-layer decoder stack

2. Multi-Head Attention Mechanism
   - Uses 8 attention heads
   - Allows the model to focus on information from different positions

3. Positional Encoding
   - Uses sin/cos functions
   - Preserves sequence order information

πŸ’‘ Key Findings
1. Performance breakthrough
   - WMT 2014 English-German translation: 28.4 BLEU
   - WMT 2014 English-French translation: 41.8 BLEU
   - Surpasses all previous models

2. Training efficiency
   - Trained for 3.5 days using 8 GPUs
   - More than 10 times faster than RNN models

3. Interpretability
   - Attention weights can be visualized
   - Understand how the model makes decisions

🎯 Application Value
1. Industry impact
   - Adopted by Google Translate
   - Became the standard architecture for NLP

2. Subsequent development
   - BERT (2018)
   - GPT series (2018-2024)
   - Vision Transformer (2020)

3. Cross-domain applications
   - Natural Language Processing
   - Computer Vision
   - Speech Recognition
   - Bioinformatics

πŸ“Ž Paper Link:
https://arxiv.org/abs/1706.03762

(About 300-500 words, covering the full paper content)

Enter fullscreen mode Exit fullscreen mode

Comparison Summary:

Feature Translate Abstract AI Analyze PDF Difference
Content Source Abstract only Full PDF ⬆️ 10x deeper
Word Count 50-100 words 300-500 words ⬆️ 5x more detailed
Method Details ❌ None βœ… Detailed explanation Significantly improved
Experimental Results ❌ Brief βœ… Complete data Quantifiable
Chart Understanding ❌ Unable βœ… Supported Qualitative leap
Conclusion Discussion ❌ None βœ… Included More complete

Example 3: Error Handling Demonstration

Scenario 1: PDF cannot be accessed
User: [Click "πŸ“‘ AI Analyze PDF" of an invalid paper]
Bot: ❌ Failed to convert PDF URL: URL does not contain a valid arXiv ID

Scenario 2: API error
User: [Click "πŸ“‘ AI Analyze PDF"]
Bot: πŸ” Analyzing PDF paper, please wait...
Bot: ❌ PDF analysis failed: API quota exceeded

This may be because:
β€’ PDF file is too large
β€’ API quota is insufficient
β€’ Network connection problem

Please try again later or use the "Abstract Translation" function.

Enter fullscreen mode Exit fullscreen mode

Technical Specifications

Gemini File API Structure

According to the official documentation, the request format for the File API:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "application/pdf",
            "fileUri": "https://arxiv.org/pdf/1706.03762.pdf"
          }
        },
        {
          "text": "Please analyze this paper"
        }
      ]
    }
  ]
}

Enter fullscreen mode Exit fullscreen mode

Go SDK Corresponding Writing

In the google-generative-ai-go SDK:

// βœ… Correct writing
parts := []genai.Part{
    genai.FileData{
        MIMEType: "application/pdf",
        URI: "https://arxiv.org/pdf/1706.03762.pdf",
    },
    genai.Text("Please analyze this paper"),
}

response, err := model.GenerateContent(ctx, parts...)

Enter fullscreen mode Exit fullscreen mode

Field Name Correspondence:

JSON API Go SDK Description
fileUri URI ⚠️ Note the case difference
mimeType MIMEType Standardized naming

Supported File Formats and Limitations

According to the Google official announcement:

Supported Formats:

  • βœ… PDF: Maximum 1GB
  • βœ… Images: PNG, JPEG, WebP, GIF
  • βœ… Videos: MP4, MOV, AVI, MPEG
  • βœ… Audio: MP3, WAV, FLAC, AIFF

URL Restrictions:

  • βœ… Must be an HTTPS URL
  • βœ… Must be publicly accessible
  • ❌ Does not support URLs that require authentication
  • ❌ Does not support localhost or intranet IPs
  • ❌ Does not support URLs after redirect (in some cases)

For arXiv:

  • βœ… All PDFs are public URLs
  • βœ… Directly available, no additional processing required
  • βœ… File size is usually between 5-50MB, far less than the 1GB limit

Supported Models

Model Public URL Support File Size Limit Recommended Use
gemini-2.5-flash βœ… 1GB Recommended (fast and cheap)
gemini-2.5-pro βœ… 1GB Complex analysis
gemini-3.0-pro βœ… 1GB Highest quality
gemini-2.0-flash-lite ❌ - Does not support File API

Environment Setup

Required Environment Variables

# Gemini API Key (required)
export GOOGLE_GEMINI_API_KEY="your-api-key-here"

# LINE Bot Settings
export ChannelSecret="your-channel-secret"
export ChannelAccessToken="your-channel-access-token"

# Optional: GitHub Issue as a database
export GITHUB_URL="your-github-repo-url"

Enter fullscreen mode Exit fullscreen mode

Settings No Longer Needed

Due to the change to Public URL, the following settings are no longer needed:

# ❌ Not needed anymore GCS
# GOOGLE_CLOUD_PROJECT=...
# GOOGLE_APPLICATION_CREDENTIALS=...
# GCS_BUCKET_NAME=...

# ❌ Not needed anymore local storage
# PDF_STORAGE_PATH=...
# MAX_STORAGE_SIZE=...

Enter fullscreen mode Exit fullscreen mode

This greatly simplifies the deployment process!

Local Testing

# 1. Clone the project
git clone https://github.com/kkdai/linebot-arxiv.git
cd linebot-arxiv

# 2. Set environment variables
export GOOGLE_GEMINI_API_KEY="your-api-key"
export ChannelSecret="your-channel-secret"
export ChannelAccessToken="your-channel-access-token"

# 3. Compile
go build

# 4. Execute
./linebot-arxiv

Enter fullscreen mode Exit fullscreen mode

Performance and Cost Analysis

Processing Depth Comparison

Metric Translate Abstract AI Analyze PDF Improvement
Content Source Abstract Full PDF ⬆️ 10x
Word Count 50-100 300-500 ⬆️ 5x
Method Details ❌ βœ… Detailed Qualitative leap
Experimental Data ❌ βœ… Complete Quantifiable
Chart Analysis ❌ βœ… Supported New capability
Conclusion Discussion ❌ βœ… Included More complete

Speed Comparison

Stage Translate Abstract AI Analyze PDF Description
API Call 1 time 1 time Same
Processing Time 3-5 seconds 15-30 seconds More PDF content
User Waiting Short Medium Has processing prompt

Optimization Strategies:

  • βœ… Provide "Processing" prompt to reduce anxiety
  • βœ… Use PushMessage for asynchronous replies
  • βœ… Allow users to choose fast or deep analysis

Cost Comparison: GCS vs. Public URL

Plan A: Using GCS (Old Method)

Assuming 1000 PDF analyses per month, averaging 5MB/paper:

GCS storage cost:
- Assuming storage for 7 days
- Average storage: 1000 Γ— 5MB Γ— 7/30 = 1.17GB
- Monthly fee: 1.17GB Γ— $0.020 = $0.023

GCS outbound traffic:
- Each Gemini read: 1000 Γ— 5MB = 5GB
- Monthly fee: 5GB Γ— $0.12 = $0.60

Maintenance cost:
- Development of upload/delete logic: 8 hours
- Monitoring storage space: 2 hours per month
- Handling errors: 1 hour per month
- Engineer hourly rate $50: $550

Total for the first month: $0.623 + $550 = $550.62
Total for subsequent months: $0.623 + $150 = $150.62

Enter fullscreen mode Exit fullscreen mode

Plan B: Using Public URL (New Method)

Additional cost: $0.00
Maintenance cost: $0.00
Development time: 2 hours (only URL conversion logic)

Total for the first month: $0.00 + $100 = $100
Total for subsequent months: $0.00

Enter fullscreen mode Exit fullscreen mode

Save in the first month: $450** Save each month: $150+**

Gemini API Cost

(The cost is the same as using GCS, because it's all calling the Gemini API)

Using gemini-2.5-flash:

  • Input: $0.075 / 1M tokens
  • Output: $0.30 / 1M tokens

Assuming each PDF analysis:

  • Input tokens: ~50,000 (PDF content + prompt)
  • Output tokens: ~1,500 (analysis result)
Cost per time:
- Input: 50,000 Γ— $0.075 / 1M = $0.00375
- Output: 1,500 Γ— $0.30 / 1M = $0.00045
- Total: $0.0042

Monthly cost (1000 times):
- 1000 Γ— $0.0042 = $4.20

Enter fullscreen mode Exit fullscreen mode

Total Cost Comparison:

Item GCS Plan Public URL Plan Savings
Gemini API $4.20 $4.20 $0
GCS Fees $0.62 $0.00 $0.62
Development & Maintenance $150+ $0 $150+
Total $154.82 $4.20 $150.62 (97%)

Development Experience

1. Google's New Feature Completely Changes the Game

When I saw Google's announcement supporting Public URLs, my first reaction was: "How much trouble will this save!"

Pain points in the past:

// ❌ What needed to be done in the past
1. Download PDF
   resp, _ := http.Get(pdfURL)
   data, _ := ioutil.ReadAll(resp.Body)

2. Store it locally
   ioutil.WriteFile("/tmp/paper.pdf", data, 0644)

3. Upload to GCS
   bucket.Upload(context, "papers/xxx.pdf", data)

4. Generate GCS URL
   gsURL := "gs://bucket/papers/xxx.pdf"

5. Use Gemini to analyze
   file := genai.upload_file(gsURL)
   response := model.generate_content([file, prompt])

6. Remember to clean up
   Regularly delete old files, otherwise GCS fees will accumulate

Enter fullscreen mode Exit fullscreen mode

Now only need:

// βœ… Now only need one line
response := GeminiPDF(pdfURL, prompt)

Enter fullscreen mode Exit fullscreen mode

From 6 steps to 1 step, from needing to manage storage to not needing to manage at all, the value brought by this change far exceeds the technology itself.

2. Differences between SDK Documentation and Actual Implementation

The FileURI vs URI issue I encountered this time taught me: Never just look at the API documentation, but check the actual definition of the SDK.

// My incorrect reasoning process:
// 1. Look at the JSON API documentation: { "fileUri": "..." }
// 2. Guess the Go struct: FileURI string
// 3. Compilation failed!

// Correct approach:
// 1. Look at the JSON API documentation
// 2. View the SDK source code or examples
// 3. Confirm the actual field name

Enter fullscreen mode Exit fullscreen mode

Lesson:

  • βœ… JSON API's fileUri β†’ Go SDK's URI
  • βœ… Naming conventions do not necessarily correspond exactly
  • βœ… Look at the official examples first, then start writing

This small problem took me 10 minutes to debug, but the experience I learned is very valuable.

3. Thinking from "Function Implementation" to "User Experience"

When I first implemented it, I only considered the function: "It's good to be able to analyze PDFs". But when I actually used it, I found:

Problem: Poor user experience

User: [Click the button]
... (Wait 20 seconds) ...
User: "Is it broken?" (Start to worry)
Bot: [Finally replies]

Enter fullscreen mode Exit fullscreen mode

Improvement: Add instant feedback

User: [Click the button]
Bot: πŸ” Analyzing PDF paper, please wait...
User: "Oh, it's processing" (Relieved)
... (Wait) ...
Bot: πŸ“„ PDF Paper Analysis Result...

Enter fullscreen mode Exit fullscreen mode

This small change greatly improved the user experience. The key is not how strong the function is, but how the user feels.

4. Limiting the Scope is a Kind of Wisdom

Initially, I thought: "Since Public URL is supported, why not support all PDFs?"

But after careful consideration, I decided to only support arXiv:

Technical reasons:

  • βœ… arXiv's URL format is standardized and easy to verify
  • βœ… arXiv's PDFs are all academic papers, and the content is predictable
  • βœ… Avoid being misused to analyze arbitrary PDFs

Cost reasons:

  • βœ… Limiting the scope of use = predictable API costs
  • βœ… Avoid a large number of non-academic PDFs consuming the quota

Product positioning:

  • βœ… Focus on academic paper assistants, not general PDF analyzers
  • βœ… Clear value proposition

Lesson: Not all possibilities need to be implemented, and limitations are part of the design.

Top comments (0)