# Bulk Summarizing Notes with DeepInfra Llama-3.1-70B
## Why DeepInfra for Bulk Summarization?
Bulk summarization is a high-volume, cost-sensitive task. Running it through Claude Sonnet would be expensive; you don't need frontier-model quality to produce a 1–2 sentence summary.
DeepInfra's meta-llama/Llama-3.1-70B-Instruct hits the right balance:
- $0.07/1M input tokens — roughly 40× cheaper than Claude Sonnet's $3.00/1M
- OpenAI-compatible API — existing Groq/OpenAI code works with minimal changes
- 70B parameters — accuracy is solid for summarization tasks
## Updated AI Routing Table
| Task | Model | Input cost |
|---|---|---|
| Tag suggestions | Groq llama-3.3-70b | Free tier |
| Bulk summarization | DeepInfra llama-3.1-70b | $0.07/1M |
| Prose balance review | Nebius llama-3.3-70b | $0.10/1M |
| Long-form summaries | Claude Haiku | $0.25/1M |
| Design decisions | Claude Sonnet | $3.00/1M |
Bulk processing follows a simple rule: use the cheapest model that meets the quality bar.
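That rule can be captured in a small routing map. The sketch below mirrors the routing table; the action names and most model IDs are illustrative placeholders (only the DeepInfra model ID is confirmed by the edge function code):

```typescript
// One route per task: the cheapest model that meets the quality bar.
// Prices are USD per 1M input tokens, taken from the table above.
type Route = { provider: string; model: string; inputPer1M: number };

const ROUTES: Record<string, Route> = {
  "tags.suggest": { provider: "groq", model: "llama-3.3-70b-versatile", inputPer1M: 0 },
  "notes.bulk_summarize": { provider: "deepinfra", model: "meta-llama/Llama-3.1-70B-Instruct", inputPer1M: 0.07 },
  "prose.balance": { provider: "nebius", model: "llama-3.3-70b", inputPer1M: 0.10 },
  "notes.long_summary": { provider: "anthropic", model: "claude-haiku", inputPer1M: 0.25 },
  "design.decide": { provider: "anthropic", model: "claude-sonnet", inputPer1M: 3.00 },
};

// Pick the route for an action; unknown actions fail loudly.
function routeFor(action: string): Route {
  const route = ROUTES[action];
  if (!route) throw new Error(`No route for action: ${action}`);
  return route;
}
```

Keeping the map in one place means a price change or model swap touches a single line.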
## Supabase Edge Function

```ts
// ai-hub/index.ts (action: "notes.bulk_summarize")
case "notes.bulk_summarize": {
  const { notes } = body; // [{ id: string, content: string }]
  const summaries = await Promise.all(
    notes.map(async (note: { id: string; content: string }) => {
      const response = await fetch(
        "https://api.deepinfra.com/v1/openai/chat/completions",
        {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${Deno.env.get("DEEPINFRA_API_KEY")}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            model: "meta-llama/Llama-3.1-70B-Instruct",
            messages: [
              {
                role: "system",
                content: "Summarize the note in 1–2 sentences, in Japanese.",
              },
              // Cap input length so one huge note can't blow up the token bill
              { role: "user", content: note.content.slice(0, 1000) },
            ],
            max_tokens: 100,
            temperature: 0.2,
          }),
        },
      );
      // Fail loudly instead of reading `choices` off an error payload
      if (!response.ok) {
        throw new Error(`DeepInfra error: ${response.status}`);
      }
      const data = await response.json();
      return {
        id: note.id,
        summary: data.choices[0].message.content,
      };
    }),
  );
  return new Response(JSON.stringify({ summaries }), {
    headers: { "Content-Type": "application/json" },
  });
}
```
Promise.all runs the calls in parallel, so a batch of 10 notes finishes in ~1–2 seconds.
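For much larger batches, an unbounded Promise.all can trip provider rate limits. One way to keep the parallelism while capping concurrency is to process the array in fixed-size chunks; the helper below is a generic sketch (CHUNK_SIZE is a tunable guess, not a documented DeepInfra limit):

```typescript
// Fire at most CHUNK_SIZE requests at a time: each chunk runs in
// parallel, chunks run sequentially.
const CHUNK_SIZE = 10;

async function mapInChunks<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    const chunk = items.slice(i, i + CHUNK_SIZE);
    results.push(...(await Promise.all(chunk.map(fn))));
  }
  return results;
}
```

Swapping `Promise.all(notes.map(...))` for `mapInChunks(notes, ...)` leaves the rest of the handler unchanged.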
## Flutter Side

```dart
// note_list_page.dart
Future<void> _bulkSummarize(List<Note> notes) async {
  final response = await Supabase.instance.client.functions.invoke(
    'ai-hub',
    body: {
      'action': 'notes.bulk_summarize',
      'notes': notes
          .map((n) => {'id': n.id, 'content': n.content})
          .toList(),
    },
  );
  final summaries = (response.data['summaries'] as List)
      .cast<Map<String, dynamic>>();
  setState(() {
    for (final s in summaries) {
      _summaryMap[s['id'] as String] = s['summary'] as String;
    }
  });
}
```
One tap, all notes summarized. The results land in a local map for immediate display.
## Cost Estimate
| Scenario | Cost |
|---|---|
| 100 notes × 500 tokens avg | $0.0035 |
| 1,000 notes × 500 tokens avg | $0.035 |
| 1,000 users × 100 notes/month | ~$3.50/month |
At indie-app scale, bulk summarization costs under $5/month.
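The figures in the table are straight multiplication: note count × average input tokens × price per token. A small helper for sanity-checking the estimates:

```typescript
// Input-token cost in USD: notes × avg tokens per note, priced per 1M tokens.
// (Output tokens cost extra and are not included here.)
function batchInputCost(
  noteCount: number,
  avgTokens: number,
  pricePer1M: number,
): number {
  return (noteCount * avgTokens * pricePer1M) / 1_000_000;
}

// batchInputCost(100, 500, 0.07)     ≈ $0.0035
// batchInputCost(100_000, 500, 0.07) ≈ $3.50 (1,000 users × 100 notes/month)
```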
## The Pattern
The OpenAI-compatible API means you can swap between Groq, DeepInfra, Nebius, and Fireworks with a one-line URL change. Build your routing layer once, then tune model selection by task cost profile.
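A sketch of that routing layer's provider registry. The DeepInfra base URL matches the edge function above; the Groq, Nebius, and Fireworks URLs are assumptions from memory of each provider's docs, so verify them before shipping:

```typescript
// Swapping OpenAI-compatible providers is a base-URL + API-key change.
// Base URLs other than DeepInfra's are assumptions; check provider docs.
const PROVIDERS: Record<string, { baseUrl: string; keyEnv: string }> = {
  groq: { baseUrl: "https://api.groq.com/openai/v1", keyEnv: "GROQ_API_KEY" },
  deepinfra: { baseUrl: "https://api.deepinfra.com/v1/openai", keyEnv: "DEEPINFRA_API_KEY" },
  nebius: { baseUrl: "https://api.studio.nebius.ai/v1", keyEnv: "NEBIUS_API_KEY" },
  fireworks: { baseUrl: "https://api.fireworks.ai/inference/v1", keyEnv: "FIREWORKS_API_KEY" },
};

// Build the chat-completions endpoint for a provider.
function chatCompletionsUrl(provider: string): string {
  const p = PROVIDERS[provider];
  if (!p) throw new Error(`Unknown provider: ${provider}`);
  return `${p.baseUrl}/chat/completions`;
}
```

With this in place, moving a task to a different provider is a one-word change at the call site.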
Building in public: https://my-web-app-b67f4.web.app/