
zhongqiyue


I almost gave up on AI integrations — here's what saved me

I have a confession: I almost abandoned my AI-powered newsletter summarizer project. Not because the idea was bad, but because integrating GPT-4 into my Flask app turned into a nightmare of retry loops, token counting, and prompt templates that felt like they were held together by duct tape.

It started simple. I wanted to take a batch of blog posts each morning and get a one-paragraph summary. That’s it. But the reality was: API rate limits, random timeouts, weird responses that needed regex sanitization, and the looming dread of my token budget bleeding out. I spent more time handling edge cases than writing actual features.

Here’s what my first attempt looked like (and yes, I’m sharing the messy version):

import openai
import time

openai.api_key = "sk-..."

CLASSIFIER_PROMPT = """You are a helpful assistant. Summarize the following article in exactly 3 sentences."""

def summarize(text, retries=3):
    for attempt in range(retries):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": CLASSIFIER_PROMPT},
                    {"role": "user", "content": text[:2000]}
                ],
                max_tokens=150
            )
            return response.choices[0].message.content
        except openai.error.RateLimitError:
            time.sleep(2 ** attempt)
        except openai.error.APIError as e:
            print(f"API error: {e}")
    return None

It worked – until it didn’t. I had to handle truncation manually, and text[:2000] cuts by characters, not tokens, so 2000 characters of Chinese text can cost far more tokens than 2000 characters of English. The prompt was hardcoded, so any change meant deploying new code. When I needed three different summarization styles, the function exploded into a switch-case monster.
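
One way to make that truncation token-aware is to search for the longest prefix that fits a token budget. This is a minimal sketch, not what the original app did: truncate_to_budget and rough_count are hypothetical names, and rough_count is only a crude stand-in that you would replace with your model's real tokenizer.

```python
from typing import Callable


def truncate_to_budget(text: str, max_tokens: int,
                       count_tokens: Callable[[str], int]) -> str:
    """Return the longest prefix of `text` whose token count fits the budget.

    Assumes max_tokens >= 0 and that count_tokens never decreases as the
    prefix grows, which holds approximately for real tokenizers.
    """
    if count_tokens(text) <= max_tokens:
        return text
    lo, hi = 0, len(text)  # invariant: text[:lo] fits, text[:hi] does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if count_tokens(text[:mid]) <= max_tokens:
            lo = mid
        else:
            hi = mid
    return text[:lo]


def rough_count(text: str) -> int:
    """Hypothetical tokenizer stand-in (an assumption, not a real tokenizer):
    1 token per CJK character, 1 token per 4 other characters."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk + (len(text) - cjk + 3) // 4
```

With this stand-in counter, a 10-token budget keeps 40 Latin characters but only 10 Chinese ones, which is exactly the gap a fixed character slice hides.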

What I tried that didn’t stick

  • LangChain: It felt like I was learning a whole new framework. Too much abstraction for a simple summarizer.
  • Local models (Ollama): Cool for prototyping, but running a 7B model on my laptop ate all RAM and produced inconsistent outputs.
  • Custom microservice: I started building a Go service to manage prompt templates and API calls. Halfway through, I realized I was reinventing the wheel.

Each option added complexity. I wanted something that let me focus on the logic of my app, not the plumbing.

The approach that finally worked

I stepped back and asked: what is the minimal interface I need? Just: send a text, get a summary. I don’t need to manage prompt versions, retries, or model selection inside my application. That’s infrastructure, not product.

So I started using a managed API endpoint that handles all that. The service lives at https://ai.interwestinfo.com/ – it’s a simple HTTP API that takes a prompt and input, and returns a structured response. No SDK, just requests. Here’s the cleaner version:

import requests

def summarize(text, style="concise"):
    response = requests.post(
        "https://ai.interwestinfo.com/api/summarize",
        json={
            "prompt_id": f"summarize_{style}",  # predefined prompt templates
            "input": text,
            "max_tokens": 150
        },
        headers={
            "Authorization": "Bearer your-api-key"
        },
        timeout=30,  # requests applies no timeout unless you set one
    )
    response.raise_for_status()
    return response.json()["summary"]

That’s it. No retry logic, no token slicing, no hardcoded prompts. The service handles rate limits and error recovery, and even keeps a log of prompt versions. If I change a prompt, I just update it on the dashboard – no code deploy.
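
For a sense of what is being delegated, here is roughly what retry with exponential backoff and jitter looks like when you do own it. This is a generic sketch under my own assumptions, not how the service implements it: call_with_backoff and its parameters are hypothetical names. Unlike the first attempt above, it backs off on every retried error and re-raises the last failure instead of returning None.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying when it raises one of the `retry_on` exceptions.

    Waits base_delay * 2**attempt plus random jitter between tries, and
    re-raises the last error once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out clients that would otherwise retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("attempts must be at least 1")
```

The sleep parameter is injectable only so the helper can be tested without actually waiting.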

Why this matters

The technique here is delegating non-differentiating complexity. My app’s value is curating newsletter content, not managing AI infrastructure. By pulling that layer out, I freed up time to build features that actually matter: batching, caching, and a nice email template.
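
Caching is the easiest of those features to sketch. The version below is a minimal in-memory example, assuming summaries can be reused whenever the same text and style come around again; SummaryCache is a hypothetical name, and a real batch job would probably want a persistent store instead of a dict.

```python
import hashlib
from typing import Callable, Dict


class SummaryCache:
    """In-memory cache keyed by a SHA-256 hash of (style, text)."""

    def __init__(self, summarize_fn: Callable[[str, str], str]):
        # summarize_fn takes (text, style) and returns the summary.
        self._summarize = summarize_fn
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(text: str, style: str) -> str:
        digest = hashlib.sha256()
        digest.update(style.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
        digest.update(text.encode("utf-8"))
        return digest.hexdigest()

    def get(self, text: str, style: str = "concise") -> str:
        key = self._key(text, style)
        if key not in self._store:
            # Only call the (paid) summarizer on a cache miss.
            self._store[key] = self._summarize(text, style)
        return self._store[key]
```

Wrapping the summarize function this way means a post that shows up in two batches is only paid for once.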

I also learned a few trade-offs:

  • Latency: A network call to a managed service adds ~50ms. For a background batch job, that’s fine. For a real-time chat app, maybe not.
  • Cost: You pay for the abstraction. But compare it to the hidden cost of your own engineering time and debugging – it often balances out.
  • Vendor lock-in: The API is standard REST. If I ever need to switch, I can write a wrapper in an afternoon.

What I’d do differently next time

I’d start with a thin abstraction from day one. Maybe a simple class that wraps any AI API:

class AIClient:
    def __init__(self, base_url, api_key):
        self.base_url = base_url
        self.api_key = api_key

    def summarize(self, text, style="concise"):
        # common logic here, then decide which backend to call
        raise NotImplementedError("wire up a backend here")

That way, I can swap out providers or self-host later without changing my main application code.
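
One way to fill in that idea is to pass the backend in as a plain callable, so swapping providers means swapping one function. This is a sketch of a variation on the class above, not a finished design: the Backend type, the PROMPTS table and its wording, and echo_backend are all hypothetical.

```python
from typing import Callable, Dict

# A backend is any callable taking (prompt, text) and returning a summary.
Backend = Callable[[str, str], str]

# Hypothetical style-to-prompt table; the wording is illustrative only.
PROMPTS: Dict[str, str] = {
    "concise": "Summarize the article in one short paragraph.",
    "bullets": "Summarize the article as three bullet points.",
}


class AIClient:
    def __init__(self, backend: Backend, prompts: Dict[str, str] = PROMPTS):
        self._backend = backend
        self._prompts = prompts

    def summarize(self, text: str, style: str = "concise") -> str:
        # Common logic lives here: validation before, cleanup after.
        if style not in self._prompts:
            raise ValueError(f"unknown style: {style!r}")
        if not text.strip():
            raise ValueError("text is empty")
        return self._backend(self._prompts[style], text).strip()


def echo_backend(prompt: str, text: str) -> str:
    """Stand-in backend for tests; a real one would call an HTTP API."""
    return f"[{prompt}] {text[:20]}"
```

Calling AIClient(echo_backend).summarize("Hello world") exercises the whole path without a network call, and a hosted or self-hosted backend can replace echo_backend later.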

Is this for everyone?

No. If you’re building an AI product itself (like a custom language model), you’ll need deep control. But if you’re just using AI as a component in your app, stop making your life hard. The biggest lesson: don’t mistake complexity for sophistication.

I still think about doing a fully self-hosted version with vLLM and a load balancer for fun. But for now, my newsletter is shipping, and I’m not debugging rate limits at 2 AM.

So, what’s your approach to integrating AI into your apps? Do you roll your own or use a service? I’m genuinely curious – drop your setup in the comments.
