Ioan G. Istrate

Posted on • Originally published at blog.tripvento.com

How I'm Building a Content Factory That Catches Its Own AI Slop

*Generating 150 pages with AI is easy. Generating 150 pages that don't read like AI wrote them is the actual problem.*

This is a live account of building that system: iteration 1 shipped in January; iteration 2 starts at the end of March. Some of it works. Some of it doesn't. Here's all of it.

I built a programmatic SEO pipeline that generates a city hub page and 14 intent pages for every destination Tripvento covers. One page for romantic hotels. One for remote workers. One for families with toddlers. Fourteen traveler personas, one curated list of 10 hotels each, all written by Claude Haiku at roughly $0.20 per city.

The generation part took a weekend. The quality control is taking three times as long.

Here's the full stack: the factory, the quality gates, and the parts that still slip through.

The Structure: Hub and Spoke

Every city gets 15 pages total.

The hub (/us/savannah) is the authority anchor. About 300 words covering the city layout, a neighborhood decoder, and a warning about the most common location mistake tourists make. It exists to pass link equity to the intent pages and give Google something to index.

The spokes are the money pages. /us/savannah/best-romantic-hotels. /us/savannah/best-hotels-for-remote-work. Each one has a ~150 word intro explaining why location matters for that traveler type in that specific city, followed by 10 curated hotels with individual vibe checks.

The model for the hub and all 14 intents is claude-3-haiku-20240307. Fast, cheap, good enough for structured generation when you give it tight prompts and quality gates. At ~169 API calls per city (1 hub + 14 curations + 14 intros + 140 vibe checks), the total lands around $0.20 per city on Haiku pricing.
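The per-city call count is plain arithmetic over the structure above; a quick sanity check:

```python
# Per-city API call count for the hub-and-spoke structure.
HUB_PAGES = 1          # one hub page per city
INTENTS = 14           # traveler personas
HOTELS_PER_LIST = 10   # curated hotels per intent page

curation_calls = INTENTS                      # one curator call per intent
intro_calls = INTENTS                         # one ~150-word intro per intent
vibe_check_calls = INTENTS * HOTELS_PER_LIST  # one vibe check per hotel

total_calls = HUB_PAGES + curation_calls + intro_calls + vibe_check_calls
print(total_calls)  # 169
```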

167 pages indexed in Google so far. 10 cities. The initial batch gets re-run once fixes are in, then the next 10 cities go out.

The Curator: Picking Hotels With a Penalty System

Before any content gets written, the pipeline has to decide which 10 hotels go on each list. With 40+ candidates per intent and 14 intents per city, the same 5 hotels would dominate every list if you just sorted by score.

The fix is a usage penalty. Each hotel starts with its raw intent score. Every time it gets selected for a list, it takes a 5 point penalty on its adjusted score for the next selection round. Hotels used 5 or more times across all lists in a city get removed from the candidate pool entirely.

# Penalty-adjusted score for the next selection round.
raw_score = float(record.final_score)
adjusted_score = raw_score - (times_used * USAGE_PENALTY)

The penalty value, max-uses cap, and top-N uniqueness threshold are tuned per city size, so a city with 20 hotels needs different values than one with 200. I calibrated these against Tripvento's ranking data and they're not something I'm publishing.

On top of that sits a top-3 uniqueness rule: any hotel that's already appeared in positions 1-3 on a previous list is blocked from the top 3 spots on all subsequent lists. It can still appear at positions 4-10, but it can't dominate the editorial front of multiple lists.

The curator LLM gets the penalized scores, a list of which hotels are top-3 blocked, and a target of 10 selections. If the LLM fails or returns garbage, a score-based fallback kicks in and applies the same constraints without the LLM call.
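A minimal sketch of what that score-based fallback could look like. The `Candidate` class and its `times_used`/`top3_appearances` fields are illustrative, and the thresholds use the defaults quoted in this article rather than the unpublished per-city tuning:

```python
from dataclasses import dataclass

# The article's quoted defaults; production values are tuned per city size.
USAGE_PENALTY = 5
MAX_USES = 5
LIST_SIZE = 10

@dataclass
class Candidate:
    hotel_id: str
    raw_score: float
    times_used: int = 0        # selections so far across this city's lists
    top3_appearances: int = 0  # times ranked 1-3 on a previous list

def score_based_fallback(candidates):
    """Deterministic stand-in for the curator LLM: penalized scores,
    usage cap, and the top-3 uniqueness rule."""
    # Hotels used 5+ times leave the candidate pool entirely.
    pool = [c for c in candidates if c.times_used < MAX_USES]
    pool.sort(key=lambda c: c.raw_score - c.times_used * USAGE_PENALTY,
              reverse=True)

    # Hotels that already held a top-3 spot are pushed to positions 4-10.
    blocked = {c.hotel_id for c in pool if c.top3_appearances > 0}
    top3, rest = [], []
    for c in pool:
        if len(top3) < 3 and c.hotel_id not in blocked:
            top3.append(c)
        else:
            rest.append(c)
    selection = top3 + rest[:LIST_SIZE - len(top3)]
    return [c.hotel_id for c in selection]
```

Blocked hotels stay in score order within the tail, so a strong hotel that's been top-3 elsewhere lands at position 4 rather than disappearing.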

Overlap detection runs after each selection. If the current list shares more than 50% of hotels with any previous list in the same city, it logs a warning. It doesn't reject the list (at 14 intents per city with a limited hotel pool, some overlap is inevitable), but it flags it for review.
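The overlap check itself is simple set arithmetic; a sketch, assuming hotel IDs as list entries and a dict of previous lists keyed by intent:

```python
def overlap_fraction(current, previous):
    """Share of the current list's hotels that already appeared on a
    previous list in the same city."""
    return len(set(current) & set(previous)) / len(current)

def flag_overlaps(current, previous_lists, threshold=0.5):
    """Log-don't-reject: return (intent, fraction) pairs above threshold."""
    flagged = []
    for intent, prev in previous_lists.items():
        frac = overlap_fraction(current, prev)
        if frac > threshold:
            flagged.append((intent, round(frac, 2)))
    return flagged
```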

Quality Gate 1: The Banned Phrases List

This is the one people screenshot.

Before any generated text gets saved, it runs through a list of phrases that auto-reject or flag the content for retry. The list has three categories: "AI slop classics," "superlatives without substance," and filler.

BANNED_PHRASES = [
    # The AI slop classics
    "nestled in",
    "hidden gem",
    "tapestry of",
    "bustling streets",
    "vibrant atmosphere",
    "rich tapestry",
    "perfect blend",
    "seamless blend",
    "oasis of",
    "haven for",
    "paradise for",
    "unforgettable experience",
    "memories that last",
    "something for everyone",
    "whether you're looking for",
    "look no further",
    "has it all",
    "steeped in history",
    "where old meets new",
    "the heart of",
    "in the heart of",
    "gateway to",
    "stone's throw",
    "just steps from",
    "mere minutes from",

    # Superlatives without substance
    "world-class",
    "the best",
    "the perfect",
    "the ultimate",
    "the ideal",
    "truly unique",
    "unparalleled",
    "unmatched",
    "exceptional",
    "extraordinary",
    "second to none",

    # Filler
    "it goes without saying",
    "needless to say",
    "it's no secret",
    "of course",
]

The check is a simple case-insensitive substring scan:

from typing import List

def check_banned_phrases(text: str) -> List[str]:
    text_lower = text.lower()
    found = []
    for phrase in BANNED_PHRASES:
        if phrase in text_lower:
            found.append(phrase)
    return found

If a generated ~150-word intro contains more than 2 banned phrases, it gets rejected and the LLM gets one more attempt with an explicit note appended to the prompt:

REJECTED - Be more specific, mention real places.

If the retry still fails the threshold, the best-effort result gets saved anyway with a warning logged. You can't get perfect output on every retry without burning your cost budget.
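The reject-and-retry flow can be sketched like this, with `generate` standing in for the Haiku call and a trimmed banned list (a simplification of the real pipeline, not its code):

```python
MAX_BANNED = 2  # more than 2 hits in a ~150-word intro triggers a retry
RETRY_NOTE = "REJECTED - Be more specific, mention real places."

def check_banned_phrases(text, banned):
    text_lower = text.lower()
    return [p for p in banned if p in text_lower]

def generate_with_retry(generate, prompt, banned, max_retries=1):
    """Generate, reject on too many banned phrases, retry with the note
    appended; if the retry also fails, keep the best attempt and signal
    that a warning should be logged."""
    best_text, best_hits = None, None
    for _ in range(max_retries + 1):
        text = generate(prompt)
        hits = check_banned_phrases(text, banned)
        if best_hits is None or len(hits) < len(best_hits):
            best_text, best_hits = text, hits
        if len(hits) <= MAX_BANNED:
            return best_text, best_hits, False  # passed, no warning
        prompt = prompt + "\n\n" + RETRY_NOTE
    return best_text, best_hits, True  # save best effort, log a warning
```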

Quality Gate 2: The Grounding Checker

A page that passes the banned phrases check can still be useless if it doesn't mention anything real. "Chicago offers diverse neighborhoods for every type of traveler" is technically original, technically not AI slop by phrase detection, and completely worthless as a page.

The grounding checker scores text for city-specific detail using regex patterns for real places:

import re
from typing import Dict

# Non-capturing groups so re.findall returns the full match,
# not just the capture group.
GROUNDING_PATTERNS = [
    r'\b[A-Z][a-z]+ (?:Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd)\b',
    r'\b(?:Downtown|Midtown|Uptown|Old Town|Historic District)\b',
    r'\b(?:North|South|East|West) (?:Side|End|Quarter)\b',
    r'\b[A-Z][a-z]+ (?:District|Quarter|Village|Heights|Hill|Park|Square)\b',
]

def check_grounding(text: str, city_name: str) -> Dict:
    has_city = city_name.lower() in text.lower()
    grounding_matches = []
    for pattern in GROUNDING_PATTERNS:
        matches = re.findall(pattern, text)
        grounding_matches.extend(matches)

    score = 0
    if has_city:
        score += 30
    score += min(len(grounding_matches) * 20, 70)

    return {
        'score': score,
        'is_grounded': score >= 50
    }

30 points if the city name appears. 20 points per specific place mention, capped at 70. A page needs a score of 50 or higher to be considered grounded. Below that, same retry logic as the banned phrases gate.

The combination of both gates is what actually matters. A page that scores well on grounding and has no banned phrases reads like it was written by someone who knows the city. A page that fails both sounds like it was generated by a model that was told "write something about hotels."

Quality Gate 3: TF-IDF Similarity Detection

Banned phrases and grounding check individual pages. This gate checks pages against each other.

The problem it solves: the Chicago romantic intro and the Denver romantic intro, both written by the same model with the same prompt, will drift toward similar language even if they pass the first two gates. "River North puts you close to the best rooftop bars" and "LoDo puts you close to the best rooftop bars" are structurally identical and Google will notice.

I wrote a lightweight TF-IDF implementation without pulling in sklearn because there is no reason to add that dependency for a reporting tool that runs on a schedule:

from typing import Dict, List, Tuple

def find_similar_pairs(texts: List[Tuple[str, str]], threshold: float = 0.7) -> List[Dict]:
    tokenized = [(id_, tokenize(text)) for id_, text in texts]
    all_tokens = [tokens for _, tokens in tokenized]
    idf = compute_idf(all_tokens)

    vectors = [
        (id_, compute_tfidf_vector(tokens, idf))
        for id_, tokens in tokenized
    ]

    similar = []
    for i, (id1, vec1) in enumerate(vectors):
        for j, (id2, vec2) in enumerate(vectors[i + 1:], i + 1):
            sim = cosine_similarity(vec1, vec2)
            if sim >= threshold:
                similar.append({
                    'pair': (id1, id2),
                    'similarity': round(sim, 3),
                })

    return sorted(similar, key=lambda x: x['similarity'], reverse=True)

It groups all published pages by intent, then runs pairwise cosine similarity across every city's intro for that intent. If Chicago's romantic intro and Denver's romantic intro hit 0.7 similarity or above, they both get flagged.
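The helpers referenced in that snippet (`tokenize`, `compute_idf`, `compute_tfidf_vector`, `cosine_similarity`) aren't shown here; a minimal version of standard TF-IDF plus cosine similarity might look like this:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def compute_idf(all_tokens):
    """log(N / document frequency) for every term in the corpus."""
    n_docs = len(all_tokens)
    df = Counter()
    for tokens in all_tokens:
        df.update(set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def compute_tfidf_vector(tokens, idf):
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def cosine_similarity(vec1, vec2):
    dot = sum(w * vec2.get(t, 0.0) for t, w in vec1.items())
    norm1 = math.sqrt(sum(w * w for w in vec1.values()))
    norm2 = math.sqrt(sum(w * w for w in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

One caveat with log-IDF: a term appearing in every document gets weight zero, so two intros compared only against each other always score 0. The check needs a corpus of several cities' intros per intent to be meaningful.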

The management command that runs this produces output like:

romantic: 2 similar pairs
  Chicago <-> Denver: 0.74
  Austin <-> Nashville: 0.71

Right now this is a reporting tool, not an auto-fix. The flagged pairs go into a review queue. Auto-regenerating flagged content is on the roadmap.

Quality Gate 4: The Vibe Check Prompt

The vibe check is the per-hotel copy: 2 sentences per hotel, written in the voice of a friend texting you advice. It gets the hotel's geo score, nearby POIs with walking distances, neighborhood description, and the AI scoring reason from the ranking engine.

The system prompt:

You are a local friend giving hotel advice over text. Keep it real, keep it short.

NO corporate speak. NO "offers amenities" or "solid choice" or "ideal for". 
Just talk like a person.

The user prompt asks for a short honest take on what's good and what's the catch, plus a curator note with specific nearby POI names and walking times. The key constraint: it explicitly bans restating the trip type. Without that, every note starts with "Perfect for a romantic getaway."

Note: vibe checks use softer enforcement than intros; a single banned phrase in 2 sentences doesn't trigger a retry. Keep that in mind when reading the examples below.

When the voice holds, it reads like this. Chicago business executive list, rank 3:

The LaSalle Chicago — This is the heart of downtown, where the suits clear out after 6pm and all the bars and restaurants take over. Pricey, but the location can't be beat. Curator note: You've got Acme Fine Dining, Macy's, and The Bean sculpture all just a minute away. And with the Art Institute and Millennium Park right here, you can hit the top sights without a long trek.

Sharp readers will notice 'the heart of downtown' in there. It passed because it's in a vibe check, not an intro. That's the soft threshold in action.

And when the system catches a genuinely useful local detail, Chicago family list, rank 7:

The Villa Toscana — Packed with bars and restaurants near Wrigley Field. Loud, energetic area perfect for young crowds, but can be a bit much for families. Curator note: You're a 1 minute walk from tons of restaurants, theaters, and attractions. But avoid Clark Street during Cubs games — try Sheffield Avenue for less chaos.

That Cubs tip wasn't from the geo data because the POI feed gives the model POI types and distances, not names. The model saw "Theater" and "Attraction" within 30 meters, inferred it was in Wrigleyville from the neighborhood name in the prompt, and generated the Clark Street advice from its own training knowledge. It happened to be correct. That's a different kind of grounding than what the pipeline enforces — and it's exactly why the grounding checker exists. You can't rely on the model knowing the right street to avoid during a Cubs game for every city.

When it doesn't work, it looks like this. Savannah romantic list, rank 1:

Ballastone Inn — The hotel is right in the heart of Savannah's historic charm. You'll be surrounded by beautiful old buildings, great restaurants, and tons to see and do - just a stroll away.

"In the heart of" is on the banned list. It slipped through on the vibe check's softer threshold. That's iteration 1.

That's the honest part. The banned phrases gate runs on intro text with a hard retry threshold — more than 2 banned phrases and the intro gets rejected and regenerated. Vibe checks use a softer pass: a single banned phrase in 2 sentences doesn't trigger a retry, it just gets flagged in the audit log. At 140 vibe checks per city the cost of hard-blocking and regenerating every minor hit would add up fast. The tradeoff is some slop slips through at the hotel level that wouldn't survive at the intro level.

What Still Slips Through

The banned phrases gate catches most of the slop but not all of it. "In the heart of" appears in 4 of the 10 Savannah romantic vibe checks despite being on the list, because a single banned phrase in a short vibe check doesn't always trigger a retry.

Rank 8's curator note reads: "1 minute walk from the nightlife, museum, and clinic." Clinic. A medical facility leaked through the POI data into the editorial copy. The POI filtering that feeds into the vibe check prompt needs tighter category exclusions.

The voice is inconsistent across hotels on the same list. The "local friend over text" persona holds for some outputs and collapses into mild review speak for others. The prompt does a reasonable job but Haiku at this price point has limits.

None of this is catastrophic. The pages are indexed, the content is specific and grounded, and the curator notes are genuinely useful. But "good enough to index" and "good enough to convert" are different bars. The quality iteration is ongoing.

The Numbers

  • Model: Claude Haiku (claude-3-haiku-20240307)

  • Cost: ~$0.20 per city

  • API calls per city: ~169 (1 hub + 14 curations + 14 intros + 140 vibe checks)

  • Pages per city: 15 (1 hub + 14 intent pages)

  • Hotels per intent page: 10

  • Cities completed: 10 (out of 311 destinations in the platform, soon to be 400+)

  • Total pages: 150

  • Google indexed: 167 so far (the 150 content pages plus static and destination pages picked up from the sitemap)

This is iteration 1, shipped in January and left to index while we audited what broke. Stopping at 10 cities isn't a capacity limit; Tripvento already covers 311 destinations. It's a quality call: we didn't want to scale broken content.

Iteration 2 starts at the end of March with the fixes listed in the next section applied. The plan: 5 new city guides per week ramping up to 10, targeting 400+ cities by end of year. Every existing city gets regenerated when the pipeline moves forward — nothing stays on iteration 1 quality permanently.


Iteration 2: What's Getting Fixed

This is the checklist for the end-of-March pipeline run.

Auto-fix for flagged similarity pairs. Right now the TF-IDF check is read-only: it reports similar pairs but doesn't regenerate them. Iteration 2 makes it an action, not a report.

POI category filtering. The geo data needs an exclusion list before it reaches the prompts: medical facilities, government buildings, anything that doesn't belong in editorial hotel copy. The "clinic in a romantic hotel list" bug is a data pipeline problem, not a model problem.
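A sketch of what that exclusion filter could look like (the category names are hypothetical, since the real POI feed's taxonomy isn't shown here):

```python
# Hypothetical category names; the real POI feed's taxonomy may differ.
EXCLUDED_POI_CATEGORIES = {
    "clinic", "hospital", "pharmacy", "dentist",
    "government", "courthouse", "police",
    "funeral_home", "cemetery",
}

def filter_pois_for_editorial(pois):
    """Drop non-editorial POI categories before they ever reach the
    vibe check prompt."""
    return [
        poi for poi in pois
        if poi.get("category", "").lower() not in EXCLUDED_POI_CATEGORIES
    ]
```

Filtering at the data layer means every downstream prompt benefits, rather than asking the model to ignore the clinic.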

Tighter vibe check retries. Iteration 1 gives vibe checks 1 retry attempt. Intros get 2. Iteration 2 bumps vibe checks to 2 retries with a tighter banned phrase threshold. The goal is closing the gap between what slips through at the hotel level and what survives at the intro level.

Post-publish quality monitoring. The uniqueness scorer currently runs on demand. Iteration 2 wires it into a scheduled validator: if a published page drops below the quality threshold after a pipeline update changes the scoring, it gets queued for regeneration automatically.

Ranking observability. Every pipeline run snapshots adjusted scores and positions per hotel-intent pair into the database so when iteration 2 ships, we can diff it against iteration 1's output before touching a single published page.

The article will update when iteration 2 ships.


This is part 5 of the Building Tripvento series. Part 1 covered deleting 55M rows to scale the database. Part 2 covered the multi-LLM self-healing data pipeline. Part 3 covered the Django performance audit. Part 4 covered securing the API against 10k scraper requests.

I'm Ioan Istrate, founder of Tripvento — a hotel ranking API that scores properties against 14 traveler personas using geospatial intelligence and semantic AI. Previously worked on ranking systems at U.S. News & World Report. If you want to talk about Django performance, security, or API design, let's connect on LinkedIn.
