Ziv Goldvasser

# How I built a 30,000-venue platform without writing code

Six months ago I would have laughed at that sentence. Today it's just my morning routine.

Gayout is an LGBTQ+ travel directory — bars, saunas, hotels, Pride events, mega-events across 120 countries in 11 languages. About 30,000 venues, hundreds of thousands of indexed pages, ~$300/month total infrastructure cost. Built and operated solo. I can read PHP slowly. I cannot write a non-trivial query without help.

This post is the actual stack and the patterns that make it work — not a high-level "AI is amazing" rant.

## The stack

Boring web: PHP 8 + MariaDB on LiteSpeed shared hosting. Cloudflare in front. No frameworks. Long-running cron jobs do all the interesting work.

The interesting layer is the AI orchestration:

  • Claude Sonnet → Haiku for city content generation (40 cities/night, A-Z cycle)
  • Gemini 2.5 Flash for translation across 11 languages + venue descriptions
  • Anthropic for the trip planner (free user-facing feature)
  • Google Places + DataForSEO + Eventbrite + Meetup + OutSavvy + AllEvents + TripAdvisor + Skiddle + RA + SeatGeek + Ticketmaster for venue/event data ingestion

Each pipeline runs on its own cron schedule with three guarantees baked in: a flock single-instance lock, a hard runtime cap, and a batch size limit. Before I added them, one bug took the entire site down at 3am, so I now apply all three religiously.
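A minimal sketch of those three guarantees in plain PHP. It assumes a Linux host, where `flock()` locks belong to the open file handle. The function names, the lock path, and the limits are placeholders for illustration, not the code Gayout actually runs.

```php
<?php
declare(strict_types=1);

/**
 * Try to take an exclusive, non-blocking lock on $lockPath.
 * Returns the open handle on success (keep it open for the whole run),
 * or null when another instance already holds the lock.
 */
function acquire_single_instance_lock(string $lockPath)
{
    $handle = fopen($lockPath, 'c');           // 'c' creates the file without truncating it
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {  // LOCK_NB: fail immediately instead of waiting
        fclose($handle);
        return null;
    }
    return $handle;
}

/**
 * Process at most $batchLimit items and stop early once $maxSeconds have elapsed.
 * Returns the number of items actually processed.
 */
function run_capped(iterable $items, callable $work, int $batchLimit, float $maxSeconds): int
{
    $deadline = microtime(true) + $maxSeconds;
    $done = 0;
    foreach ($items as $item) {
        if ($done >= $batchLimit || microtime(true) >= $deadline) {
            break;
        }
        $work($item);
        $done++;
    }
    return $done;
}
```

A cron entry point would call `acquire_single_instance_lock()` first and exit quietly on `null`, then feed its queue through `run_capped()` with, say, a batch limit of 40 cities. The runtime cap is only checked between items, so a single slow item can still overrun it.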

## The pattern: nightly content + nightly cleanup

Every night, 40 city pages get refreshed. For each city the cron runs about 15 sequential audit steps, wrapped in try/catch so one failure doesn't break the rest:

```php
// Each step is a [label, closure] pair; $vid and $auditSummary are assumed to be set earlier in the cron script.
foreach ([
    ['lgbtq backfill',     fn() => CityAuditor::backfillLgbtqType($vid)],
    ['archive stale',      fn() => CityAuditor::archiveStaleVenues($vid)],
    ['miscat fix',         fn() => CityAuditor::fixMisCategorized($vid)],
    ['spa/sauna split',    fn() => CityAuditor::reclassifySpaSauna($vid)],
    ['cruising detect',    fn() => CityAuditor::detectCruising($vid)],
    ['recurring events',   fn() => CityAuditor::detectRecurringEvents($vid)],
    ['translate non-Latin', fn() => CityAuditor::translateNonLatinVenues($vid)],
    ['enrich events',      fn() => CityAuditor::enrichVenueEvents($vid)],
    ['city Q&A',           fn() => CityAuditor::generateCityQA($vid)],
    ['review summary',     fn() => CityAuditor::summarizeVenueReviews($vid)],
    ['humanize guide',     fn() => CityAuditor::humanizeCityGuide($vid)],
    ['hotel coverage',     fn() => CityAuditor::ensureHotelCoverage(...)],         // arguments elided
    ['local-lang venues',  fn() => CityAuditor::discoverLocalLanguageVenues(...)], // arguments elided
] as [$label, $fn]) {
    // One try/catch per step: a failing step is recorded and the loop moves on.
    try { $auditSummary[] = $fn(); }
    catch (\Throwable $e) { $auditSummary[] = "$label=ERR(...)"; }
}
```

By morning: 40 city pages have fresher intros, new FAQ blocks, archived closed venues, deduped recurring events, re-tagged categories, and freshly discovered venues from local-language Google Places searches ("schwule Bar Berlin" surfaces results "gay bar Berlin" misses).

## The actual hard part: data quality

The content generation isn't hard. AI handles that for fractions of a cent per page. The hard part is everything around the content. A few real examples:

Pleasuredrome's English page had a Hebrew sentence in it. A translation pipeline had written into the source EN column instead of the translations table. I found it with a one-time scan for non-Latin characters in description_long, using the byte-length to character-length ratio (in UTF-8, Hebrew is 2 bytes per character, English about 1):

```sql
-- LENGTH() counts bytes, CHAR_LENGTH() counts characters
SELECT id, name FROM venues
WHERE status = 'approved'
  AND LENGTH(description_long) > CHAR_LENGTH(description_long) * 1.3;
```
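The same ratio can be checked in PHP, which makes it easy to see which strings cross the 1.3 threshold. This is a sketch of the idea rather than the scan used on the site. The function names are mine, and characters are counted with a PCRE match so the example does not depend on the mbstring extension.

```php
<?php
declare(strict_types=1);

/**
 * Ratio of UTF-8 bytes to characters. Plain ASCII gives 1.0;
 * text in a 2-byte script such as Hebrew approaches 2.0.
 */
function byte_char_ratio(string $text): float
{
    // With the /u flag, "." matches one code point; /s lets it match newlines too.
    $chars = preg_match_all('/./su', $text);
    if ($chars === false || $chars === 0) {
        return 1.0; // empty or invalid UTF-8: treat as "nothing to flag"
    }
    return strlen($text) / $chars;
}

/** Mirrors the SQL condition: bytes exceed 1.3x characters. */
function looks_non_latin(string $text, float $threshold = 1.3): bool
{
    return byte_char_ratio($text) > $threshold;
}
```

One caveat follows from the arithmetic: a single Hebrew sentence inside a long English description only nudges the ratio. At 1,000 ASCII characters plus 50 Hebrew ones it is about 1.05. So 1.3 catches fields that are mostly non-Latin, and a lower threshold would be needed to catch a short stray sentence.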

Vivares in Madrid had sddfsd as its description. I found it with a gibberish detector that flags short, letters-only descriptions with no spaces:

```sql
SELECT id, name FROM venues
WHERE CHAR_LENGTH(TRIM(description_short)) BETWEEN 3 AND 30
  -- [.] instead of \. so the optional trailing dot survives SQL string escaping
  AND description_short REGEXP '^[a-zA-Z]+[.]?$';
```
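For spot-checking individual strings, the same rule can be sketched in PHP. The function name is hypothetical. The logic mirrors the query, except that the pattern is applied to the trimmed text.

```php
<?php
declare(strict_types=1);

/** True for short descriptions made of a single run of letters, optionally ending in a dot. */
function looks_like_gibberish(string $description): bool
{
    $text = trim($description);
    $length = strlen($text);
    if ($length < 3 || $length > 30) {
        return false;
    }
    return preg_match('/^[a-zA-Z]+\.?$/', $text) === 1;
}
```

A real one-word description such as "Sauna." also matches, so the output is a review queue, not a delete list.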

The card grid showed "Drag Brunch with Mizery × 8". Recurring weekly events imported as separate rows. Collapsed at the display layer (not DB — imports keep adding) with a (venue_id, normalized_title) dedup that strips month prefixes:

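```php
function dedup_recurring_events(array $rows, int $cap = 0): array {
    $out = []; $seen = [];
    foreach ($rows as $e) {
        $t = $e['title'];
        // Strip a leading "Mar 8 - " style date prefix (month list abbreviated, as in the original listing).
        // The /u flag makes the en/em dashes in the class match as whole UTF-8 characters.
        $t = preg_replace('/^\s*(Jan|Feb|...|Dec)[a-z]*\s+\d{0,2}[,\s]*\d{0,4}\s*[-–—]\s*/iu', '', $t);
        // Strip stray day numbers such as " 8" or " 22nd".
        $t = preg_replace('/\s+\d{1,2}(st|nd|rd|th)?\b/i', '', $t);
        // Keep only letters and digits, lowercased, so punctuation and spacing differences collapse.
        $t = strtolower(preg_replace('/[^\p{L}\p{N}]+/u', '', $t));
        $key = ($e['venue_id'] ?? 0) . '|' . $t;
        if ($t === '' || isset($seen[$key])) continue;
        $seen[$key] = true;
        $out[] = $e;
        if ($cap > 0 && count($out) >= $cap) break;   // $cap = 0 means no limit
    }
    return $out;
}
```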
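Here is a runnable variant for experimenting with the normalization. It assumes the abbreviated month list stands for the twelve English month abbreviations, which is my expansion and is not confirmed by the listing above. It also writes the dashes as `\x{2013}` and `\x{2014}` escapes. The `_sketch` suffix and the helper name are mine.

```php
<?php
declare(strict_types=1);

// Assumption: the month alternation is the twelve English abbreviations.
const MONTH_PREFIX = '/^\s*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*'
    . '\s+\d{0,2}[,\s]*\d{0,4}\s*[-\x{2013}\x{2014}]\s*/iu';

/** Reduce a title to a comparison key: no date prefix, no day numbers, letters and digits only. */
function normalize_event_title(string $title): string
{
    $t = preg_replace(MONTH_PREFIX, '', $title) ?? $title;
    $t = preg_replace('/\s+\d{1,2}(st|nd|rd|th)?\b/i', '', $t) ?? $t;
    $t = preg_replace('/[^\p{L}\p{N}]+/u', '', $t) ?? $t;
    return strtolower($t);
}

/** Keep the first row per (venue_id, normalized title); $cap = 0 means no limit. */
function dedup_recurring_events_sketch(array $rows, int $cap = 0): array
{
    $out = [];
    $seen = [];
    foreach ($rows as $event) {
        $title = normalize_event_title((string) ($event['title'] ?? ''));
        $key = ($event['venue_id'] ?? 0) . '|' . $title;
        if ($title === '' || isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;
        $out[] = $event;
        if ($cap > 0 && count($out) >= $cap) {
            break;
        }
    }
    return $out;
}
```

With this key, "Mar 8 - Drag Brunch with Mizery" and "Drag Brunch with Mizery 22nd" at the same venue collapse into one card, while the same title at a different venue stays separate.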

These aren't sexy AI features. They're the actual product.

## The humanizer (or: how to keep AI text out of your AI text)

Every AI writer trips the same tells: vibrant, nestled, rich tapestry, hidden gem, bustling, whether you're a seasoned, in conclusion. A second pass uses a senior-editor prompt to strip them. The prompt is the entire architecture:

```text
You are a senior travel editor at Condé Nast Traveler — known for a sharp,
skeptical, human voice that strips marketing language and replaces it with
real observation. You are revising an AI-generated draft.

🚫 BANNED PHRASES — delete or replace any of these on sight:
  vibrant · nestled · rich tapestry · delve into · bustling · hidden gem ·
  melting pot · feast for the senses · something for everyone · world-class ·
  iconic · plethora · myriad · navigate · landscape · journey · embrace ·
  immerse yourself · at the heart of · a haven for · a must-visit ·
  whether you're a · seasoned · in conclusion · overall · plays a crucial role

✍️ HOW HUMANS WRITE TRAVEL:
  - Specifics over generics. "€8 cocktails" beats "reasonable prices."
  - Short sentences mixed with long ones. Vary aggressively.
  - Use contractions: "it's", "don't", "you'll".
  - Mild opinion is good. AI never has opinions. Humans do.
  - One-line paragraphs are fine.
  - Cut every sentence that doesn't earn its place.
```

Then a one-time scan flags published content that still contains banned phrases and queues it for re-humanization. Iteratively, the corpus gets better.
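The flagging scan can be sketched as a small PHP function. The name and the word-boundary matching are my assumptions about one reasonable way to do it, not a description of the production scan.

```php
<?php
declare(strict_types=1);

/**
 * Return the banned phrases that occur in $text, in the order they appear in $phrases.
 * Matching is case-insensitive and anchored on word boundaries,
 * so "overall" does not fire on "overalls".
 */
function find_banned_phrases(string $text, array $phrases): array
{
    $hits = [];
    foreach ($phrases as $phrase) {
        $pattern = '/\b' . preg_quote($phrase, '/') . '\b/iu';
        if (preg_match($pattern, $text) === 1) {
            $hits[] = $phrase;
        }
    }
    return $hits;
}
```

Any page where the function returns a non-empty list would go back into the humanizer queue. Word-boundary matching also means inflected forms such as "navigating" slip through unless they are listed, and a phrase typed with a curly apostrophe will not match one listed with a straight apostrophe.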

## Cost discipline matters more than the AI

This week I cut a DataForSEO bill by ~75% with five-minute changes. The pattern:

  1. Four search keywords had ~80% overlap. "gay events" + "pride" + "drag" + "lgbtq" returned mostly the same rows. Cut to 2. → 50% saving.

  2. Sunday "weekly drain" pulled every city. Filtered to cities with ≥3 approved venues. → Another big saving on Sundays.

  3. Saturated-city cooldown. If the last 3 runs of a city all returned 0 new inserts, skip that city for 30 days:

```sql
-- Cities whose 3 most recent completed runs inserted nothing,
-- with the latest of those runs less than 30 days old
SELECT city_id FROM (
    SELECT city_id, inserted_count, posted_at,
           ROW_NUMBER() OVER (PARTITION BY city_id ORDER BY posted_at DESC) AS rn
    FROM dataforseo_tasks
    WHERE status = 'complete' AND keyword LIKE 'gay events%'
) t
WHERE rn <= 3
GROUP BY city_id
HAVING COUNT(*) = 3
   AND SUM(inserted_count) = 0
   AND MAX(posted_at) > NOW() - INTERVAL 30 DAY;
```

  4. Day-of-week skip. Top-cities cron stopped running Tue/Thu/Sat. Events don't appear hourly; daily polling was overkill. → 43% extra saving.

Total: from ~$10/month to ~$2.50/month. The same pattern applies to every API in the stack — most cost is wasted on data you already have.
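The cooldown and the day-of-week skip can also be written as two small PHP checks. This sketch reuses the `inserted_count` and `posted_at` column names from the query. The function names and the array shape are illustrative assumptions.

```php
<?php
declare(strict_types=1);

/**
 * $runs: completed runs for one city, newest first, each shaped like
 * ['inserted_count' => int, 'posted_at' => DateTimeImmutable].
 * True when the newest $window runs all inserted nothing and the newest
 * of them is less than $cooldownDays old.
 */
function city_is_in_cooldown(
    array $runs,
    DateTimeImmutable $now,
    int $window = 3,
    int $cooldownDays = 30
): bool {
    $recent = array_slice($runs, 0, $window);
    if (count($recent) < $window) {
        return false; // not enough history to call the city saturated
    }
    foreach ($recent as $run) {
        if ($run['inserted_count'] > 0) {
            return false;
        }
    }
    return $recent[0]['posted_at'] > $now->modify("-{$cooldownDays} days");
}

/** Day-of-week skip: false on the days the job should stay idle. */
function should_run_today(DateTimeImmutable $now, array $skipDays = ['Tue', 'Thu', 'Sat']): bool
{
    return !in_array($now->format('D'), $skipDays, true);
}
```

Skipping 3 of 7 days removes 3/7, or about 43%, of that job's calls, which is where the figure in item 4 comes from.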

## What "no developer" actually means

It doesn't mean no engineering happens. It means a single non-engineer can:

  1. Specify clearly. "Make a thing that dedups events" doesn't work. "Find events where (venue_id, normalized_title) matches another future event, keep the earliest, archive the rest, and add a 30-day cooldown so imports don't immediately re-create them" works.

  2. Read output critically. I can't always spot a logic bug, but I can almost always spot when AI output doesn't match what I asked for. That's enough to iterate.

  3. Pay attention to failure modes. Every job has hard caps on runtime, batch size, and dollars spent. Without them, one bug = one expensive incident.

  4. Be honest when stuck. "I have no idea why this is broken, here's the error and the last three things I changed" is the fastest fix at 11pm.
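As one illustration of the "dollars spent" cap from point 3, a job can route every paid API call through a small guard object. This is a hypothetical sketch: the class name and the choice of integer micro-dollars (to avoid float rounding) are mine.

```php
<?php
declare(strict_types=1);

/** Tracks spend for one job run and refuses charges that would exceed the cap. */
final class BudgetGuard
{
    private int $spentMicroUsd = 0;

    public function __construct(private int $capMicroUsd)
    {
    }

    /** Record the cost and return true, or return false without recording if it would exceed the cap. */
    public function charge(int $costMicroUsd): bool
    {
        if ($this->spentMicroUsd + $costMicroUsd > $this->capMicroUsd) {
            return false;
        }
        $this->spentMicroUsd += $costMicroUsd;
        return true;
    }

    public function spentMicroUsd(): int
    {
        return $this->spentMicroUsd;
    }
}
```

The job would call `charge()` before each request and stop the batch as soon as it returns `false`, so a runaway loop ends at the cap instead of at the invoice.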

## The bigger picture

The platform I run today would have required a $500K seed round and a team of four three years ago. Now it's me, $300/month in AI/API costs, and a hosting bill smaller than my coffee budget.

Live data is at gayout.com. 30,000+ venues, 120 countries, 11 languages, including content in Russian, Hebrew, Arabic. Bug reports welcome.

If you're sitting on an idea you've told yourself you can't build because you can't code — that excuse is getting weaker every quarter.


Ziv runs Gayout.com — an LGBTQ+ travel platform built solo with AI tools. Based in Tel Aviv. Follow on Twitter/X and let me know what you want to know.

**Tags:** #ai #buildinpublic #showdev #indiehacker

Top comments (1)

Pon

I had to reread the byte-length to char-length ratio twice. Using it to catch the Hebrew sentence that bled into your English column is the kind of trick you only invent after getting burned once, and it stuck with me. Same with adding the flock lock and the hard runtime cap after one bug took the site down at 3am. You're clearly disciplined about the failure modes you can see.

What I'd point at is the one that doesn't show up in your morning audit, because it only fires when someone is poking at it on purpose. You said it yourself: you can read PHP slowly but can't write or fully audit a non-trivial query. That's exactly the spot where AI-written SQL gets risky. If any of those queries ever takes a value that started as user input or an API response, and it gets concatenated into the string instead of bound as a parameter, that's a SQL injection sitting there waiting for someone to find it. The nightly cron queries are probably fine since the inputs are yours, but the user-facing trip planner and anything that writes from the ingestion APIs are where I'd look first.

Not trying to scare you, your instincts are obviously good. If it helps, I'm happy to look at how a couple of those queries handle untrusted input and tell you whether anything needs a prepared statement. No strings. I keep finding this same pattern in AI-built apps and it's an easy fix once you can see it.