<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kaicheng zhang</title>
    <description>The latest articles on DEV Community by Kaicheng zhang (@mrbolo).</description>
    <link>https://dev.to/mrbolo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3836262%2F2443c95f-3c25-4259-8e98-c018267abb10.png</url>
      <title>DEV Community: Kaicheng zhang</title>
      <link>https://dev.to/mrbolo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mrbolo"/>
    <language>en</language>
    <item>
      <title>Products Aren't Built for Humans Anymore — I Decided to Serve AI</title>
      <dc:creator>Kaicheng zhang</dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:40:37 +0000</pubDate>
      <link>https://dev.to/mrbolo/products-arent-built-for-humans-anymore-i-decided-to-serve-ai-2hl6</link>
      <guid>https://dev.to/mrbolo/products-arent-built-for-humans-anymore-i-decided-to-serve-ai-2hl6</guid>
      <description>&lt;h2&gt;
  
  
  AI Is Browsing the Internet for Us
&lt;/h2&gt;

&lt;p&gt;A phone conversation with my friend Lucas this morning made a bunch of things click at once.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lucas: You deployed an app recently, right? What tool did you use?&lt;/p&gt;

&lt;p&gt;Me: No idea, AI deployed it. Ask AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I said it without thinking — because that's genuinely how I operate now: &lt;strong&gt;I only care about results and the bill. What tool AI used, which approach is better — I couldn't care less.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That moment was a wake-up call.&lt;/p&gt;




&lt;p&gt;When AI starts browsing the internet on our behalf, marketing to "humans" stops making sense. &lt;strong&gt;The real audience you should be serving is AI itself.&lt;/strong&gt; Make it so AI can discover your product, pull information from it without friction, and interact with it directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Public Data Defines What AI Knows
&lt;/h2&gt;

&lt;p&gt;The conversation kept going:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lucas: AI says fly.io and Railway, recommends fly.io.&lt;/p&gt;

&lt;p&gt;Me: Let me check my history… yep, that's exactly what I got too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lucas is on the other side of the planet, using a different AI, asking in a different way. But both AIs gave nearly the same answer.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;they draw from the same pool of public data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What's "public"? Simple: &lt;strong&gt;content you can see without logging in.&lt;/strong&gt; Reddit, Stack Overflow, tech blogs, Hacker News — you Google it, you see it, and so can AI. That's public.&lt;/p&gt;

&lt;p&gt;Now think about TikTok, Instagram, WeChat — you can't see anything without downloading the app and logging in. Behind those walled gardens, data doesn't get out. If AI can't read it, &lt;strong&gt;it might as well not exist&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The more I think about it, the more convinced I am: walled-garden platforms are relics of the last era — built for humans. In AI's worldview, &lt;strong&gt;information that isn't public simply doesn't exist.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI Will Inevitably Browse for Us
&lt;/h2&gt;

&lt;p&gt;Because starting is hard.&lt;/p&gt;

&lt;p&gt;I've been meaning to do Reddit marketing for a while. Kept putting it off — didn't know where to begin. Yesterday over dinner, I casually asked AI about it. It searched relevant threads, drafted replies, and knocked out the first step for me. Suddenly the whole thing had momentum.&lt;/p&gt;

&lt;p&gt;Most of the time it's not that people don't want to do things — it's that the activation energy is too high. AI just needs to get you past that first step, and the rest flows. This trend is irreversible — more and more actions will be initiated by AI, with humans just approving.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm Doing Next
&lt;/h2&gt;

&lt;p&gt;Once this clicked, the product direction became obvious:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Build Only for the Public Web
&lt;/h3&gt;

&lt;p&gt;All products and content stay on the public web: open source, open comments, bring-your-own-LLM, and most features usable without login.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Build Only What AI Can Fully Automate
&lt;/h3&gt;

&lt;p&gt;No more products where humans need to learn a UI first. Every feature must be discoverable by AI, understandable by AI, and completable by AI — zero to done, fully automated on the user's behalf.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Marketing: Get Into AI's Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Back to the public data theme. Marketing strategy has to revolve around "make sure AI can find us":&lt;/p&gt;

&lt;p&gt;Find matching interest communities on Reddit and respond to real needs with relevant content. Someone posts about wanting a specific type of character? Drop a link to a matching one. And yes — use AI to automate this step too.&lt;/p&gt;

&lt;p&gt;Over time, when the next generation of LLMs trains on Reddit data, our content becomes part of the answers to those queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This isn't the old "spend ad dollars → get conversions" playbook. It's "plant information → let AI harvest it."&lt;/strong&gt; Spread keywords across countless niche interest areas so that when AI answers any related question, it naturally surfaces us.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI is browsing the internet for humans.&lt;/strong&gt; If your product is only built for human eyes, you're invisible to AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public data is AI's only source of truth.&lt;/strong&gt; If it's not public, it doesn't exist in AI's world.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go AI-first.&lt;/strong&gt; Every capability should be open to AI — let it discover you, understand you, and act on behalf of your users.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is what I'm building toward for the next five years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm building EchoMelon, an AI chat platform for interactive fiction. Open to conversations.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>startup</category>
      <category>marketing</category>
      <category>product</category>
    </item>
    <item>
      <title>How We Built Chat Memory That Actually Works — Lessons from Shipping to 100K Users</title>
      <dc:creator>Kaicheng zhang</dc:creator>
      <pubDate>Sat, 21 Mar 2026 01:06:49 +0000</pubDate>
      <link>https://dev.to/mrbolo/how-we-built-chat-memory-that-actually-works-lessons-from-shipping-to-100k-users-24cd</link>
      <guid>https://dev.to/mrbolo/how-we-built-chat-memory-that-actually-works-lessons-from-shipping-to-100k-users-24cd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmqj72g79bkcm6smf18c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmqj72g79bkcm6smf18c.png" alt=" " width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Most AI chatbots forget you exist after a few messages. Here's how we built a memory system that doesn't.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've been building EchoMelon — a roleplay and companion chat platform — for a while now. Early on, the most common complaint we got was brutal in its simplicity:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why doesn't my character remember what happened last week?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fair question. You'd pour hours into building a relationship with an AI character, share secrets, go on adventures, name things together — and then the character would just... blank on all of it. Because under the hood, all it sees is the last handful of messages.&lt;/p&gt;

&lt;p&gt;This post is a deep dive into how we solved that. No hand-wavy theory. Actual patterns, actual trade-offs, actual scars.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Context Windows Are a Lie
&lt;/h2&gt;

&lt;p&gt;Every LLM has a context window — the amount of text it can "see" at once. Claude gives you 200K tokens. Gemini offers a million. Sounds like a lot, right?&lt;/p&gt;

&lt;p&gt;It's not. Here's why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Your system prompt eats a chunk.&lt;/strong&gt; Character personality, world-building, behavioral rules — for a rich roleplay character, this alone can be 3,000–8,000 tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost scales linearly with context.&lt;/strong&gt; Stuffing 200K tokens into every API call would bankrupt you before lunch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More context ≠ better responses.&lt;/strong&gt; Models get &lt;em&gt;confused&lt;/em&gt; with too much raw history. They start contradicting earlier events, mixing up details, hallucinating scenes that never happened.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So you can't just dump the entire chat history into the prompt. You need to be surgical about what the model sees.&lt;/p&gt;
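&lt;p&gt;To make the "bankrupt you before lunch" point concrete, here's a back-of-envelope sketch. The per-token price is a made-up round number, not any provider's actual rate; the point is the linear scaling:&lt;/p&gt;

```typescript
// Back-of-envelope cost math. The price below is hypothetical;
// what matters is that cost scales linearly with input tokens.
const ASSUMED_PRICE_PER_MILLION_INPUT_TOKENS = 3; // USD, assumed

function costPerCall(inputTokens: number): number {
  return (inputTokens / 1_000_000) * ASSUMED_PRICE_PER_MILLION_INPUT_TOKENS;
}

// Dumping a full 200K-token context vs. a trimmed ~8K-token prompt,
// over 1,000 calls:
const fullCost = 1_000 * costPerCall(200_000);  // roughly $600
const trimmedCost = 1_000 * costPerCall(8_000); // roughly $24
```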

&lt;h2&gt;
  
  
  Our Approach: Memo-Based Rolling Memory
&lt;/h2&gt;

&lt;p&gt;The core idea is dead simple: &lt;strong&gt;summarize old conversations into structured "memos" and inject those summaries into the prompt alongside recent messages.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of it like how your own memory works. You don't remember the exact words of a conversation from three months ago. But you remember: &lt;em&gt;"That was the night she told me about her past. We were on the rooftop. Things changed after that."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's what we're building — compressed, meaningful memories that capture the &lt;em&gt;what mattered&lt;/em&gt;, not the &lt;em&gt;what was said verbatim&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's how the analogy maps:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your Brain&lt;/th&gt;
&lt;th&gt;Our System&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Last 10 minutes of conversation — crystal clear&lt;/td&gt;
&lt;td&gt;Last 8 raw message pairs — full fidelity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Older events — fuzzy highlights, not exact words&lt;/td&gt;
&lt;td&gt;Memo summaries — structured highlights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You forget the mundane, remember what mattered&lt;/td&gt;
&lt;td&gt;Prompt filters routine events, keeps milestones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memories form passively, in the background&lt;/td&gt;
&lt;td&gt;Summaries generated async, never blocking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You don't replay every detail when reminded&lt;/td&gt;
&lt;td&gt;No RAG — chronological summaries, not flashbacks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 1: The Short-Term Memory (Recent Chat History)
&lt;/h2&gt;

&lt;p&gt;The simplest layer. We keep the last &lt;strong&gt;8 message pairs&lt;/strong&gt; as raw conversation — the model sees exact words, tone, nuance. This is your "working memory."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  older messages              last 8 turns
  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
  msg  msg  msg  msg  msg  ┃  msg  msg  msg  msg  msg  msg  msg  msg
                           ┃
  forgotten by the model   ┃  ← these go to the LLM as-is
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why 8? It's a balance. Enough for conversational coherence ("wait, you just said X two messages ago"), cheap enough to not blow up our API bill, and short enough that the model doesn't lose focus.&lt;/p&gt;
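&lt;p&gt;The working-memory layer is literally a slice. A minimal sketch (types and names are illustrative, not our production code):&lt;/p&gt;

```typescript
// Minimal sketch of the short-term "working memory" slice.
type Message = { role: "user" | "assistant"; content: string };

const RECENT_PAIRS = 8; // last 8 user/assistant pairs go to the LLM verbatim

function workingMemory(history: Message[]): Message[] {
  // 8 pairs = 16 individual messages; everything older is covered by memos
  return history.slice(-RECENT_PAIRS * 2);
}
```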




&lt;h2&gt;
  
  
  Step 2: The Long-Term Memory (Memo Summaries)
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Those "forgotten" older messages aren't truly lost — they've been compressed into memo summaries.&lt;/p&gt;

&lt;p&gt;Every &lt;strong&gt;8 messages&lt;/strong&gt;, we check: &lt;em&gt;"Is the recent batch full AND none of them have a memo attached?"&lt;/em&gt; If yes, it's time to summarize.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  expired       │     rolling window: last 15 batches summarized          │  working memory
  ┄┄┄┄┄┄┄       │     ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄         │  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄
  msg msg ┄     │     msg msg ┄     msg msg ┄    ···    msg msg ┄         │  msg msg ┄
      ↓         │           ↓             ↓                   ↓           │        ↓
  Memo 4 ✕      │     Memo 5        Memo 6       ···    Memo 19 ←new      │  Sent Raw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each batch of 8 messages gets compressed into one memo with 2-3 highlights. New memos are appended to the end. The last 8 messages stay raw. We keep only the &lt;strong&gt;last 15 memos&lt;/strong&gt; — when a new one is created, the oldest rolls off. Simple as that.&lt;/p&gt;
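&lt;p&gt;The batching rule above fits in a few lines. A sketch under our stated constants (names are illustrative):&lt;/p&gt;

```typescript
// Sketch of the memo batching rule: every full batch of 8 un-memoed
// messages triggers a summary, and only the newest 15 memos are kept.
const BATCH_SIZE = 8;
const MAX_MEMOS = 15;

function shouldSummarize(messagesSinceLastMemo: number): boolean {
  // "Is the recent batch full AND none of them have a memo attached?"
  return messagesSinceLastMemo >= BATCH_SIZE;
}

function appendMemo(memos: string[], newMemo: string): string[] {
  const next = [...memos, newMemo];
  // When a new memo is created, the oldest rolls off.
  return next.length > MAX_MEMOS ? next.slice(-MAX_MEMOS) : next;
}
```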

&lt;h3&gt;
  
  
  How a Memo Gets Created
&lt;/h3&gt;

&lt;p&gt;When triggered, here's what happens — all in the background, never blocking the user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ① Recent 8 messages
           │
           ▼
  ② "Summarize the above"
           │
           ▼
  ③ Cheap, fast model + summarization prompt
           │
           ▼
  ④ Structured highlights:
     【Highlight 1】: Emily named the stray cat "Mochi"
     【Highlight 2】: Kai revealed his fear of abandonment
           │
           ▼
  ⑤ Saved to DB on the chat row itself

  ⚠️ If anything fails → memo = null, move on. Chat never breaks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key design decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Fire-and-forget.&lt;/strong&gt; This whole flow runs async in the background. The user gets their chat response instantly — they never wait for summarization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Use a cheap model.&lt;/strong&gt; The summary doesn't need GPT-4-level intelligence. A fast, inexpensive model with good instruction-following works great. We're extracting facts, not generating creative fiction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Fail gracefully.&lt;/strong&gt; If summarization throws, we set &lt;code&gt;memo = null&lt;/code&gt; and move on. The worst case is a gap in the memory timeline, not a crashed conversation.&lt;/p&gt;
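&lt;p&gt;Those three decisions combine into one small function. A sketch, with hypothetical stand-ins for the real cheap-model call and DB write:&lt;/p&gt;

```typescript
// Fire-and-forget memo creation with graceful failure.
// Both helpers below are hypothetical stand-ins, not our real code.
async function summarizeWithCheapModel(batch: string[]) {
  // A real implementation would call a fast, inexpensive LLM here.
  return "【Highlight 1】: " + batch.length + " messages summarized";
}

let savedMemo: string | null = null;
async function saveMemoToChat(chatId: string, memo: string | null) {
  savedMemo = memo; // a real implementation writes to the chat row
}

async function createMemoInBackground(
  chatId: string,
  batch: string[],
  summarize = summarizeWithCheapModel,
) {
  try {
    await saveMemoToChat(chatId, await summarize(batch));
  } catch {
    // Worst case: a gap in the memory timeline, never a crashed chat.
    await saveMemoToChat(chatId, null);
  }
}

// Call site: note there is no `await` — the user's reply is never blocked.
void createMemoInBackground("chat-123", ["msg 1", "msg 2"]);
```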

&lt;h3&gt;
  
  
  The Summarization Prompt (The Secret Sauce)
&lt;/h3&gt;

&lt;p&gt;This is where we spent &lt;em&gt;months&lt;/em&gt; iterating. A generic "summarize this conversation" prompt produces garbage — it's either too verbose (defeating the purpose) or too vague (missing critical details).&lt;/p&gt;

&lt;p&gt;Our prompt instructs the model to extract only structured "Journey Highlights" across four categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Relationship Progression&lt;/strong&gt; — trust, affection, betrayal, power shifts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Significant Milestones&lt;/strong&gt; — naming events, first words, emotional breakthroughs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notable Items &amp;amp; Keepsakes&lt;/strong&gt; — symbolic objects exchanged or discovered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Major Story Turning Points&lt;/strong&gt; — plot twists, revelations, narrative pivots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And critically, it tells the model what &lt;strong&gt;not&lt;/strong&gt; to record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No routine daily activities (eating, sleeping, bathing)&lt;/li&gt;
&lt;li&gt;No temporary emotional states ("felt nervous" doesn't make the cut)&lt;/li&gt;
&lt;li&gt;No minor first-time events unless they trigger something bigger&lt;/li&gt;
&lt;li&gt;No status bar changes (health, hunger — this is roleplay, not a game HUD)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output format is structured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;【Highlight 1】: Emily named the stray cat "Mochi" — their first shared act of care.
【Highlight 2】: Kai revealed his fear of abandonment, deepening Emily's understanding.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Concise. Factual. No fluff. Each highlight is one sentence that captures a meaningful beat.&lt;/p&gt;
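&lt;p&gt;Because the format is rigid, turning memo text back into plain highlight strings is a one-liner. A sketch (the function name and regex are illustrative):&lt;/p&gt;

```typescript
// Parse the structured 【Highlight N】 memo format back into plain strings.
function parseHighlights(memo: string): string[] {
  const matches = memo.matchAll(/【Highlight \d+】:\s*(.+)/g);
  return [...matches].map((m) => m[1].trim());
}
```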




&lt;h2&gt;
  
  
  Step 3: Retrieval and Prompt Assembly
&lt;/h2&gt;

&lt;p&gt;When a new message comes in, we pull the memo summaries from the DB and layer everything into a single prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promptComponents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;            &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="na"&gt;isCacheable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;memoryInstructionPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;memoryInstructionPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="na"&gt;isCacheable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;characterPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;characterPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="na"&gt;isCacheable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;              &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               &lt;span class="na"&gt;isCacheable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;memoriesPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;memoriesPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="na"&gt;isCacheable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// + recent 8 messages as the conversation history&lt;/span&gt;
  &lt;span class="c1"&gt;// + current user message&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;isCacheable&lt;/code&gt; flags tie into API-level prompt caching (e.g., Claude's cache control). Components that change rarely — system prompt, character info — get cached so we don't pay full price for resending them every turn. The memories prompt is also cacheable because it only changes every ~8 messages when a new memo is created.&lt;/p&gt;

&lt;p&gt;This saves us &lt;strong&gt;30–40% on API costs&lt;/strong&gt; on average. When you're processing millions of messages per month, that adds up fast.&lt;/p&gt;

&lt;p&gt;But here's the thing — &lt;strong&gt;getting prompt caching to actually work with rolling memories and sliding chat windows is a genuinely hard problem.&lt;/strong&gt; Every time the history window slides forward by one turn, or a new memo gets created, your cache can get invalidated. We've spent significant engineering effort on cache-aligned batching to keep hit rates high. That's a deep dive on its own — coming soon. Follow along if you don't want to miss it.&lt;/p&gt;
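&lt;p&gt;One common tactic — shown here as a sketch of the general idea, not necessarily our exact production layout — is to order components so the cacheable ones form a stable prefix, since prefix caching is invalidated by the first byte that changes:&lt;/p&gt;

```typescript
// Sketch: put rarely-changing, cacheable components first so volatile
// pieces never invalidate the shared cached prefix. Types are illustrative.
type Component = { text: string; isCacheable: boolean };

function assemblePrompt(components: Component[]): string {
  const stable = components.filter((c) => c.isCacheable);
  const dynamic = components.filter((c) => !c.isCacheable);
  // Volatile pieces go last, after the cacheable prefix.
  return stable.concat(dynamic).map((c) => c.text).join("\n\n");
}
```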




&lt;p&gt;With 15 memos and 8 recent turns, the model effectively "remembers" the last &lt;strong&gt;~120 messages&lt;/strong&gt; in compressed form (15 memos × 8 messages each), plus the last 16 messages verbatim. For most conversations, this covers weeks or months of chatting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mistakes We Made (So You Don't Have To)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Our first summarization prompt was too permissive
&lt;/h3&gt;

&lt;p&gt;Early versions would produce summaries like: &lt;em&gt;"The characters had a pleasant conversation about the weather and then discussed dinner plans."&lt;/em&gt; Utterly useless. We had to be extremely prescriptive about what constitutes a "memorable" event and provide tons of good/bad examples in the prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. We tried summarizing in the main request path
&lt;/h3&gt;

&lt;p&gt;Our first implementation generated the memo synchronously — the user had to wait for both the summary AND the response. Response times jumped from 2s to 5s. Moving to fire-and-forget was an obvious win.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. We didn't handle memo creation failures gracefully
&lt;/h3&gt;

&lt;p&gt;If the summarization call threw an error, the chat would crash. Adding a try/catch that sets &lt;code&gt;memo = null&lt;/code&gt; on failure was embarrassingly simple but took us a production incident to learn.&lt;/p&gt;




&lt;h2&gt;
  
  
  Unexpected Benefits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Memo Book as Navigation
&lt;/h3&gt;

&lt;p&gt;Here's something we didn't plan for: our users &lt;em&gt;love&lt;/em&gt; revisiting their earlier chat history. Scrolling through thousands of messages to find "that one scene where they confessed" is painful. Nobody wants to do it.&lt;/p&gt;

&lt;p&gt;The memo summaries accidentally became a &lt;strong&gt;table of contents&lt;/strong&gt; for the conversation. Each memo is attached to a specific message in the timeline, and the highlights tell you exactly what happened in that stretch. Users can scan the memo book, find the entry that mentions the event they're looking for, and jump straight to that point in the chat.&lt;/p&gt;

&lt;p&gt;We didn't build this as a feature — it just fell out of the architecture. But it's become one of the things users mention most when they talk about why they stick around.&lt;/p&gt;

&lt;h3&gt;
  
  
  Users Hijacked the Memo Book (And We Love It)
&lt;/h3&gt;

&lt;p&gt;We made the memo summaries editable — figured users might want to correct mistakes or add missing details. What actually happened was way more interesting.&lt;/p&gt;

&lt;p&gt;Users started &lt;em&gt;writing entirely new memories&lt;/em&gt; — things that never happened in the conversation. They'd add backstory, inside jokes, shared history they wanted the character to "remember." One user wrote three memos of detailed lore about a fictional road trip the characters supposedly took together.&lt;/p&gt;

&lt;p&gt;Our users loved it, so we leaned into it. The memo book isn't just a technical artifact anymore. It's a creative tool. Users shape the character's memory the way you'd fill in a shared journal with a close friend — part real, part wishful, part world-building. And the AI picks it all up naturally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not RAG?
&lt;/h2&gt;

&lt;p&gt;The first thing most people suggest is RAG — embed all your messages, do similarity search, pull in the most relevant chunks. We tried it. It felt wrong.&lt;/p&gt;

&lt;p&gt;The problem is that RAG retrieval is &lt;em&gt;too precise&lt;/em&gt;. It pulls in specific memories with crystal-clear detail based on keyword similarity, and the model keeps bringing up the same moments over and over. "Oh you mentioned a rooftop — let me recall every rooftop scene in perfect detail!" That's not how memory works. You don't replay a moment at full fidelity just because something vaguely related came up.&lt;/p&gt;

&lt;p&gt;Human memory is lossy and chronological. You remember recent things clearly, older things as impressions, and ancient things as a few key beats. RAG gives you the opposite — a random grab bag of high-fidelity flashbacks regardless of when they happened. It's unnatural and users notice. The conversations feel uncanny.&lt;/p&gt;

&lt;p&gt;So we went a different direction.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you're building a chat app and want your AI to remember things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keep a short window of raw recent messages&lt;/strong&gt; (8-10 turns) for conversational coherence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Periodically summarize older messages into structured memos&lt;/strong&gt; using a cheap model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store summaries on the chat records themselves&lt;/strong&gt; — don't over-engineer a separate memory store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be extremely specific in your summarization prompt&lt;/strong&gt; about what's worth remembering. Generic "summarize this" produces junk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run summarization asynchronously&lt;/strong&gt; — never block the user's response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use prompt caching&lt;/strong&gt; on the stable parts of your context to cut costs — but know that making it work well with rolling windows is its own challenge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail gracefully&lt;/strong&gt; — a missing memo is way better than a crashed chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip RAG for conversational memory&lt;/strong&gt; — chronological summaries feel more natural than similarity-search flashbacks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole system is maybe 200 lines of actual logic. The hard part isn't the code — it's the prompt engineering and knowing when to summarize vs. when to keep raw context.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm building EchoMelon — an AI companion platform where characters actually remember your story. Follow for more deep dives on the real engineering behind AI products.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;X/Twitter:&lt;/strong&gt; &lt;a href="https://x.com/launchingmonkey" rel="noopener noreferrer"&gt;@launchingmonkey&lt;/a&gt; · &lt;strong&gt;Reddit:&lt;/strong&gt; &lt;a href="https://reddit.com/u/Calm_Appearance_7337" rel="noopener noreferrer"&gt;u/Calm_Appearance_7337&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>chatbot</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
