Harish Kotra (he/him)
Building FalseRecall: A Production-Ready AI Memory Game with Streamlit, Provider Abstraction, and Mem0

FalseRecall is an experiment in narrative believability: the app transforms a tiny input fact into a rich memory-like story, then challenges players to detect whether a memory is real or AI-generated.

This post walks through the architecture and implementation decisions so another engineer can fork and ship quickly.

What We Built

FalseRecall has two tightly connected experiences:

  1. Forge: Generate a fictional memory from a minimal input
  2. Real or AI?: Guess whether a memory is real or model-generated

Key constraints:

  • Keep stories plausible, not absurd
  • Build trust with explicit fiction labels
  • Keep safety guardrails active by default
  • Make LLM provider switching trivial

Stack

  • Streamlit for rapid full-stack UI
  • Python for orchestration
  • openai SDK for OpenAI + OpenAI-compatible providers
  • requests for Ollama native fallback API
  • mem0ai for optional memory layer
  • python-dotenv for local key management

Architecture

Code Design

The repository is intentionally modular:

falserecall/
  engine.py       # prompt orchestration + generation
  providers.py    # OpenAI / Featherless / Ollama abstraction
  prompts.py      # system and user prompt templates
  safety.py       # input checks and post-processing
  memory_layer.py # Mem0 wrapper
  game.py         # guess evaluation and challenge assembly
  memory_data.py  # seeded real memories + AI seeds

1) Provider abstraction to avoid vendor lock-in

Instead of provider-specific logic in UI code, generate_text(...) handles routing:

def generate_text(provider, model, system_prompt, user_prompt, temperature=0.9):
    if provider == "openai":
        return _generate_with_openai_compatible(...)
    if provider == "featherless":
        return _generate_with_openai_compatible(...)
    if provider == "ollama":
        return _generate_with_ollama_native(...)  # or OpenAI-compatible mode
    raise ValueError(f"Unknown provider: {provider}")

This keeps app.py stable while changing providers.
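The dispatch can also be expressed as a registry of callables, which makes adding a provider a one-line change. Below is a minimal sketch of that pattern; the `_stub_*` generators are placeholders for illustration (in FalseRecall they would wrap the openai SDK and the Ollama native API):

```python
# Sketch: provider dispatch as a registry of callables.
# The _stub_* functions are illustrative stand-ins, not the real implementations.
def _stub_openai_compatible(model, system_prompt, user_prompt, temperature):
    return f"[openai:{model}] {user_prompt}"

def _stub_ollama_native(model, system_prompt, user_prompt, temperature):
    return f"[ollama:{model}] {user_prompt}"

_PROVIDERS = {
    "openai": _stub_openai_compatible,
    "featherless": _stub_openai_compatible,  # OpenAI-compatible endpoint
    "ollama": _stub_ollama_native,
}

def generate_text(provider, model, system_prompt, user_prompt, temperature=0.9):
    handler = _PROVIDERS.get(provider)
    if handler is None:
        raise ValueError(f"Unknown provider: {provider}")
    return handler(model, system_prompt, user_prompt, temperature)
```

With a registry, `app.py` never grows an `if` chain: a new provider is registered in one place and the UI picks it up from the dict keys.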

2) Memory-context-aware generation

engine.py conditionally injects Mem0 context:

context_block = ""  # default so the f-string below is valid without Mem0 context
if memory_context:
    context_block = (
        "\nUser context hints (use only if relevant and plausible):\n"
        + "\n".join(f"- {item}" for item in memory_context[:5])
    )
system_prompt = f"{BASE_SYSTEM_PROMPT}\n{tone_instructions}{context_block}"

This is lightweight retrieval augmentation for narrative coherence.
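Extracting the injection into a pure helper keeps it unit-testable. A sketch of that shape (`build_system_prompt` is a hypothetical name, not the actual function in engine.py):

```python
BASE_SYSTEM_PROMPT = "You write short, plausible fictional memories."

def build_system_prompt(tone_instructions, memory_context=None):
    # Default to an empty block so the f-string below is always valid,
    # even when no Mem0 context is available.
    context_block = ""
    if memory_context:
        context_block = (
            "\nUser context hints (use only if relevant and plausible):\n"
            + "\n".join(f"- {item}" for item in memory_context[:5])
        )
    return f"{BASE_SYSTEM_PROMPT}\n{tone_instructions}{context_block}"
```

Because the helper takes plain lists and strings, the cap at five hints and the "only if relevant" framing can be asserted in tests without touching a model or Mem0.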

3) Guardrails before model invocation

The app blocks risky inputs instead of relying only on provider moderation:

def validate_input(user_text: str) -> SafetyResult:
    text = user_text.strip()
    if not text:
        return SafetyResult(False, "Please enter a short fact or memory.")
    if len(text) > 500:
        return SafetyResult(False, "Please keep input under 500 characters.")
    ...

The prompt also repeats safety constraints to reduce unsafe generations.
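The elided checks are regex-first, as noted in the tradeoffs section. A minimal sketch of that style of check (the patterns here are illustrative only; the real safety.py list is more extensive):

```python
import re

# Illustrative blocklist; real patterns cover more categories.
_BLOCKED_PATTERNS = [
    re.compile(r"\b(suicide|self[- ]harm)\b", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like personal identifiers
]

def matches_blocklist(text: str) -> bool:
    # Cheap pre-filter that runs before any model call.
    return any(pattern.search(text) for pattern in _BLOCKED_PATTERNS)
```

Running this before the provider request saves a round trip on obviously unsafe input and works even when a provider has no moderation endpoint.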

4) Game loop logic

game.py is deterministic and UI-agnostic:

def evaluate_guess(user_choice: str, actual_label: str) -> GuessResult:
    is_correct = user_choice.strip().lower() == actual_label.strip().lower()
    explanation = ...
    return GuessResult(is_correct=is_correct, explanation=explanation)

Because game logic is separate, migrating from Streamlit session state to database-backed sessions is straightforward.
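Challenge assembly follows the same pattern: a pure function that flips between a seeded real memory and an AI-generated one. This is a sketch under assumed names (`Challenge` and `assemble_challenge` are hypothetical shapes, not the repo's exact API):

```python
import random
from dataclasses import dataclass

@dataclass
class Challenge:
    text: str
    actual_label: str  # "real" or "ai"

def assemble_challenge(real_memories, ai_memories, rng=random):
    # Coin-flip between a seeded real memory and an AI-generated one;
    # the label travels with the text so evaluate_guess can score it later.
    if rng.random() < 0.5:
        return Challenge(rng.choice(real_memories), "real")
    return Challenge(rng.choice(ai_memories), "ai")
```

Injecting the `rng` makes the coin flip deterministic in tests, which is what makes "seed data and deterministic tests" (step 4 below) practical.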

Why Streamlit for this MVP

For early product validation, Streamlit optimizes for:

  • fast UI iteration
  • minimal ceremony
  • immediate deployability
  • low operational complexity

Once product-market fit is clearer, this architecture can move to FastAPI + React while reusing most core modules.

Mem0 Integration Pattern

Mem0 is optional and feature-flagged by MEM0_API_KEY.

Flow:

  1. User sets user_id in sidebar
  2. App calls search_memories(...)
  3. Top context snippets influence prompt
  4. Generated response is stored using add_memory(...)

This enables continuity between sessions without making it mandatory.
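The feature flag can be sketched as a thin wrapper that returns `None` when the key is absent and lets callers degrade gracefully (`get_memory_client` and `search_memories` are assumed names for the memory_layer.py helpers; the lazy import keeps mem0ai optional):

```python
import os

def get_memory_client():
    # Feature flag: Mem0 is only active when an API key is configured.
    api_key = os.getenv("MEM0_API_KEY")
    if not api_key:
        return None
    from mem0 import MemoryClient  # imported lazily so the dependency stays optional
    return MemoryClient(api_key=api_key)

def search_memories(client, query, user_id, limit=5):
    if client is None:
        return []  # no-op when the feature is off
    return client.search(query, user_id=user_id)[:limit]
```

Because the off state returns an empty list rather than raising, the generation path needs no branching beyond the `if memory_context:` check already shown above.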

Tradeoffs and Improvements

Current MVP tradeoffs:

  • Session-state leaderboard is ephemeral
  • Seed "real" memories are static
  • Safety checks are regex-first (fast but limited)

Next improvements:

  • persistent leaderboard in SQLite/Postgres
  • signed "challenge links" for social sharing
  • moderation queue for flagged generations
  • telemetry (generation latency, provider success rate, guess accuracy)

How to Fork and Extend

Typical extension path:

  1. Add a feature module in falserecall/
  2. Wire UI controls in app.py
  3. Document env vars and behavior in README
  4. Add seed data and deterministic tests

Suggested first PRs:

  • "Export memory card as image"
  • "Daily challenge archive"
  • "Difficulty mode for AI realism"
  • "Persistent leaderboard backend"

Closing

FalseRecall is a good reference architecture for:

  • multi-provider LLM apps
  • memory-augmented generation
  • AI content safety in consumer UX
  • gameful interaction loops around AI output

If you fork this, keep the explicit fiction labeling and guardrails intact. They are core product behavior, not optional polish.

How It Works

GitHub: https://github.com/harishkotra/FalseRecall
