Harish Kotra (he/him)
Building FalseRecall: A Production-Ready AI Memory Game with Streamlit, Provider Abstraction, and Mem0

FalseRecall is an experiment in narrative believability: the app transforms a tiny input fact into a rich memory-like story, then challenges players to detect whether a memory is real or AI-generated.

This post walks through the architecture and implementation decisions so another engineer can fork and ship quickly.

What We Built

FalseRecall has two tightly connected experiences:

  1. Forge: Generate a fictional memory from a minimal input
  2. Real or AI?: Guess whether a memory is real or model-generated

Key constraints:

  • Keep stories plausible, not absurd
  • Build trust with explicit fiction labels
  • Keep safety guardrails active by default
  • Make LLM provider switching trivial

Stack

  • Streamlit for rapid full-stack UI
  • Python for orchestration
  • openai SDK for OpenAI + OpenAI-compatible providers
  • requests for Ollama native fallback API
  • mem0ai for optional memory layer
  • python-dotenv for local key management

Architecture

Code Design

The repository is intentionally modular:

falserecall/
  engine.py       # prompt orchestration + generation
  providers.py    # OpenAI / Featherless / Ollama abstraction
  prompts.py      # system and user prompt templates
  safety.py       # input checks and post-processing
  memory_layer.py # Mem0 wrapper
  game.py         # guess evaluation and challenge assembly
  memory_data.py  # seeded real memories + AI seeds

1) Provider abstraction to avoid vendor lock-in

Instead of provider-specific logic in UI code, generate_text(...) handles routing:

def generate_text(provider, model, system_prompt, user_prompt, temperature=0.9):
    if provider == "openai":
        return _generate_with_openai_compatible(...)
    if provider == "featherless":
        return _generate_with_openai_compatible(...)
    if provider == "ollama":
        return _generate_with_ollama_native(...)  # or OpenAI-compatible mode
    raise ValueError(f"Unknown provider: {provider}")

This keeps app.py stable while changing providers.
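The dispatch can also be expressed as a registry of callables, which makes adding a provider a one-line change. Below is a minimal sketch of that pattern; the `_stub_*` generators are placeholders for illustration (in FalseRecall they would wrap the openai SDK and the Ollama native API):

```python
# Sketch: provider dispatch as a registry of callables.
# The _stub_* functions are illustrative stand-ins, not the real implementations.
def _stub_openai_compatible(model, system_prompt, user_prompt, temperature):
    return f"[openai:{model}] {user_prompt}"

def _stub_ollama_native(model, system_prompt, user_prompt, temperature):
    return f"[ollama:{model}] {user_prompt}"

_PROVIDERS = {
    "openai": _stub_openai_compatible,
    "featherless": _stub_openai_compatible,  # OpenAI-compatible endpoint
    "ollama": _stub_ollama_native,
}

def generate_text(provider, model, system_prompt, user_prompt, temperature=0.9):
    handler = _PROVIDERS.get(provider)
    if handler is None:
        raise ValueError(f"Unknown provider: {provider}")
    return handler(model, system_prompt, user_prompt, temperature)
```

With a registry, `app.py` never grows an `if` chain: a new provider is registered in one place and the UI picks it up from the dict keys.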

2) Memory-context-aware generation

engine.py conditionally injects Mem0 context:

context_block = ""  # default so the f-string below is valid without Mem0 context
if memory_context:
    context_block = (
        "\nUser context hints (use only if relevant and plausible):\n"
        + "\n".join(f"- {item}" for item in memory_context[:5])
    )
system_prompt = f"{BASE_SYSTEM_PROMPT}\n{tone_instructions}{context_block}"

This is lightweight retrieval augmentation for narrative coherence.
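Extracting the injection into a pure helper keeps it unit-testable. A sketch of that shape (`build_system_prompt` is a hypothetical name, not the actual function in engine.py):

```python
BASE_SYSTEM_PROMPT = "You write short, plausible fictional memories."

def build_system_prompt(tone_instructions, memory_context=None):
    # Default to an empty block so the f-string below is always valid,
    # even when no Mem0 context is available.
    context_block = ""
    if memory_context:
        context_block = (
            "\nUser context hints (use only if relevant and plausible):\n"
            + "\n".join(f"- {item}" for item in memory_context[:5])
        )
    return f"{BASE_SYSTEM_PROMPT}\n{tone_instructions}{context_block}"
```

Because the helper takes plain lists and strings, the cap at five hints and the "only if relevant" framing can be asserted in tests without touching a model or Mem0.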

3) Guardrails before model invocation

The app blocks risky inputs instead of relying only on provider moderation:

def validate_input(user_text: str) -> SafetyResult:
    text = user_text.strip()
    if not text:
        return SafetyResult(False, "Please enter a short fact or memory.")
    if len(text) > 500:
        return SafetyResult(False, "Please keep input under 500 characters.")
    ...

The prompt also repeats safety constraints to reduce unsafe generations.
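The elided checks are regex-first, as noted in the tradeoffs section. A minimal sketch of that style of check (the patterns here are illustrative only; the real safety.py list is more extensive):

```python
import re

# Illustrative blocklist; real patterns cover more categories.
_BLOCKED_PATTERNS = [
    re.compile(r"\b(suicide|self[- ]harm)\b", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like personal identifiers
]

def matches_blocklist(text: str) -> bool:
    # Cheap pre-filter that runs before any model call.
    return any(pattern.search(text) for pattern in _BLOCKED_PATTERNS)
```

Running this before the provider request saves a round trip on obviously unsafe input and works even when a provider has no moderation endpoint.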

4) Game loop logic

game.py is deterministic and UI-agnostic:

def evaluate_guess(user_choice: str, actual_label: str) -> GuessResult:
    is_correct = user_choice.strip().lower() == actual_label.strip().lower()
    explanation = ...
    return GuessResult(is_correct=is_correct, explanation=explanation)

Because game logic is separate, migrating from Streamlit session state to database-backed sessions is straightforward.
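Challenge assembly follows the same pattern: a pure function that flips between a seeded real memory and an AI-generated one. This is a sketch under assumed names (`Challenge` and `assemble_challenge` are hypothetical shapes, not the repo's exact API):

```python
import random
from dataclasses import dataclass

@dataclass
class Challenge:
    text: str
    actual_label: str  # "real" or "ai"

def assemble_challenge(real_memories, ai_memories, rng=random):
    # Coin-flip between a seeded real memory and an AI-generated one;
    # the label travels with the text so evaluate_guess can score it later.
    if rng.random() < 0.5:
        return Challenge(rng.choice(real_memories), "real")
    return Challenge(rng.choice(ai_memories), "ai")
```

Injecting the `rng` makes the coin flip deterministic in tests, which is what makes "seed data and deterministic tests" (step 4 below) practical.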

Why Streamlit for this MVP

For early product validation, Streamlit optimizes for:

  • fast UI iteration
  • minimal ceremony
  • immediate deployability
  • low operational complexity

Once product-market fit is clearer, this architecture can move to FastAPI + React while reusing most core modules.

Mem0 Integration Pattern

Mem0 is optional and feature-flagged by MEM0_API_KEY.

Flow:

  1. User sets user_id in sidebar
  2. App calls search_memories(...)
  3. Top context snippets influence prompt
  4. Generated response is stored using add_memory(...)

This enables continuity between sessions without making it mandatory.
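The feature flag can be sketched as a thin wrapper that returns `None` when the key is absent and lets callers degrade gracefully (`get_memory_client` and `search_memories` are assumed names for the memory_layer.py helpers; the lazy import keeps mem0ai optional):

```python
import os

def get_memory_client():
    # Feature flag: Mem0 is only active when an API key is configured.
    api_key = os.getenv("MEM0_API_KEY")
    if not api_key:
        return None
    from mem0 import MemoryClient  # imported lazily so the dependency stays optional
    return MemoryClient(api_key=api_key)

def search_memories(client, query, user_id, limit=5):
    if client is None:
        return []  # no-op when the feature is off
    return client.search(query, user_id=user_id)[:limit]
```

Because the off state returns an empty list rather than raising, the generation path needs no branching beyond the `if memory_context:` check already shown above.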

Tradeoffs and Improvements

Current MVP tradeoffs:

  • Session-state leaderboard is ephemeral
  • Seed "real" memories are static
  • Safety checks are regex-first (fast but limited)

Next improvements:

  • persistent leaderboard in SQLite/Postgres
  • signed "challenge links" for social sharing
  • moderation queue for flagged generations
  • telemetry (generation latency, provider success rate, guess accuracy)

How to Fork and Extend

Typical extension path:

  1. Add a feature module in falserecall/
  2. Wire UI controls in app.py
  3. Document env vars and behavior in README
  4. Add seed data and deterministic tests

Suggested first PRs:

  • "Export memory card as image"
  • "Daily challenge archive"
  • "Difficulty mode for AI realism"
  • "Persistent leaderboard backend"

Closing

FalseRecall is a good reference architecture for:

  • multi-provider LLM apps
  • memory-augmented generation
  • AI content safety in consumer UX
  • gameful interaction loops around AI output

If you fork this, keep the explicit fiction labeling and guardrails intact. They are core product behavior, not optional polish.

How It Works

GitHub: https://github.com/harishkotra/FalseRecall
