FalseRecall is an experiment in narrative believability: the app transforms a tiny input fact into a rich memory-like story, then challenges players to detect whether a memory is real or AI-generated.
This post walks through the architecture and implementation decisions so another engineer can fork and ship quickly.
What We Built
FalseRecall has two tightly connected experiences:
- Forge: Generate a fictional memory from a minimal input
- Real or AI?: Guess whether a memory is real or model-generated
Key constraints:
- Keep stories plausible, not absurd
- Build trust with explicit fiction labels
- Keep safety guardrails active by default
- Make LLM provider switching trivial
Stack
- Streamlit for rapid full-stack UI
- Python for orchestration
- openai SDK for OpenAI + OpenAI-compatible providers
- requests for the Ollama native fallback API
- mem0ai for an optional memory layer
- python-dotenv for local key management
Architecture
Code Design
The repository is intentionally modular:
falserecall/
  engine.py        # prompt orchestration + generation
  providers.py     # OpenAI / Featherless / Ollama abstraction
  prompts.py       # system and user prompt templates
  safety.py        # input checks and post-processing
  memory_layer.py  # Mem0 wrapper
  game.py          # guess evaluation and challenge assembly
  memory_data.py   # seeded real memories + AI seeds
1) Provider abstraction to avoid vendor lock-in
Instead of embedding provider-specific logic in UI code, generate_text(...) handles routing:
def generate_text(provider, model, system_prompt, user_prompt, temperature=0.9):
    if provider == "openai":
        return _generate_with_openai_compatible(...)
    if provider == "featherless":
        return _generate_with_openai_compatible(...)
    if provider == "ollama":
        return _generate_with_ollama_native(...)  # or OpenAI-compatible mode
This keeps app.py stable while changing providers.
2) Memory-context-aware generation
engine.py conditionally injects Mem0 context:
context_block = ""
if memory_context:
    context_block = (
        "\nUser context hints (use only if relevant and plausible):\n"
        + "\n".join(f"- {item}" for item in memory_context[:5])
    )
system_prompt = f"{BASE_SYSTEM_PROMPT}\n{tone_instructions}{context_block}"
This is lightweight retrieval augmentation for narrative coherence.
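Wrapped in a small helper, the same injection logic can be exercised directly; BASE_SYSTEM_PROMPT and the tone string here are stand-ins, not the project's actual prompts:

```python
# Stand-in system prompt for illustration; the real one lives in prompts.py.
BASE_SYSTEM_PROMPT = "You write short, clearly fictional memory narratives."
MAX_CONTEXT_ITEMS = 5  # cap hints so the prompt stays small


def build_system_prompt(tone_instructions, memory_context=None):
    """Append up to MAX_CONTEXT_ITEMS retrieved hints to the base prompt."""
    prompt = f"{BASE_SYSTEM_PROMPT}\n{tone_instructions}"
    if memory_context:
        prompt += (
            "\nUser context hints (use only if relevant and plausible):\n"
            + "\n".join(f"- {item}" for item in memory_context[:MAX_CONTEXT_ITEMS])
        )
    return prompt
```

Capping the hint count keeps token cost bounded even if the memory store returns a long result list.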
3) Guardrails before model invocation
The app blocks risky inputs instead of relying only on provider moderation:
def validate_input(user_text: str) -> SafetyResult:
    text = user_text.strip()
    if not text:
        return SafetyResult(False, "Please enter a short fact or memory.")
    if len(text) > 500:
        return SafetyResult(False, "Please keep input under 500 characters.")
    ...
The prompt also repeats safety constraints to reduce unsafe generations.
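The regex-first check mentioned in the tradeoffs section can slot in right after the length check. A runnable sketch, where SafetyResult's shape and the denylist patterns are illustrative assumptions rather than the project's actual rules:

```python
import re
from dataclasses import dataclass


@dataclass
class SafetyResult:
    ok: bool
    message: str = ""


# Illustrative denylist; the real project's patterns may differ.
_BLOCKED_PATTERNS = [
    re.compile(r"\b(suicide|self-harm)\b", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # looks like a US SSN
]


def validate_input(user_text: str) -> SafetyResult:
    text = (user_text or "").strip()
    if not text:
        return SafetyResult(False, "Please enter a short fact or memory.")
    if len(text) > 500:
        return SafetyResult(False, "Please keep input under 500 characters.")
    # Regex pass: fast and cheap, but limited — it only catches what it names.
    for pattern in _BLOCKED_PATTERNS:
        if pattern.search(text):
            return SafetyResult(False, "That input isn't allowed. Try a different fact.")
    return SafetyResult(True)
```

Because the function returns a result object rather than raising, the UI can show the message inline without a try/except.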
4) Game loop logic
game.py is deterministic and UI-agnostic:
def evaluate_guess(user_choice: str, actual_label: str) -> GuessResult:
    is_correct = user_choice.strip().lower() == actual_label.strip().lower()
    explanation = ...
    return GuessResult(is_correct=is_correct, explanation=explanation)
Because game logic is separate, migrating from Streamlit session state to database-backed sessions is straightforward.
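Filled out with a concrete result type and explanation text (both assumptions, since the post elides them), the whole game-loop unit fits in a few lines and tests deterministically:

```python
from dataclasses import dataclass


@dataclass
class GuessResult:
    is_correct: bool
    explanation: str


def evaluate_guess(user_choice: str, actual_label: str) -> GuessResult:
    # Normalize both sides so "Real", " real ", and "REAL" all compare equal.
    is_correct = user_choice.strip().lower() == actual_label.strip().lower()
    if is_correct:
        explanation = f"Correct! This memory was {actual_label.strip().lower()}."
    else:
        explanation = f"Not quite. This memory was actually {actual_label.strip().lower()}."
    return GuessResult(is_correct=is_correct, explanation=explanation)
```

No Streamlit import, no session state: the function takes strings and returns a value, which is exactly what makes it portable to a different frontend.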
Why Streamlit for this MVP
For early product validation, Streamlit optimizes for:
- fast UI iteration
- minimal ceremony
- immediate deployability
- low operational complexity
Once product-market fit is clearer, this architecture can move to FastAPI + React while reusing most core modules.
Mem0 Integration Pattern
Mem0 is optional and feature-flagged by MEM0_API_KEY.
Flow:
- User sets user_id in the sidebar
- App calls search_memories(...)
- Top context snippets influence the prompt
- Generated response is stored using add_memory(...)
This enables continuity between sessions without making it mandatory.
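A sketch of what the feature-flagged wrapper might look like, assuming the mem0ai hosted client (`MemoryClient` with `add`/`search`) and an assumed response shape; when MEM0_API_KEY is unset, every call degrades to a no-op:

```python
import os


class MemoryLayer:
    """Thin, optional Mem0 wrapper: all calls no-op unless MEM0_API_KEY is set."""

    def __init__(self):
        api_key = os.environ.get("MEM0_API_KEY")
        self._client = None
        if api_key:
            from mem0 import MemoryClient  # lazy import: mem0ai stays optional
            self._client = MemoryClient(api_key=api_key)

    @property
    def enabled(self) -> bool:
        return self._client is not None

    def search_memories(self, query: str, user_id: str, limit: int = 5):
        if not self.enabled:
            return []
        # Assumed response shape: a list of dicts with a "memory" key.
        results = self._client.search(query, user_id=user_id)
        return [r.get("memory", "") for r in results][:limit]

    def add_memory(self, text: str, user_id: str) -> None:
        if self.enabled:
            self._client.add(text, user_id=user_id)
```

Gating on the environment variable at construction time means the rest of the app never branches on whether Mem0 is configured — it just calls the wrapper.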
Tradeoffs and Improvements
Current MVP tradeoffs:
- Session-state leaderboard is ephemeral
- Seed "real" memories are static
- Safety checks are regex-first (fast but limited)
Next improvements:
- persistent leaderboard in SQLite/Postgres
- signed "challenge links" for social sharing
- moderation queue for flagged generations
- telemetry (generation latency, provider success rate, guess accuracy)
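The first item on that list is a small lift. A hedged sketch of what a SQLite-backed leaderboard could look like (table and function names are hypothetical, not part of the current repo):

```python
import sqlite3


def init_leaderboard(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS leaderboard (
               user_id TEXT PRIMARY KEY,
               correct INTEGER NOT NULL DEFAULT 0,
               total   INTEGER NOT NULL DEFAULT 0
           )"""
    )


def record_guess(conn: sqlite3.Connection, user_id: str, is_correct: bool) -> None:
    # Upsert: insert a fresh row, or bump counters if the user already exists.
    conn.execute(
        """INSERT INTO leaderboard (user_id, correct, total)
           VALUES (?, ?, 1)
           ON CONFLICT(user_id) DO UPDATE SET
               correct = correct + excluded.correct,
               total   = total + 1""",
        (user_id, int(is_correct)),
    )


def top_players(conn: sqlite3.Connection, limit: int = 10):
    return conn.execute(
        "SELECT user_id, correct, total FROM leaderboard "
        "ORDER BY correct DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Because the game logic already returns plain values, swapping the session-state scoreboard for these three calls would not touch engine.py or game.py at all.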
How to Fork and Extend
Typical extension path:
- Add a feature module in falserecall/
- Wire UI controls in app.py
- Document env vars and behavior in README
- Add seed data and deterministic tests
Suggested first PRs:
- "Export memory card as image"
- "Daily challenge archive"
- "Difficulty mode for AI realism"
- "Persistent leaderboard backend"
Closing
FalseRecall is a good reference architecture for:
- multi-provider LLM apps
- memory-augmented generation
- AI content safety in consumer UX
- gameful interaction loops around AI output
If you fork this, keep the explicit fiction labeling and guardrails intact. They are core product behavior, not optional polish.

