Building InterOrdra: A Semantic Gap Detector

rosibis-piedra
Week 1 - From abstract idea to deployed MVP


Hi! I'm Rosibis, an AI/ML student transitioning from Technical Support to AI Engineering. This is Week 1 of building InterOrdra, a semantic gap detection framework. Follow along as I document the journey.


The Problem

Have you ever explained something that seemed perfectly clear to you, only to watch the other person's eyes glaze over? Or read documentation that technically answers your question but somehow... doesn't?

That's a semantic gap - and these gaps are everywhere:

  • 📚 Technical docs that assume knowledge users don't have
  • 🤖 AI prompts that get confusing responses
  • 🔬 Expert explanations that lose non-experts entirely
  • 💼 Cross-team communication where everyone speaks "different languages"

The frustrating part? These gaps are invisible. You know something's wrong, but you can't point to exactly where the misunderstanding lives.

I wanted to build a tool that makes these invisible gaps visible and measurable.


The Insight

A few weeks ago, I had this recurring thought (honestly, more like an obsession):

"What if communication gaps aren't random failures, but detectable patterns in semantic topology?"

I started seeing it geometrically - like two texts existing as point clouds in high-dimensional space. When they "understand" each other, the clouds overlap. When they don't, there are orphaned concepts floating in one space with no corresponding points in the other.

This led to a bigger vision I'm calling the Resonance Spectrometer - an instrument to detect coordinated pattern transmission across different "communication bands" (not just human language, but any system that transmits organized information).

InterOrdra is the first instrument in that spectrum: detecting semantic gaps in human text.

But I needed to start somewhere concrete. So: MVP first, philosophy second.
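The point-cloud intuition can be made concrete in a few lines of NumPy. This is a hedged sketch, not InterOrdra's actual code: the embeddings below are toy 3-D vectors standing in for real sentence embeddings, and the function name `find_orphans` and the 0.5 threshold are illustrative choices of mine.

```python
import numpy as np

def find_orphans(emb_a: np.ndarray, emb_b: np.ndarray,
                 threshold: float = 0.5) -> list[int]:
    """Return indices of rows in emb_a whose best cosine similarity
    to any row of emb_b falls below threshold ("orphaned concepts")."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = a @ b.T            # pairwise cosine similarities
    best = sims.max(axis=1)   # best match in B for each concept in A
    return [i for i, s in enumerate(best) if s < threshold]

# Toy example: concept 2 in A points nowhere near anything in B.
A = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.05, 0.0], [0.8, 0.2, 0.0]])
print(find_orphans(A, B))  # → [2]
```

Concept 2 lives on an axis the other cloud never touches, so it has no counterpart - that's the "orphaned concept floating in one space" from the geometric picture above.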


Technical Decisions

Stack

  • Python 3.11 - Fast, clean, great ML ecosystem
  • Sentence Transformers (all-MiniLM-L6-v2) - Lightweight semantic embeddings
  • Scikit-learn - Clustering (DBSCAN) and similarity calculations
  • Streamlit - Rapid prototyping for UI (deployed in <1 day)
  • Plotly - Interactive 3D visualizations of semantic space

Why These Choices?

Sentence Transformers over OpenAI embeddings: I wanted this to run free and local. No API costs, no rate limits, complete control. all-MiniLM-L6-v2 is fast, lightweight, and good enough for detecting structural gaps (it's trained primarily on English; multilingual variants like paraphrase-multilingual-MiniLM exist if that becomes a bottleneck).

Streamlit over Flask/FastAPI: I needed to go from idea to deployed product in days, not weeks. Streamlit let me focus on the algorithm, not routing and frontend plumbing. Plus, free hosting on Streamlit Cloud.

DBSCAN clustering over K-means: Semantic concepts don't form neat spherical clusters. DBSCAN finds arbitrary-shaped clusters and automatically detects "noise" (orphaned concepts) - which is exactly what I wanted.
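To make that choice concrete, here's a minimal sketch assuming scikit-learn. The 2-D points are toy stand-ins for sentence embeddings, and the eps value is illustrative, not the one InterOrdra actually uses. The key behavior: points DBSCAN labels -1 are "noise", which in this framing are the orphaned concepts.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy "embeddings": two tight directions plus one stray vector.
X = np.array([
    [1.0, 0.0], [0.995, 0.1],   # concept cluster A
    [0.0, 1.0], [0.1, 0.995],   # concept cluster B
    [0.707, 0.707],             # stray concept, near neither cluster
])

# Cosine distance = 1 - cosine similarity; eps bounds how dissimilar
# two points can be and still count as the "same" concept.
labels = DBSCAN(eps=0.1, min_samples=2, metric="cosine").fit_predict(X)
print(labels)  # the stray point gets label -1 ("noise" = orphaned concept)
```

K-means would have been forced to assign that stray vector to its nearest cluster; DBSCAN is allowed to say "this belongs to nothing", which is exactly the signal a gap detector needs.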


Challenges (The Real Story)

1. spaCy Deployment Hell

Initially used spaCy for text splitting. Worked perfectly locally. Deployed to Streamlit Cloud? Instant crash.

Problem: spaCy's language models are HUGE. Streamlit Cloud's free tier couldn't handle it.

Solution: Ripped out spaCy entirely. Replaced with a simple regex-based splitter (simple_splitter.py). Works for 95% of cases, way faster, zero deployment issues.
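The post doesn't show simple_splitter.py, but a regex splitter in that spirit can be this small. This is my hedged reconstruction, not the actual file; abbreviations like "Dr." mid-sentence are part of the remaining 5% it gets wrong.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split on ., !, or ? followed by whitespace and an uppercase
    letter. Naive, but dependency-free and fast."""
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())
    return [p.strip() for p in parts if p.strip()]

print(split_sentences("It worked locally. Deployment broke it! Why?"))
# → ['It worked locally.', 'Deployment broke it!', 'Why?']
```

Zero model downloads, zero megabytes of weights - which is the whole reason the swap fixed the deployment.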

Lesson: Don't over-engineer early. "Good enough and deployed" beats "perfect and stuck locally."


2. Git Chaos with venv/

Accidentally committed my entire virtual environment (393 MB of Python packages) to GitHub. Multiple failed deployments because Streamlit kept trying to install from a corrupted cache.

Solution:

```shell
git rm -r --cached venv/
echo "venv/" >> .gitignore
git add .gitignore
git commit -m "Remove venv from tracking"
git push --force
```

Lesson: .gitignore is your friend. Set it up FIRST, not after you've already pushed disasters.


3. Import Path Confusion

Streamlit Cloud uses different working-directory assumptions than local dev. My imports broke on deployment:

```python
# Broke on Streamlit Cloud
from backend.embeddings import generate_embeddings
```

```python
# Fixed version
import os
import sys

sys.path.append(os.path.dirname(os.path.abspath(__file__)))
from backend.embeddings import generate_embeddings
```

Lesson: Always test relative imports. Better yet, structure projects as proper Python packages from day 1.


Current State

✅ What Works:

  • Semantic similarity analysis between any two texts
  • Detection of "orphaned concepts" (ideas in one text with no match in the other)
  • Vocabulary analysis (shared vs unique words)
  • 3D interactive visualization of semantic topology
  • Actionable recommendations to close gaps
  • Deployed and public: interordra.streamlit.app
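Of those features, the vocabulary analysis is the simplest to sketch. A hedged, minimal version using plain set operations - the function name and tokenizer regex are mine, not InterOrdra's:

```python
import re

def vocabulary_overlap(text_a: str, text_b: str) -> dict[str, list[str]]:
    """Compare the word sets of two texts: shared vs unique vocabulary."""
    def tokenize(text: str) -> set[str]:
        # Lowercase word tokens; keeps accented characters for Spanish text.
        return set(re.findall(r"[a-záéíóúñü]+", text.lower()))
    a, b = tokenize(text_a), tokenize(text_b)
    return {
        "shared": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }

print(vocabulary_overlap("The model embeds text", "The model clusters vectors"))
```

Vocabulary divergence is a cheap first signal: if two texts barely share words, the embedding-level analysis usually confirms a gap.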

⚠️ Current Limitations:

  • UI only in Spanish (English translation in progress)
  • Mobile experience has occasional rendering issues
  • Only detects similarity-based gaps - still exploring complementarity and harmonic patterns

📊 Early Traction:

  • Live for ~1 week
  • Growing organically
  • Waiting for first user feedback

What's Next

Immediate (this week):

  • 🌐 English UI toggle
  • 📱 Mobile responsive fixes
  • 📄 Export results as PDF

Short-term (next 2-4 weeks):

  • Advanced gap detection - Beyond similarity analysis
  • Analytics setup (seeing actual usage patterns)
  • File upload support (.txt, .docx, .pdf)

Medium-term (1-3 months):

  • Public API (FastAPI backend)
  • Multi-text comparison (analyze 3+ texts simultaneously)
  • Deeper semantic topology analysis

Try It Yourself

🌐 Live demo: interordra.streamlit.app

💻 GitHub: github.com/rosibis-piedra/interordra

Curious what you'll discover. Drop your findings in the comments or open an issue on GitHub if you spot bugs 🐛


Reflection

This project felt different. Usually I second-guess myself constantly. With InterOrdra, I had this weird certainty - like I was building something that needed to exist, and I was just the person who happened to notice it first.

Took 4 days from "hmm interesting idea" to "deployed MVP with users." That's the power of:

  1. Starting with a concrete problem (not abstract philosophy)
  2. Choosing boring, reliable tech
  3. Shipping fast, iterating faster
  4. Not letting perfect kill good

Next post: diving deeper into the semantic topology math and why DBSCAN + cosine similarity reveals structure that traditional NLP misses.


What do you think? Have you experienced semantic gaps in your work? How do you currently handle miscommunication between systems?

Drop a comment below - I'd love to hear your thoughts! 💬



Series: Building InterOrdra
Part: 1




---

*Building in public. Learning in public. Breaking things in public.*  
*Follow along: I'm documenting the full journey from Technical Support Engineer → AI/ML Engineer.*
