<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: rosibis-piedra</title>
    <description>The latest articles on DEV Community by rosibis-piedra (@rosibispiedra).</description>
    <link>https://dev.to/rosibispiedra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3685291%2Fad95c042-9a3b-462e-b477-8fb5cff25c1e.png</url>
      <title>DEV Community: rosibis-piedra</title>
      <link>https://dev.to/rosibispiedra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rosibispiedra"/>
    <language>en</language>
    <item>
      <title>Building InterOrdra: A Semantic Gap Detector</title>
      <dc:creator>rosibis-piedra</dc:creator>
      <pubDate>Wed, 31 Dec 2025 16:56:13 +0000</pubDate>
      <link>https://dev.to/rosibispiedra/building-interordra-a-semantic-gap-detector-4f23</link>
      <guid>https://dev.to/rosibispiedra/building-interordra-a-semantic-gap-detector-4f23</guid>
      <description>&lt;h1&gt;
  
  
  Building InterOrdra: A Semantic Gap Detector
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Week 1 - From abstract idea to deployed MVP&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Hi! I'm Rosibis, an AI/ML student transitioning from Technical Support to AI Engineering. This is Week 1 of building InterOrdra, a semantic gap detection framework. Follow along as I document the journey.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Have you ever explained something perfectly clear to you, only to watch the other person's eyes glaze over? Or read documentation that &lt;em&gt;technically&lt;/em&gt; answers your question but somehow... doesn't?&lt;/p&gt;

&lt;p&gt;That's a &lt;strong&gt;semantic gap&lt;/strong&gt;, and such gaps are everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📚 Technical docs that assume knowledge users don't have&lt;/li&gt;
&lt;li&gt;🤖 AI prompts that get confusing responses&lt;/li&gt;
&lt;li&gt;🔬 Expert explanations that lose non-experts entirely&lt;/li&gt;
&lt;li&gt;💼 Cross-team communication where everyone speaks "different languages"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The frustrating part? These gaps are &lt;strong&gt;invisible&lt;/strong&gt;. You know something's wrong, but you can't point to &lt;em&gt;exactly&lt;/em&gt; where the misunderstanding lives.&lt;/p&gt;

&lt;p&gt;I wanted to build a tool that makes these invisible gaps &lt;strong&gt;visible and measurable&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Insight
&lt;/h2&gt;

&lt;p&gt;A few weeks ago, I had this recurring thought (honestly, more like an obsession): &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"What if communication gaps aren't random failures, but &lt;strong&gt;detectable patterns in semantic topology&lt;/strong&gt;?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I started seeing it geometrically - like two texts existing as point clouds in high-dimensional space. When they "understand" each other, the clouds overlap. When they don't, there are &lt;strong&gt;orphaned concepts&lt;/strong&gt; floating in one space with no corresponding points in the other.&lt;/p&gt;
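
&lt;p&gt;A minimal sketch of that picture, assuming each sentence has already been turned into an embedding vector: a point counts as "orphaned" when its best cosine similarity to any point in the other cloud stays under a cutoff. The toy 3-dimensional vectors, the &lt;code&gt;find_orphans&lt;/code&gt; helper, and the 0.5 cutoff are illustrative assumptions, not InterOrdra's actual code.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_orphans(cloud_a, cloud_b, cutoff=0.5):
    """Return indices of points in cloud_a with no close match in cloud_b."""
    orphans = []
    for index, point in enumerate(cloud_a):
        best = max(cosine_similarity(point, other) for other in cloud_b)
        if cutoff > best:  # hypothetical cutoff; tune on real data
            orphans.append(index)
    return orphans


# Toy "embeddings" (real ones from all-MiniLM-L6-v2 have 384 dimensions).
text_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

orphan_indices = find_orphans(text_a, text_b)
print(orphan_indices)
```

&lt;p&gt;Here the third point of &lt;code&gt;text_a&lt;/code&gt; has nothing nearby in &lt;code&gt;text_b&lt;/code&gt;, so it is the only index reported.&lt;/p&gt;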

&lt;p&gt;This led to a bigger vision I'm calling the &lt;strong&gt;Resonance Spectrometer&lt;/strong&gt; - an instrument to detect coordinated pattern transmission across different "communication bands" (not just human language, but any system that transmits organized information).&lt;/p&gt;

&lt;p&gt;InterOrdra is the first instrument in that spectrum: detecting semantic gaps in human text.&lt;/p&gt;

&lt;p&gt;But I needed to start somewhere concrete. So: &lt;strong&gt;MVP first, philosophy second&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.11&lt;/strong&gt; - Fast, clean, great ML ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentence Transformers&lt;/strong&gt; (&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;) - Lightweight semantic embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scikit-learn&lt;/strong&gt; - Clustering (DBSCAN) and similarity calculations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt; - Rapid prototyping for UI (deployed in &amp;lt;1 day)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plotly&lt;/strong&gt; - Interactive 3D visualizations of semantic space&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why These Choices?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Sentence Transformers over OpenAI embeddings:&lt;/strong&gt; I wanted this to run &lt;strong&gt;free and local&lt;/strong&gt;. No API costs, no rate limits, complete control. &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; is small, fast, and good enough for detecting structural gaps. One caveat: it was trained mostly on English text, so results on other languages may be weaker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streamlit over Flask/FastAPI:&lt;/strong&gt; I needed to go from idea to deployed product in days, not weeks. Streamlit let me focus on the algorithm, not routing and frontend plumbing. Plus, free hosting on Streamlit Cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DBSCAN clustering over K-means:&lt;/strong&gt; Semantic concepts don't form neat spherical clusters. DBSCAN finds arbitrary-shaped clusters and automatically detects "noise" (orphaned concepts) - which is exactly what I wanted.&lt;/p&gt;
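
&lt;p&gt;To make the noise idea concrete, here is a small sketch using scikit-learn's &lt;code&gt;DBSCAN&lt;/code&gt; with cosine distance on toy 2-D vectors. Points that do not belong to any dense region get the label &lt;code&gt;-1&lt;/code&gt;, which is how an orphaned concept would show up. The vectors and the &lt;code&gt;eps&lt;/code&gt; and &lt;code&gt;min_samples&lt;/code&gt; values are made up for illustration; real embeddings would need their own tuning.&lt;/p&gt;

```python
from sklearn.cluster import DBSCAN

# Toy 2-D "embeddings": two tight groups plus one isolated point.
points = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],   # group near the x axis
    [0.0, 1.0], [0.1, 0.9], [0.1, 1.0],   # group near the y axis
    [-1.0, -1.0],                          # far from both groups
]

# eps is the maximum cosine distance between neighbours;
# both values here are illustrative, not tuned settings.
model = DBSCAN(eps=0.1, min_samples=2, metric="cosine")
labels = model.fit_predict(points).tolist()

# DBSCAN marks noise with the label -1.
noise_indices = [i for i, label in enumerate(labels) if label == -1]
print(labels)
print(noise_indices)
```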




&lt;h2&gt;
  
  
  Challenges (The Real Story)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. spaCy Deployment Hell
&lt;/h3&gt;

&lt;p&gt;Initially used spaCy for text splitting. Worked perfectly locally. Deployed to Streamlit Cloud? Instant crash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; spaCy's language models are HUGE. Streamlit Cloud's free tier couldn't handle it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Ripped out spaCy entirely. Replaced with a simple regex-based splitter (&lt;code&gt;simple_splitter.py&lt;/code&gt;). Works for 95% of cases, way faster, zero deployment issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Don't over-engineer early. "Good enough and deployed" beats "perfect and stuck locally."&lt;/p&gt;
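
&lt;p&gt;For a sense of what a regex-based splitter can look like, here is a minimal sketch using only the standard library. It is not the contents of &lt;code&gt;simple_splitter.py&lt;/code&gt;; the &lt;code&gt;split_sentences&lt;/code&gt; name and the pattern are assumptions, and abbreviations like "e.g." or version numbers like "3.11" will still trip it up.&lt;/p&gt;

```python
import re

# A run of non-terminator characters followed by any closing punctuation.
SENTENCE_PATTERN = re.compile(r"[^.!?]+[.!?]*")


def split_sentences(text):
    """Split text into sentences on ., ! and ? (a rough heuristic)."""
    pieces = SENTENCE_PATTERN.findall(text)
    return [piece.strip() for piece in pieces if piece.strip()]


sample = "spaCy crashed on deploy. Why? The models were too big! So I used regex"
sentences = split_sentences(sample)
print(sentences)
```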




&lt;h3&gt;
  
  
  2. Git Chaos with venv/
&lt;/h3&gt;

&lt;p&gt;Accidentally committed my entire virtual environment (393 MB of Python packages) to GitHub. Multiple failed deployments because Streamlit kept trying to install from a corrupted cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;--cached&lt;/span&gt; venv/
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"venv/"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .gitignore
git add .gitignore
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Remove venv from tracking"&lt;/span&gt;
git push &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; &lt;code&gt;.gitignore&lt;/code&gt; is your friend. Set it up FIRST, not after you've already pushed disasters.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Import Path Confusion
&lt;/h3&gt;

&lt;p&gt;Streamlit Cloud uses different working directory assumptions than local dev. My imports broke on deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Broke on Streamlit Cloud
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;backend.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;generate_embeddings&lt;/span&gt;

&lt;span class="c1"&gt;# Fixed version
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abspath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;backend.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;generate_embeddings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Test your imports from the directory the host actually runs the app from, not just your local one. Better yet, structure projects as proper Python packages from day 1.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current State
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;✅ What Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Semantic similarity analysis between any two texts&lt;/li&gt;
&lt;li&gt;Detection of "orphaned concepts" (ideas in one text with no match in the other)&lt;/li&gt;
&lt;li&gt;Vocabulary analysis (shared vs unique words)&lt;/li&gt;
&lt;li&gt;3D interactive visualization of semantic topology&lt;/li&gt;
&lt;li&gt;Actionable recommendations to close gaps&lt;/li&gt;
&lt;li&gt;Deployed and public: &lt;a href="https://interordra.streamlit.app" rel="noopener noreferrer"&gt;interordra.streamlit.app&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
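
&lt;p&gt;The vocabulary analysis item above boils down to set operations. A minimal sketch, assuming lowercase word tokens and a hypothetical &lt;code&gt;compare_vocabulary&lt;/code&gt; helper (the deployed app may tokenize differently):&lt;/p&gt;

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def compare_vocabulary(text_a, text_b):
    """Return shared words and the words unique to each text."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    return {
        "shared": sorted(words_a.intersection(words_b)),
        "only_a": sorted(words_a.difference(words_b)),
        "only_b": sorted(words_b.difference(words_a)),
    }


result = compare_vocabulary(
    "The API returns a JSON payload",
    "The endpoint returns structured data",
)
print(result)
```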

&lt;p&gt;&lt;strong&gt;⚠️ Current Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UI only in Spanish (English translation in progress)&lt;/li&gt;
&lt;li&gt;Mobile experience has occasional rendering issues&lt;/li&gt;
&lt;li&gt;Only detects similarity-based gaps - still exploring complementarity and harmonic patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;📊 Early Traction:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live for ~1 week&lt;/li&gt;
&lt;li&gt;Growing organically&lt;/li&gt;
&lt;li&gt;Waiting for first user feedback&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Immediate (this week):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🌐 English UI toggle&lt;/li&gt;
&lt;li&gt;📱 Mobile responsive fixes&lt;/li&gt;
&lt;li&gt;📄 Export results as PDF&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Short-term (next 2-4 weeks):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Advanced gap detection&lt;/strong&gt; - Beyond similarity analysis&lt;/li&gt;
&lt;li&gt;Analytics setup (seeing actual usage patterns)&lt;/li&gt;
&lt;li&gt;File upload support (.txt, .docx, .pdf)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Medium-term (1-3 months):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Public API (FastAPI backend)&lt;/li&gt;
&lt;li&gt;Multi-text comparison (analyze 3+ texts simultaneously)&lt;/li&gt;
&lt;li&gt;Deeper semantic topology analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;🌐 &lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://interordra.streamlit.app" rel="noopener noreferrer"&gt;interordra.streamlit.app&lt;/a&gt;&lt;br&gt;&lt;br&gt;
💻 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/rosibis-piedra/interordra" rel="noopener noreferrer"&gt;github.com/rosibis-piedra/interordra&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious what you'll discover. Drop your findings in the comments or open an issue on GitHub if you spot bugs 🐛&lt;/p&gt;


&lt;h2&gt;
  
  
  Reflection
&lt;/h2&gt;

&lt;p&gt;This project felt &lt;em&gt;different&lt;/em&gt;. Usually I second-guess myself constantly. With InterOrdra, I had this weird certainty - like I was building something that needed to exist, and I was just the person who happened to notice it first.&lt;/p&gt;

&lt;p&gt;Took 4 days from "hmm interesting idea" to "deployed MVP with users." That's the power of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Starting with a concrete problem (not abstract philosophy)&lt;/li&gt;
&lt;li&gt;Choosing boring, reliable tech&lt;/li&gt;
&lt;li&gt;Shipping fast, iterating faster&lt;/li&gt;
&lt;li&gt;Not letting perfect kill good&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next post: diving deeper into the semantic topology math and why DBSCAN + cosine similarity reveal structure that traditional NLP misses.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;What do you think?&lt;/strong&gt; Have you experienced semantic gaps in your work? How do you currently handle miscommunication between systems?&lt;/p&gt;

&lt;p&gt;Drop a comment below - I'd love to hear your thoughts! 💬&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
### **3. Series metadata (para posts futuros):**
Cuando publiques el segundo post, podés crear una serie:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Series: Building InterOrdra&lt;br&gt;
Part: 1&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


---

*Building in public. Learning in public. Breaking things in public.*  
*Follow along: I'm documenting the full journey from Technical Support Engineer → AI/ML Engineer.*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>python</category>
      <category>nlp</category>
      <category>buildpublic</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
