<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gaston Needleman</title>
    <description>The latest articles on DEV Community by Gaston Needleman (@harthor).</description>
    <link>https://dev.to/harthor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886455%2F17069438-0196-4984-baa4-804e29b42e0f.png</url>
      <title>DEV Community: Gaston Needleman</title>
      <link>https://dev.to/harthor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harthor"/>
    <language>en</language>
    <item>
      <title>Finding patterns across Hacker News, arXiv, and GitHub by reasoning about groups, not pairs</title>
      <dc:creator>Gaston Needleman</dc:creator>
      <pubDate>Sat, 18 Apr 2026 20:24:58 +0000</pubDate>
      <link>https://dev.to/harthor/finding-patterns-across-hacker-news-arxiv-and-github-by-reasoning-about-groups-not-pairs-fkc</link>
      <guid>https://dev.to/harthor/finding-patterns-across-hacker-news-arxiv-and-github-by-reasoning-about-groups-not-pairs-fkc</guid>
      <description>&lt;h2&gt;
  
  
  How I Built a Pattern-Detection Engine That Reads Across Eight Tech Feeds Simultaneously
&lt;/h2&gt;

&lt;p&gt;Most feed readers solve the wrong problem. They help you &lt;em&gt;consume&lt;/em&gt; faster, not &lt;em&gt;understand&lt;/em&gt; better. I kept noticing that the interesting signal wasn't inside any single post — it was in the friction between a paper on arXiv, a Show HN thread, and a Product Hunt launch happening within the same week. Nobody was naming that pattern out loud. So I built something that tries to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;Every week I'd open five or six tabs — Hacker News, a few arXiv abstracts, whatever was trending on Product Hunt — and occasionally get this feeling: &lt;em&gt;these things are connected&lt;/em&gt;. Not in an obvious "they're both about LLMs" way, but in a more structural sense. Like, someone is solving a problem in systems programming that maps onto a debate happening in ML tooling. Or a niche GitHub project is quietly becoming the implementation layer for an idea three different research groups published independently.&lt;/p&gt;

&lt;p&gt;The frustrating part was that making the connection required holding all of it in my head at once. Pairwise similarity search (the thing most recommendation engines do) doesn't cut it. Finding that paper A is similar to paper B doesn't tell you that A + B + this HN thread together &lt;em&gt;imply&lt;/em&gt; something that none of them says directly.&lt;/p&gt;

&lt;p&gt;I wanted a tool that reasons about &lt;em&gt;groups&lt;/em&gt;, not pairs. That's what I set out to build with Constellate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The approach
&lt;/h2&gt;

&lt;p&gt;The non-obvious design decision was to treat "a pattern" as a minimum-three-node structure. Most clustering approaches will happily give you a pile of thematically similar documents. That's not what I wanted. I wanted the system to surface constellations: configurations where the &lt;em&gt;relationship between the ideas&lt;/em&gt; is the finding, not just the ideas themselves.&lt;/p&gt;

&lt;p&gt;This meant defining connection types explicitly, rather than letting an embedding distance be the only signal. I ended up with five:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chain&lt;/strong&gt; — A leads to B leads to C; a progression or causal sequence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triangulation&lt;/strong&gt; — Three sources approaching the same conclusion from different angles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convergence&lt;/strong&gt; — Independent ideas arriving at the same solution without apparent coordination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Absence&lt;/strong&gt; — A gap: the pattern implies something that nobody is building or saying yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spectrum&lt;/strong&gt; — A range of positions on a single underlying axis, with sources spread across it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "Absence" type was the hardest to implement and the most interesting. It's not about what's there — it's about what the shape of the conversation implies is &lt;em&gt;missing&lt;/em&gt;.&lt;/p&gt;
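&lt;p&gt;One way to pin the taxonomy down in code is an enum plus a small record type. This is a minimal sketch, not Constellate's actual data model: the names &lt;code&gt;ConnectionType&lt;/code&gt; and &lt;code&gt;Constellation&lt;/code&gt;, the field layout, and the item ids are all made up for illustration. The only rules taken from the text above are the five types and the three-item minimum.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum

MIN_ITEMS = 3  # a pattern is a minimum-three-node structure


class ConnectionType(Enum):
    CHAIN = "chain"
    TRIANGULATION = "triangulation"
    CONVERGENCE = "convergence"
    ABSENCE = "absence"
    SPECTRUM = "spectrum"


@dataclass(frozen=True)
class Constellation:
    item_ids: tuple          # ids of the feed items in the group (hypothetical format)
    kind: ConnectionType     # exactly one of the five types
    summary: str             # plain-language statement of the pattern

    def __post_init__(self):
        # Reject pairs and single items: they cannot form a constellation.
        if MIN_ITEMS > len(set(self.item_ids)):
            raise ValueError("a constellation needs at least three distinct items")


example = Constellation(
    item_ids=("hn-1", "arxiv-2", "ph-3"),
    kind=ConnectionType.CONVERGENCE,
    summary="three independent sources arriving at the same solution",
)
print(example.kind.value, len(example.item_ids))
```

&lt;p&gt;Making the type a required field is what forces every group to commit to one relationship, rather than carrying a loose description.&lt;/p&gt;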

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;The pipeline runs weekly and processes roughly 140 items from eight sources, including HN, arXiv, Product Hunt, YC, GitHub, Hugging Face, and Dev.to.&lt;/p&gt;

&lt;p&gt;Here's the rough architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ingestion layer
  └── fetch ~140 items from 8 sources
       └── normalize to: {id, source, title, body_excerpt, date, tags}

Embedding + candidate generation
  └── embed each item (text-embedding-3-small, 1536d)
  └── build approximate kNN graph (FAISS, k=15)
  └── enumerate candidate triplets from shared neighborhoods

Constellation reasoning (the interesting part)
  └── for each candidate group:
       ├── prompt: "What is the structural relationship between these ideas?"
       ├── classify into: Chain / Triangulation / Convergence / Absence / Spectrum
       ├── score: specificity, surprise, non-obviousness
       └── filter: discard if pattern is restateable as a single topic label

Output
  └── ~60 constellations
       ├── Cards view (one pattern per screen, plain language)
       └── Constellation Map (spatial graph, clusters positioned by relationship type)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
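&lt;p&gt;The "enumerate candidate triplets from shared neighborhoods" step can be read in more than one way. The sketch below assumes the simplest reading: an item plus any two of its nearest neighbors forms a candidate. The function name &lt;code&gt;candidate_triplets&lt;/code&gt;, the dict-of-lists graph format, and the item ids are illustrative assumptions, and the toy graph uses k=2 rather than k=15.&lt;/p&gt;

```python
from itertools import combinations


def candidate_triplets(knn):
    """Return the set of unordered triplets that share a neighborhood.

    `knn` maps an item id to the ids of its nearest neighbors. A triplet
    qualifies when two of its items both appear in the third item's
    neighbor list (an assumed reading of "shared neighborhoods").
    """
    triplets = set()
    for anchor, neighbors in knn.items():
        others = sorted(set(neighbors) - {anchor})
        for b, c in combinations(others, 2):
            # Sort the ids so the same group found from different anchors dedupes.
            triplets.add(tuple(sorted((anchor, b, c))))
    return triplets


# Toy kNN graph with k=2 and made-up ids.
knn = {
    "hn-1": ["arxiv-2", "ph-3"],
    "arxiv-2": ["hn-1", "ph-3"],
    "ph-3": ["hn-1", "gh-4"],
    "gh-4": ["ph-3", "hn-1"],
}
for triplet in sorted(candidate_triplets(knn)):
    print(triplet)
```

&lt;p&gt;Here four anchors produce only two distinct groups, because each group is reachable from more than one anchor. Deduplicating before the reasoning step matters when every candidate costs an LLM call.&lt;/p&gt;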



&lt;p&gt;The scoring step is doing real work. It's easy to generate hundreds of technically valid groupings that are just boring ("these three things are all about Rust"). The filter that asks &lt;em&gt;"can this be summarized as a topic label?"&lt;/em&gt; culls most of those. If I can describe the pattern as "ML infrastructure" and be done with it, it's not a constellation; it's just a category.&lt;/p&gt;
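&lt;p&gt;As a rough sketch of how the score-then-filter step could be wired together: assume the reasoning step returns three scores in the range 0 to 1 plus a yes/no answer to the topic-label question. The &lt;code&gt;ScoredGroup&lt;/code&gt; type, the plain average, and the 0.6 cutoff are assumptions made for the example, not the real scoring rule.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ScoredGroup:
    summary: str
    specificity: float       # all three scores assumed to lie in [0, 1]
    surprise: float
    non_obviousness: float
    is_topic_label: bool     # True if the pattern collapses to a label like "ML infrastructure"


def keep(groups, cutoff=0.6):
    """Drop topic-label groupings, then keep groups whose mean score clears the cutoff."""
    kept = []
    for g in groups:
        if g.is_topic_label:
            continue  # a category, not a constellation
        mean = (g.specificity + g.surprise + g.non_obviousness) / 3
        if mean >= cutoff:
            kept.append(g)
    return kept


groups = [
    ScoredGroup("these three things are all about Rust", 0.9, 0.2, 0.1, True),
    ScoredGroup("a quiet move away from assuming reliable connectivity", 0.8, 0.7, 0.9, False),
    ScoredGroup("loosely related tooling posts", 0.3, 0.4, 0.2, False),
]
for g in keep(groups):
    print(g.summary)
```

&lt;p&gt;The topic-label check runs first and ignores the scores entirely, so a grouping that is specific but still just a category never gets through.&lt;/p&gt;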

&lt;p&gt;A real example from a recent run: a Hacker News thread on SQLite's WAL mode, an arXiv paper on eventual consistency in edge deployments, and a Product Hunt launch for a local-first sync library formed a &lt;strong&gt;Convergence&lt;/strong&gt; constellation. The pattern: a quiet move away from assuming reliable network connectivity as a baseline, happening simultaneously in three separate conversations that don't reference each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned / tradeoffs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embeddings alone are not enough.&lt;/strong&gt; Cosine similarity surfaces thematic proximity, but thematic proximity isn't the same as structural relationship. Items can be semantically distant and still form a tight Chain. The kNN graph gives you candidates; the reasoning step is where the actual pattern detection happens.&lt;/p&gt;
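&lt;p&gt;To make "the kNN graph gives you candidates" concrete, here is a brute-force neighbor search by cosine similarity. It is an exact, pure-Python stand-in for an approximate FAISS index, and the two-dimensional vectors are toy values rather than real embeddings. The function names are my own.&lt;/p&gt;

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k):
    """Exact kNN by cosine similarity: item id -> ids of its k most similar items."""
    graph = {}
    for i, u in vectors.items():
        ranked = sorted(
            (j for j in vectors if j != i),
            key=lambda j: cosine(u, vectors[j]),
            reverse=True,
        )
        graph[i] = ranked[:k]
    return graph


# Toy 2-d "embeddings": a and b point one way, c and d another.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
print(knn_graph(vectors, k=1))
```

&lt;p&gt;The graph only says which items sit near each other. Whether a group is a Chain or a Convergence is not something the distances can express, which is why classification is left to the reasoning step.&lt;/p&gt;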

&lt;p&gt;&lt;strong&gt;The 5-type taxonomy is a constraint, not a limitation.&lt;/strong&gt; Early versions let the model describe relationships in free form. The output was verbose and inconsistent. Forcing a classification makes the patterns comparable across weeks and shows when a certain type is unusually prevalent (three weeks of "Absence" constellations in a domain is itself a signal).&lt;/p&gt;
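&lt;p&gt;A fixed label set is what makes week-over-week comparison a few lines of counting. The sketch below tracks the share of one type per week and flags a sustained run. The weekly data, the 0.5 threshold, and the three-week run length are invented for the example, and it ignores the per-domain split mentioned above.&lt;/p&gt;

```python
from collections import Counter


def type_share(weeks, kind):
    """Fraction of each week's constellations that have the given type."""
    shares = []
    for kinds in weeks:
        counts = Counter(kinds)
        shares.append(counts[kind] / len(kinds) if kinds else 0.0)
    return shares


def sustained(shares, threshold, run=3):
    """True if the last `run` weeks all meet or exceed `threshold`."""
    recent = shares[-run:]
    return len(recent) == run and all(s >= threshold for s in recent)


# One list of type labels per weekly run (made-up data).
weeks = [
    ["chain", "absence", "spectrum", "chain"],
    ["absence", "absence", "chain", "convergence"],
    ["absence", "absence", "absence", "chain"],
    ["absence", "absence", "spectrum", "triangulation"],
]
shares = type_share(weeks, "absence")
print(shares, sustained(shares, threshold=0.5))
```

&lt;p&gt;With free-form descriptions there would be nothing stable to count, so this kind of check is only possible because the labels are forced.&lt;/p&gt;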

&lt;p&gt;&lt;strong&gt;False positives are expensive.&lt;/strong&gt; A wrong constellation isn't just a missed result — it's a confidence hit. Users who see a grouping that doesn't hold up start doubting the real ones. I'd rather surface 60 high-quality constellations than 150 mediocre ones. The current cutoff is conservative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Absence" type is hard to evaluate.&lt;/strong&gt; There's no ground truth for a gap. I can check factually whether a thing exists; I can't objectively confirm it's "missing in a meaningful way." Right now this type has the weakest precision. I mark it visually in the UI as lower-confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and what's next
&lt;/h2&gt;

&lt;p&gt;The current pipeline is weekly, not real-time. For fast-moving news cycles this is a real limitation — a constellation that was interesting on Monday may be stale by Friday. Moving toward a rolling window (72 hours) is on the roadmap, but the reasoning step is LLM-heavy and cost/latency become real concerns at higher frequency.&lt;/p&gt;
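&lt;p&gt;The selection half of a rolling window is simple; the expensive half is re-running the reasoning step on every slide. A minimal sketch of the selection, assuming each normalized item carries a timezone-aware &lt;code&gt;date&lt;/code&gt; field (the function name and sample items are hypothetical):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def in_window(items, now, hours=72):
    """Keep items whose `date` falls inside the trailing window ending at `now`."""
    cutoff = now - timedelta(hours=hours)
    return [it for it in items if it["date"] >= cutoff and now >= it["date"]]


now = datetime(2026, 4, 18, tzinfo=timezone.utc)
items = [
    {"id": "hn-1", "date": datetime(2026, 4, 17, tzinfo=timezone.utc)},
    {"id": "arxiv-2", "date": datetime(2026, 4, 14, tzinfo=timezone.utc)},
    {"id": "ph-3", "date": datetime(2026, 4, 16, tzinfo=timezone.utc)},
]
print([it["id"] for it in in_window(items, now)])
```

&lt;p&gt;Caching verdicts for groups whose members are all still inside the window would be one way to limit repeat LLM calls, though that is speculation on my part rather than a described plan.&lt;/p&gt;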

&lt;p&gt;The source list (8 feeds right now) creates blind spots. Academic CS and ML are well covered; hardware, biotech, and climate tech are not. Expanding the source set without degrading signal-to-noise is an open problem: more items means more candidate triplets, and the number of possible triplets grows combinatorially.&lt;/p&gt;
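&lt;p&gt;The arithmetic behind that growth is easy to check. The snippet compares the number of all possible three-item groups with a loose upper bound on kNN-derived candidates. The bound assumes each candidate is an item plus two of its k=15 neighbors, which is my simplification, not a documented rule of the pipeline.&lt;/p&gt;

```python
from math import comb


def triplet_counts(n, k=15):
    """Return (all possible triplets, upper bound on kNN-derived candidates) for n items."""
    all_triplets = comb(n, 3)      # every unordered 3-item group
    knn_bound = n * comb(k, 2)     # each item combined with any two of its k neighbors
    return all_triplets, knn_bound


for n in (140, 280, 560):
    all_triplets, knn_bound = triplet_counts(n)
    print(n, all_triplets, knn_bound)
```

&lt;p&gt;At 140 items there are 447,580 possible triplets against a bound of 14,700 neighbor-based candidates. Doubling the item count multiplies the first number by roughly eight but only doubles the second, so under this assumption the growth that actually hurts is the number of reasoning calls, which scales with whatever candidate rule is in use.&lt;/p&gt;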

&lt;p&gt;The Constellation Map is currently a static layout recomputed weekly. Making it interactive (click a node, see which other constellations it participates in) is the next UI feature.&lt;/p&gt;

&lt;p&gt;The waitlist is open if you want early access: &lt;a href="https://constellate.fyi" rel="noopener noreferrer"&gt;constellate.fyi&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm also genuinely curious whether the 5-type taxonomy resonates with other people who think about information architecture — happy to talk through it in the comments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Live site + waitlist: &lt;a href="https://constellate.fyi" rel="noopener noreferrer"&gt;constellate.fyi&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;#webdev #ai #machinelearning #indiehackers #typescript&lt;/p&gt;

</description>
      <category>automation</category>
      <category>productivity</category>
      <category>programming</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
