<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sukhitha Basnayake</title>
    <description>The latest articles on DEV Community by Sukhitha Basnayake (@sukhitha_b).</description>
    <link>https://dev.to/sukhitha_b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3717657%2Fe14a43d3-f821-41c6-8727-84767e6fbb22.png</url>
      <title>DEV Community: Sukhitha Basnayake</title>
      <link>https://dev.to/sukhitha_b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sukhitha_b"/>
    <language>en</language>
    <item>
      <title>Why Cell Type Annotation is Still the Hardest Part of scRNA-seq (And How Multi-Agent AI Fixes It)</title>
      <dc:creator>Sukhitha Basnayake</dc:creator>
      <pubDate>Mon, 26 Jan 2026 13:34:41 +0000</pubDate>
      <link>https://dev.to/sukhitha_b/why-cell-type-annotation-is-still-the-hardest-part-of-scrna-seq-and-how-multi-agent-ai-fixes-it-4bj6</link>
      <guid>https://dev.to/sukhitha_b/why-cell-type-annotation-is-still-the-hardest-part-of-scrna-seq-and-how-multi-agent-ai-fixes-it-4bj6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faobqr92cbyci3iy9kgif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faobqr92cbyci3iy9kgif.png" alt="Cover image" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You've clustered your single-cell RNA-seq data. Your UMAP looks beautiful. Now comes the hard part: &lt;strong&gt;what are these cells?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've worked with scRNA-seq data, you know this pain. Manual annotation takes weeks. Reference-based methods fail on disease samples. And when you finally publish, Reviewer 2 asks: &lt;em&gt;"How confident are you in cluster 7's annotation?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You have no good answer.&lt;/p&gt;

&lt;h2&gt;The Annotation Bottleneck Is Real&lt;/h2&gt;

&lt;p&gt;Let's talk numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reference classifiers trained on healthy tissue show &lt;strong&gt;15-30% accuracy drop&lt;/strong&gt; on disease samples&lt;/li&gt;
&lt;li&gt;They miss rare cell types in ~20% of cases
&lt;/li&gt;
&lt;li&gt;Manual annotation has &lt;strong&gt;25% inter-annotator variability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Current methods give you a label with zero justification&lt;/li&gt;
&lt;/ul&gt;
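&lt;p&gt;To see where the "label with zero justification" problem comes from, here is a minimal sketch of the marker-overlap scoring that most manual pipelines start from. The marker sets are illustrative toys, not a curated reference: each cluster gets the cell type whose canonical markers best overlap its top differential genes, and near-ties get resolved silently. That silent tie-breaking is exactly where inter-annotator disagreement lives.&lt;/p&gt;

```python
# Naive marker-overlap annotation, the approach most manual pipelines
# start from. Marker sets below are illustrative toys, not a real reference.

REFERENCE_MARKERS = {
    "T cell": {"CD3D", "CD3E", "IL7R", "TRAC"},
    "B cell": {"CD79A", "CD79B", "MS4A1"},
    "NK cell": {"NKG7", "GNLY", "KLRD1"},
}

def annotate_cluster(top_genes):
    """Score each cell type by Jaccard overlap with the cluster's top genes."""
    genes = set(top_genes)
    scores = {}
    for cell_type, markers in REFERENCE_MARKERS.items():
        overlap = len(genes.intersection(markers))
        scores[cell_type] = overlap / len(genes.union(markers))
    best = max(scores, key=scores.get)
    return best, scores

label, scores = annotate_cluster(["CD3D", "IL7R", "NKG7", "TRAC", "CCL5"])
print(label)   # the winning label...
print(scores)  # ...but nothing here tells you how close the runner-up was
```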

&lt;p&gt;Worse, datasets now routinely contain &lt;strong&gt;millions of cells&lt;/strong&gt;. The computational bottleneck has shifted from analysis to &lt;em&gt;biological interpretation&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;Why LLMs Alone Don't Cut It&lt;/h2&gt;

&lt;p&gt;GPT-4 has achieved 75% agreement with expert annotations, which is impressive. But existing LLM approaches have critical gaps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They only see &lt;strong&gt;top marker genes&lt;/strong&gt;, not full expression profiles&lt;/li&gt;
&lt;li&gt;Knowledge is &lt;strong&gt;frozen at training time&lt;/strong&gt; (no current literature)&lt;/li&gt;
&lt;li&gt;No mechanism to &lt;strong&gt;validate predictions&lt;/strong&gt; against databases&lt;/li&gt;
&lt;li&gt;Zero &lt;strong&gt;uncertainty quantification&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get a confident answer that might be completely wrong.&lt;/p&gt;

&lt;h2&gt;Architecture Over Model Selection&lt;/h2&gt;

&lt;p&gt;Here's what we learned building &lt;a href="https://github.com/NygenAnalytics/CyteType" rel="noopener noreferrer"&gt;CyteType&lt;/a&gt;: &lt;strong&gt;the problem isn't the LLM—it's how you structure the task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of asking one model "what is this cell?", we built a &lt;strong&gt;five-agent system&lt;/strong&gt; where each agent handles a distinct part of scientific reasoning:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faobqr92cbyci3iy9kgif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faobqr92cbyci3iy9kgif.png" alt="CyteType multi-agent workflow" width="800" height="410"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Five specialized agents work together: context analysis → hypothesis generation → evidence validation → confidence scoring → synthesis&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;The Five Agents&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Contextualizer Agent&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Establishes biological ground truth &lt;em&gt;before&lt;/em&gt; annotation begins. Infers organism, tissue, pathway context from your data and metadata. Integrates with GTEx, Enrichr (GO, Reactome, WikiPathways), and blitzGSEA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Annotator Agent&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Generates &lt;strong&gt;multiple competing hypotheses&lt;/strong&gt; instead of one prediction. Tests each against the &lt;em&gt;full expression profile&lt;/em&gt; by querying a pseudobulked expression database. Selects the best hypothesis and maps it to Cell Ontology terms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Reviewer Agent&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Simulates an expert panel. Checks predictions against CellGuide, detects cellular heterogeneity, triggers re-annotation when needed. This creates an interpretable "trust layer."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Literature Agent&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Connects annotations to current knowledge. Searches PubMed for supporting evidence, identifies disease associations (Disease Ontology), flags drug targets (Drug Ontology).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Summarizer Agent&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Synthesizes results across your entire study. Performs similarity analysis, disambiguates naming inconsistencies, generates semantic cluster ordering.&lt;/p&gt;
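&lt;p&gt;To make the division of labor concrete, here is a heavily simplified sketch of the five-stage flow in plain Python. This is &lt;em&gt;not&lt;/em&gt; CyteType's actual code: the agent internals (LLM calls, Enrichr and CellGuide queries, PubMed searches) are replaced with stubs, and every name is illustrative.&lt;/p&gt;

```python
# Simplified five-stage pipeline. Each "agent" is a stub standing in for
# an LLM call plus its external tools. Illustrative only.

def contextualize(metadata):
    # Contextualizer: establish biological context before annotating.
    return {"organism": metadata.get("organism", "unknown"),
            "tissue": metadata.get("tissue", "unknown")}

def annotate(markers, context):
    # Annotator: propose several competing hypotheses, not one prediction.
    return [{"label": "T cell", "score": 0.9},
            {"label": "NK cell", "score": 0.4}]

def review(hypotheses):
    # Reviewer: validate the top hypothesis and attach a confidence level.
    best = max(hypotheses, key=lambda h: h["score"])
    confidence = "high" if best["score"] >= 0.8 else "low"
    return dict(best, confidence=confidence)

def literature(annotation):
    # Literature: attach supporting evidence (stubbed placeholder here).
    return dict(annotation, citations=["placeholder citation"])

def summarize(annotations):
    # Summarizer: harmonize and order labels across the whole study.
    return sorted(annotations, key=lambda a: a["label"])

def run_pipeline(clusters, metadata):
    context = contextualize(metadata)
    reviewed = [literature(review(annotate(m, context))) for m in clusters]
    return summarize(reviewed)
```

&lt;p&gt;The point of the structure is that each stage can fail loudly: a low Reviewer confidence or missing literature support is surfaced instead of being hidden inside one opaque completion.&lt;/p&gt;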
&lt;h2&gt;The Benchmark That Matters&lt;/h2&gt;

&lt;p&gt;We tested on &lt;strong&gt;205 clusters&lt;/strong&gt; across four diverse datasets (HypoMap, Immune Cell Atlas, GTEx v9, Mouse Pancreatic Atlas).&lt;/p&gt;

&lt;p&gt;To isolate architectural benefits, we compared CyteType against &lt;strong&gt;GPTCellType using the same GPT-5 model&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CyteType vs. GPTCellType (same LLM):&lt;/strong&gt; 388% higher similarity score (p &amp;lt; .001)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CyteType vs. CellTypist:&lt;/strong&gt; 267% higher
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CyteType vs. SingleR:&lt;/strong&gt; 100% higher&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because the underlying model is held constant, the gap can only come from the system design: &lt;strong&gt;architecture matters more than model choice.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxljqf6n8xdwx2de2b18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxljqf6n8xdwx2de2b18.png" alt="Performance comparison across 16 LLMs" width="652" height="640"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;We tested 16 LLMs—both closed (GPT-5, Claude, Gemini) and open-weight (DeepSeek R1, Qwen3). Even open models outperformed traditional methods.&lt;/em&gt;&lt;/p&gt;
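&lt;p&gt;The benchmark's actual similarity score is defined in the preprint; purely to illustrate the kind of comparison involved, here is a crude token-overlap stand-in that gives partial credit when a predicted label ("cytotoxic T cell") is a refinement of the reference label ("T cell").&lt;/p&gt;

```python
# Crude stand-in for an annotation similarity score: token overlap between
# predicted and reference labels. The benchmark's real metric is defined in
# the CyteType preprint; this only illustrates the idea of partial credit.

def label_similarity(predicted, reference):
    p = set(predicted.lower().split())
    r = set(reference.lower().split())
    if not p or not r:
        return 0.0
    return len(p.intersection(r)) / len(p.union(r))

print(label_similarity("cytotoxic t cell", "t cell"))   # partial credit
print(label_similarity("t cell", "t cell"))             # exact match
```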
&lt;h2&gt;Model Flexibility Without Sacrificing Performance&lt;/h2&gt;

&lt;p&gt;Here's the kicker: &lt;strong&gt;you're not locked into expensive API calls.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open-weight models like DeepSeek R1 and Kimi K2 achieve &lt;strong&gt;95% of peak performance&lt;/strong&gt; at lower cost. LLMs with built-in chain-of-thought reasoning showed &lt;em&gt;no significant advantage&lt;/em&gt; (p = 0.22): the structured workflow already supplies the step-by-step reasoning that model-native chain-of-thought would otherwise provide.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose models based on &lt;strong&gt;cost and privacy needs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Run locally with Ollama for &lt;strong&gt;air-gapped operation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Switch models without rewriting your pipeline&lt;/li&gt;
&lt;/ul&gt;
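&lt;p&gt;The mechanics behind that flexibility are worth a sketch. Most LLM stacks speak the OpenAI-compatible API, so switching providers means changing only a base URL and a model name; Ollama, for example, serves an OpenAI-compatible endpoint locally. The entries below illustrate the pattern only, so check the CyteType docs for its actual configuration options.&lt;/p&gt;

```python
# Provider switching via OpenAI-compatible endpoints. These entries
# illustrate the pattern; they are not CyteType's actual configuration.

PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-5"},
    # Ollama exposes an OpenAI-compatible API on localhost, so fully
    # air-gapped runs need nothing beyond a locally pulled model.
    "ollama": {"base_url": "http://localhost:11434/v1", "model": "deepseek-r1"},
}

def client_config(provider, api_key="not-needed-for-local"):
    """Return the settings an OpenAI-compatible client would be built from."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "model": cfg["model"], "api_key": api_key}

print(client_config("ollama")["base_url"])
```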
&lt;h2&gt;More Than Labels: Discovery&lt;/h2&gt;

&lt;p&gt;Applying CyteType to &lt;strong&gt;977 clusters across 20 datasets&lt;/strong&gt; revealed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;41%&lt;/strong&gt; received functional enhancement (cell state information)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;29%&lt;/strong&gt; refined to specific subtypes
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30%&lt;/strong&gt; required major reannotation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Annotations mapped to &lt;strong&gt;327 unique Cell Ontology terms&lt;/strong&gt; and identified &lt;strong&gt;116 distinct cell states&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example: In a diabetic kidney disease atlas, "parietal epithelial cells" were relabeled as &lt;strong&gt;injured proximal tubule cells&lt;/strong&gt; (ALDH1A2+, CFH+, VCAM1+)—a discovery that changes biological interpretation.&lt;/p&gt;
&lt;h2&gt;Confidence You Can Trust&lt;/h2&gt;

&lt;p&gt;The Reviewer agent generates calibrated confidence scores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-confidence annotations had significantly higher similarity scores (F = 23.88, p &amp;lt; .001)&lt;/li&gt;
&lt;li&gt;Heterogeneous clusters showed lower similarity (F = 8.45, p &amp;lt; .01)
&lt;/li&gt;
&lt;li&gt;Median majority agreement exceeded &lt;strong&gt;80% across all LLMs&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
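&lt;p&gt;Majority agreement, the stability measure in the last bullet, is easy to compute yourself from repeated runs of the same cluster. A minimal sketch (the run labels are made up for illustration):&lt;/p&gt;

```python
from collections import Counter

# Majority agreement across repeated annotation runs of one cluster:
# the fraction of runs producing the most common label. The run labels
# below are illustrative, not real benchmark outputs.

def majority_agreement(labels):
    (top_label, count), = Counter(labels).most_common(1)
    return top_label, count / len(labels)

runs = ["T cell", "T cell", "T cell", "NK cell", "T cell"]
label, agreement = majority_agreement(runs)
print(label, agreement)   # agreement is 4/5 here
```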

&lt;p&gt;Now when Reviewer 2 asks about cluster 7, you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidence score&lt;/li&gt;
&lt;li&gt;Supporting/conflicting markers
&lt;/li&gt;
&lt;li&gt;Literature citations&lt;/li&gt;
&lt;li&gt;Alternative hypotheses considered&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Get Started&lt;/h2&gt;

&lt;p&gt;CyteType is &lt;strong&gt;source-available&lt;/strong&gt; and free for non-commercial use (CC BY-NC-SA 4.0):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python (AnnData):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cytetype
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;R (Seurat):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;devtools&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;install_github&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"NygenAnalytics/CyteTypeR"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both generate comprehensive HTML reports and integrate directly into your existing workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/NygenAnalytics/CyteType" rel="noopener noreferrer"&gt;https://github.com/NygenAnalytics/CyteType&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Preprint: &lt;a href="https://www.biorxiv.org/content/10.1101/2025.11.06.686964v1" rel="noopener noreferrer"&gt;https://www.biorxiv.org/content/10.1101/2025.11.06.686964v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://cytetype.nygen.io/" rel="noopener noreferrer"&gt;https://cytetype.nygen.io/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What's Your Biggest Annotation Challenge?&lt;/h2&gt;

&lt;p&gt;We built CyteType to solve our own annotation headaches. What problems are you facing?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rare cell types that references miss?&lt;/li&gt;
&lt;li&gt;Disease contexts where nothing works?
&lt;/li&gt;
&lt;li&gt;Inconsistent annotations across studies?&lt;/li&gt;
&lt;li&gt;Explaining your calls to reviewers?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a comment—I'd love to hear what you're working on and whether this approach could help.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full disclosure: I work at Nygen Analytics, the team behind CyteType. We open-sourced this because we think the architecture principle—structuring tasks for LLMs rather than just prompting harder—applies way beyond biology.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>bioinformatics</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
