<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrew Kennon</title>
    <description>The latest articles on DEV Community by Andrew Kennon (@andrewkennon).</description>
    <link>https://dev.to/andrewkennon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3286897%2F7f9a3dac-ce2c-4073-8674-054153b3a40c.png</url>
      <title>DEV Community: Andrew Kennon</title>
      <link>https://dev.to/andrewkennon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andrewkennon"/>
    <language>en</language>
    <item>
      <title>When RAG Meets Real-World Robotics Data</title>
      <dc:creator>Andrew Kennon</dc:creator>
      <pubDate>Mon, 21 Jul 2025 07:09:34 +0000</pubDate>
      <link>https://dev.to/andrewkennon/when-rag-meets-real-world-robotics-data-45eg</link>
      <guid>https://dev.to/andrewkennon/when-rag-meets-real-world-robotics-data-45eg</guid>
      <description>&lt;p&gt;I’ve been building AI systems for autonomous vehicles long enough to develop a love-hate relationship with retrieval-augmented generation (RAG). It’s a great concept — bring relevant context into your LLM prompt at runtime — but the second you move beyond text-heavy enterprise use cases into robotics or real-time perception, things get weird fast.&lt;/p&gt;

&lt;p&gt;Let’s talk about what happens when you try to apply RAG to high-dimensional, multimodal data, and why your choice of vector database can quietly make or break your pipeline.&lt;/p&gt;




&lt;h3&gt;
  
  
  Not All Embeddings Are Created Equal
&lt;/h3&gt;

&lt;p&gt;Most RAG tutorials use sentence-transformers or OpenAI embeddings on small textual corpora. But when you’re fusing LiDAR, radar, and camera inputs — or even running multimodal embeddings from perception models like Perceiver or CLIP — you’re suddenly dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2,048 to 4,096 dimensions per vector&lt;/li&gt;
&lt;li&gt;tens of millions of vectors per sensor window&lt;/li&gt;
&lt;li&gt;updates on the scale of milliseconds, not hours&lt;/li&gt;
&lt;/ul&gt;
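&lt;p&gt;To make that concrete, here’s the back-of-envelope math (my numbers, assuming FP32 storage and zero index overhead):&lt;/p&gt;

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw storage for n_vectors embeddings at the given dimensionality (FP32 default)."""
    return n_vectors * dims * bytes_per_dim

# 20M vectors at 4,096 dims, FP32:
total = raw_vector_bytes(20_000_000, 4096)
print(f"{total / 1e9:.1f} GB")  # 327.7 GB, before any index structures
```

That’s a third of a terabyte before you build a single index, which is why quantization stops being optional at this scale.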

&lt;p&gt;The vector DBs that look great on standard SIFT1M or Wikipedia benchmarks often collapse here. I’ve seen Milvus handle this scale better than most (especially with its tiered IVFPQ indexing), while something like Pinecone starts to choke unless you heavily batch and precompute everything.&lt;/p&gt;




&lt;h3&gt;
  
  
  Querying in the Chaos: Real-Time Constraints
&lt;/h3&gt;

&lt;p&gt;In AV systems, RAG isn’t just about semantic search — it’s about making the right decision &lt;em&gt;right now&lt;/em&gt;. Think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What similar trajectories did I see in prior encounters with a jaywalking pedestrian?”&lt;/li&gt;
&lt;li&gt;“Are there any annotated LiDAR clusters from edge cases similar to this object’s motion?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means your vector DB needs sub-50ms retrieval with high accuracy — and most importantly, &lt;strong&gt;low tail latency&lt;/strong&gt;. An index that hits 95% recall with a comfortable P50 but spikes to 800ms at P99 is a nonstarter. For me, that ruled out FAISS-on-disk solutions and pushed us toward in-memory hybrid setups, sometimes backed by Milvus or even Redis-AI when latency spikes were unacceptable.&lt;/p&gt;
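&lt;p&gt;If you want to see the P50/P99 gap for yourself, a minimal harness looks like this (&lt;code&gt;fake_search&lt;/code&gt; is a stand-in for whatever client you’re actually calling):&lt;/p&gt;

```python
import random
import time

import numpy as np

def measure_latencies(search, queries):
    """Time each query and return (P50, P99) latency in milliseconds."""
    latencies = []
    for q in queries:
        t0 = time.perf_counter()
        search(q)
        latencies.append((time.perf_counter() - t0) * 1000.0)
    return np.percentile(latencies, [50, 99])

def fake_search(q):
    # Stand-in for a real client call: occasionally slow, like a cold shard.
    time.sleep(0.001 if random.random() > 0.05 else 0.01)

p50, p99 = measure_latencies(fake_search, range(200))
print(f"P50={p50:.1f}ms P99={p99:.1f}ms")
```

The point isn’t the stub; it’s that you should be plotting the whole distribution, not a single mean.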




&lt;h3&gt;
  
  
  Hybrid Search Isn’t Optional
&lt;/h3&gt;

&lt;p&gt;Another trap: pure ANN (approximate nearest neighbor) isn’t enough. We need hybrid search — combining structured filters (e.g. location, object class, time window) with vector similarity — to avoid surfacing irrelevant results that are semantically close but contextually useless.&lt;/p&gt;
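&lt;p&gt;A toy sketch of what I mean by hybrid search (field names here are invented): apply the structured filter first, then rank the survivors by cosine similarity:&lt;/p&gt;

```python
import numpy as np

def hybrid_search(query_vec, records, object_class, top_k=3):
    """Structured filter stage first, then vector-similarity stage."""
    candidates = [r for r in records if r["object_class"] == object_class]
    if not candidates:
        return []
    mat = np.stack([r["vec"] for r in candidates])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = mat @ q  # cosine similarity on normalized vectors
    order = np.argsort(scores)[::-1][:top_k]
    return [candidates[i] for i in order]
```

Real engines do the filtering inside the index rather than in Python, but the semantics are the same: contextually wrong candidates never reach the similarity stage.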

&lt;p&gt;The systems I’ve liked best so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://milvus.io/" rel="noopener noreferrer"&gt;Milvus&lt;/a&gt;&lt;/strong&gt;: Flexible filtering + multi-modal vector support + GPU acceleration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://weaviate.io/" rel="noopener noreferrer"&gt;Weaviate&lt;/a&gt;&lt;/strong&gt;: Graph-aware queries and filters, good for chaining across knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt;&lt;/strong&gt;: Surprisingly solid for real-time hybrid search, nice JSON filter DSL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the other hand, Chroma and LanceDB are great for lightweight prototyping but start to wobble under serious ingestion or query pressure.&lt;/p&gt;




&lt;h3&gt;
  
  
  What I’d Do Differently (And What I’d Keep)
&lt;/h3&gt;

&lt;p&gt;If I were rebuilding a RAG stack for AV today, here’s where I’d land:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HNSW-based indexes tuned for short queries&lt;/li&gt;
&lt;li&gt;Streaming ingestion pipelines with nightly reindexing&lt;/li&gt;
&lt;li&gt;Embedding normalization (even small vector scale issues cascade fast)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Change:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use separate DBs for long-term recall vs short-term context&lt;/li&gt;
&lt;li&gt;Bake in observability for query latency distribution — not just mean/median&lt;/li&gt;
&lt;li&gt;Use hybrid pipelines: Redis or Vespa for immediate low-latency + Milvus for batch-heavy recall&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Thought
&lt;/h3&gt;

&lt;p&gt;RAG in robotics isn’t just a language problem — it’s a systems problem. The tech that works for enterprise chatbots often breaks under the weight of real-time perception and control loops. But with the right infra — and a vector DB that understands filters, scale, and latency — it’s not just possible. It’s damn useful.&lt;/p&gt;

&lt;p&gt;If you’re working on similar problems (or have war stories from trying RAG with non-text data), I’d love to swap notes.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>From Lab Toy to Core Infra: Why Vector Databases Became Default in My AD Projects</title>
      <dc:creator>Andrew Kennon</dc:creator>
      <pubDate>Mon, 14 Jul 2025 09:28:21 +0000</pubDate>
      <link>https://dev.to/andrewkennon/from-lab-toy-to-core-infra-why-vector-databases-became-default-in-my-ad-projects-58cn</link>
      <guid>https://dev.to/andrewkennon/from-lab-toy-to-core-infra-why-vector-databases-became-default-in-my-ad-projects-58cn</guid>
      <description>&lt;p&gt;I used to treat vector databases like a novelty — good for academic demos or a flashy product prototype. Definitely not something I’d trust in a mission-critical stack, especially in autonomous driving where latency budgets are brutal and edge cases rule everything.&lt;/p&gt;

&lt;p&gt;But somewhere between building yet another scene retrieval pipeline and rewriting my own ANN glue code for the fifth time, vector DBs matured. Or maybe I just got tired of reinventing the same thing with FAISS and Redis.&lt;/p&gt;

&lt;p&gt;Either way, they’re in my stack now — for semantic search, intent classification, offline query replay, even some perception data filtering. Here’s how they earned their place.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Search Needs Got Weirder Than “Top-5 Similar”
&lt;/h3&gt;

&lt;p&gt;In AD systems, especially those with human-in-the-loop tools (e.g., labeling UI, validation dashboards), search isn’t just about vector proximity. You often want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Find me all LiDAR scenes with fog that confused the neural planner”&lt;/li&gt;
&lt;li&gt;“Search for similar failure cases — but only from vehicles with the same camera calibration”&lt;/li&gt;
&lt;li&gt;“Pull historical conversations where the driver reported ‘not feeling in control’ — even if they didn’t use those exact words”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these work well with classic keyword matching or naive ANN-only lookup. I need hybrid queries — fast vector search &lt;em&gt;and&lt;/em&gt; structured filters. That’s where vector databases like &lt;strong&gt;Milvus&lt;/strong&gt;, &lt;strong&gt;Qdrant&lt;/strong&gt;, and &lt;strong&gt;Weaviate&lt;/strong&gt; started making sense.&lt;/p&gt;

&lt;p&gt;Postgres + pgvector? I tried. It’s okay for low-QPS analytics queries. But it gets crushed when you scale up or need low latency.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. What Actually Worked in My Benchmarks
&lt;/h3&gt;

&lt;p&gt;I ran real tests on 10M+ 768-dim vectors (text+sensor fusion output), using &lt;code&gt;m6id.2xlarge&lt;/code&gt; on AWS. Here’s what I got for recall vs throughput vs memory:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index Type&lt;/th&gt;
&lt;th&gt;Recall (%)&lt;/th&gt;
&lt;th&gt;QPS&lt;/th&gt;
&lt;th&gt;Memory/Vector&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IVF_FLAT (FP32)&lt;/td&gt;
&lt;td&gt;95.2&lt;/td&gt;
&lt;td&gt;236&lt;/td&gt;
&lt;td&gt;3,072 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IVF_SQ8&lt;/td&gt;
&lt;td&gt;94.1&lt;/td&gt;
&lt;td&gt;611&lt;/td&gt;
&lt;td&gt;768 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IVF_RABITQ&lt;/td&gt;
&lt;td&gt;76.3&lt;/td&gt;
&lt;td&gt;898&lt;/td&gt;
&lt;td&gt;96 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RABITQ + SQ8&lt;/td&gt;
&lt;td&gt;94.7&lt;/td&gt;
&lt;td&gt;864&lt;/td&gt;
&lt;td&gt;96 + 768 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are from Milvus with hybrid filters and metadata attached. I could keep 100M+ vectors online and still get under-10ms latency on common queries — something I couldn’t hit reliably with Weaviate or Pinecone when adding structured constraints.&lt;/p&gt;
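&lt;p&gt;The Memory/Vector column is easy to sanity-check for 768-dim vectors (my reading: FP32 is 4 bytes/dim, SQ8 is 1 byte/dim, and RABITQ is roughly 1 bit/dim):&lt;/p&gt;

```python
# Per-vector storage for the index types in the table above, at 768 dims.
dims = 768
fp32_bytes = dims * 4      # IVF_FLAT (FP32) row: 3,072 bytes
sq8_bytes = dims * 1       # IVF_SQ8 row: 768 bytes
rabitq_bytes = dims // 8   # IVF_RABITQ row: ~96 bytes (1 bit per dim)
print(fp32_bytes, sq8_bytes, rabitq_bytes)  # 3072 768 96
```

The RABITQ + SQ8 row is the two smaller footprints combined: the binary code for fast candidate generation plus the SQ8 copy for refinement.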




&lt;h3&gt;
  
  
  3. You Can’t Cheat Hybrid Search
&lt;/h3&gt;

&lt;p&gt;Every vendor now claims “hybrid search support.” But here’s what that actually means in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Milvus&lt;/strong&gt;: True hybrid execution. You can filter on metadata + vector with one query. SQL-like interface helps. Big win if you need production control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt;: Also solid. Filters are expressive, and Rust backend flies. Just be careful with multi-field combinations — debugging errors can be opaque.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate&lt;/strong&gt;: Flexible schema and GraphQL are neat, but hybrid joins can be flaky under load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinecone&lt;/strong&gt;: Honestly? Great uptime, but very much a black box. Not ideal if you need low-level index tuning or want to reason about query paths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my vehicle incident retrieval tooling, hybrid filters are &lt;em&gt;non-negotiable&lt;/em&gt;. I need to narrow results to “same model year,” “rainy weather,” or “disabled radar” &lt;em&gt;before&lt;/em&gt; doing ANN. Otherwise I get nonsense.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. What Made Me Trust It in Production
&lt;/h3&gt;

&lt;p&gt;The real turning point wasn’t just benchmark performance — it was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Durability&lt;/strong&gt;: Can I shut down the node and not lose 10M vectors?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index reusability&lt;/strong&gt;: Can I train an index once, persist it, and reuse it across environments?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration support&lt;/strong&gt;: Is there a Python SDK that doesn’t feel like a student project?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: Does it give me metrics, logs, and alerts when a query fails?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Milvus (especially Zilliz Cloud) checked most of those. I still had to do some index config trial-and-error (RABITQ vs HNSW vs IVF_SQ8), but once tuned, it stuck. Redis+FAISS, by comparison, felt like building a transmission from scratch just to drive to the store.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaways (From the Autonomous Driving Trenches)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you're building anything involving sensor data retrieval, semantic logs, or multimodal fusion — a vector DB is probably worth it.&lt;/li&gt;
&lt;li&gt;Milvus is my current go-to, but Qdrant is catching up fast. If you want zero-infra headaches, Pinecone or Zilliz Cloud are decent bets.&lt;/li&gt;
&lt;li&gt;Don’t fall for benchmarks without filters — hybrid search is where things get interesting (and painful).&lt;/li&gt;
&lt;li&gt;Plan your index strategy up front. Retrofitting after launch sucks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wouldn’t call vector databases “solved” tech — but they’re finally usable. Not perfect, not plug-and-play, but good enough that I stopped rebuilding my own.&lt;/p&gt;

&lt;p&gt;Curious if others in AD or robotics have put these into production too — what worked? What didn’t? Always down to trade scars.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Working with Messy Embeddings in Real Systems: A Quick Post from Today's Debug Session</title>
      <dc:creator>Andrew Kennon</dc:creator>
      <pubDate>Mon, 07 Jul 2025 06:49:50 +0000</pubDate>
      <link>https://dev.to/andrewkennon/working-with-messy-embeddings-in-real-systems-a-quick-post-from-todays-debug-session-5ddc</link>
      <guid>https://dev.to/andrewkennon/working-with-messy-embeddings-in-real-systems-a-quick-post-from-todays-debug-session-5ddc</guid>
      <description>&lt;p&gt;Today was supposed to be a routine day. I was reviewing some logs for a multi-modal retrieval pipeline we’ve been running—camera images, lidar frames, and a few NLP tags all go into a vector store for downstream search. Pretty standard setup, right?&lt;/p&gt;

&lt;p&gt;But then the recall dropped. Quietly. No errors, no crashes, just… worse results.&lt;/p&gt;

&lt;p&gt;Turns out, this whole thing was caused by a seemingly small detail: &lt;strong&gt;inconsistent embedding norms from different modalities&lt;/strong&gt;. It sent me down a 3-hour rabbit hole involving cosine distances, vector scaling, and my own past assumptions about database behavior. Here’s what I learned (again).&lt;/p&gt;




&lt;h2&gt;
  
  
  Context: The Setup
&lt;/h2&gt;

&lt;p&gt;We’re storing multi-modal embeddings into a vector database—specifically, lidar-to-text retrieval for a roadside perception system. Each data point looks roughly like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;image_embedding&lt;/code&gt;: 512-dim vision encoder output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lidar_embedding&lt;/code&gt;: 256-dim learned BEV encoder output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;text_embedding&lt;/code&gt;: 768-dim from a BERT variant&lt;/li&gt;
&lt;li&gt;Metadata: GPS, weather, scenario tags, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system uses &lt;strong&gt;Milvus&lt;/strong&gt; (v2.3) with HNSW for approximate search. Each modality goes into its own collection, but the RAG pipeline combines results at query time via re-ranking.&lt;/p&gt;
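&lt;p&gt;The query-time combine step is conceptually simple; a rough sketch (shapes and scoring are simplified here, not our production code):&lt;/p&gt;

```python
def merge_hits(per_modality_hits, top_k=5):
    """Merge (id, score) hits from several collections: keep the best
    score per id, then re-rank the union by score."""
    best = {}
    for hits in per_modality_hits:
        for doc_id, score in hits:
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

This max-score merge is the simplest option; weighted sums or a learned re-ranker slot into the same place.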




&lt;h2&gt;
  
  
  The Problem: Recall Drift
&lt;/h2&gt;

&lt;p&gt;We noticed that queries with natural language inputs (e.g. "car parked under bridge in fog") were retrieving fewer relevant lidar segments than expected. Visual embeddings still worked well, but lidar retrieval became noticeably noisier.&lt;/p&gt;

&lt;p&gt;The embeddings were going in, indexes were fine, metadata filters were working. So what changed?&lt;/p&gt;

&lt;h3&gt;
  
  
  The culprit: vector magnitude variance.
&lt;/h3&gt;

&lt;p&gt;Some of our lidar embeddings had significantly lower norms (around 0.5–1.2), while the text embeddings were tightly clustered around 7–9.&lt;br&gt;
Cosine similarity, which we used for all retrievals, is theoretically scale-invariant—but in practice, &lt;strong&gt;index-level normalization matters&lt;/strong&gt;, especially when mixed with filtered + hybrid queries.&lt;/p&gt;
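&lt;p&gt;Here’s a two-dimensional toy case showing the failure mode: once norms differ across modalities, dot-product ranking and cosine ranking pick different winners:&lt;/p&gt;

```python
import numpy as np

q = np.array([1.0, 0.0])
a = np.array([0.9, 0.1])  # nearly parallel to q, small norm (lidar-like)
b = np.array([5.0, 5.0])  # 45 degrees off, but a big norm (text-like)

dot_winner = "a" if q @ a > q @ b else "b"
cos = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
cos_winner = "a" if cos(q, a) > cos(q, b) else "b"
print(dot_winner, cos_winner)  # dot product picks b, cosine picks a
```

If any stage scores with the raw dot product, the big-norm vector wins regardless of direction, which is exactly the noise we were seeing in lidar retrieval.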




&lt;h2&gt;
  
  
  Lessons Learned (or Re-Learned)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Always normalize before insert. Always.
&lt;/h3&gt;

&lt;p&gt;I had assumed that the downstream ingestion code was already L2-normalizing the embeddings. It wasn’t. And even though cosine distance is supposed to ignore magnitude, many ANN libraries (including FAISS and Milvus’s HNSW) &lt;strong&gt;use raw dot product internally&lt;/strong&gt; and normalize at query time only.&lt;/p&gt;

&lt;p&gt;Result? Insert-time magnitude variance = weird scoring behavior.&lt;/p&gt;

&lt;p&gt;Fix: added &lt;code&gt;embedding = embedding / np.linalg.norm(embedding)&lt;/code&gt; before inserts. Immediately improved recall by ~15%.&lt;/p&gt;
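&lt;p&gt;For batched inserts, the same fix with a zero-norm guard (a degenerate all-zeros embedding would otherwise turn into NaNs on the way into the index):&lt;/p&gt;

```python
import numpy as np

def l2_normalize(batch, eps=1e-12):
    """Row-wise L2 normalization with a guard against zero-norm rows."""
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    return batch / np.maximum(norms, eps)
```

Run it as the last step before insert, after every encoder, so no single team’s pipeline can reintroduce the drift.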




&lt;h3&gt;
  
  
  2. Vector DBs don’t protect you from messy upstream models
&lt;/h3&gt;

&lt;p&gt;No matter how good your vector database is, it doesn’t validate the statistical properties of your data. If your embedding distribution drifts (like ours did after a model retrain), the index won’t scream at you. It’ll just… get worse.&lt;/p&gt;

&lt;p&gt;In this case, the new lidar encoder was producing vectors on a much smaller scale. Nothing broke, but everything degraded.&lt;/p&gt;

&lt;p&gt;Takeaway: embedding stats should be part of CI. Track means, norms, sparsity, drift. It’s cheap and saves hours later.&lt;/p&gt;
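&lt;p&gt;A minimal version of that CI check, assuming you persist a baseline of norm stats per encoder version (the 0.25 relative tolerance is arbitrary; tune it for your models):&lt;/p&gt;

```python
import numpy as np

def embedding_stats(batch):
    """Cheap summary stats worth tracking per encoder version."""
    norms = np.linalg.norm(batch, axis=1)
    return {"mean_norm": float(norms.mean()), "std_norm": float(norms.std())}

def drifted(stats, baseline, tolerance=0.25):
    """Flag if the mean norm moved more than `tolerance` (relative) vs baseline."""
    rel = abs(stats["mean_norm"] - baseline["mean_norm"]) / baseline["mean_norm"]
    return rel > tolerance
```

This would have caught our lidar retrain immediately: a mean norm dropping from ~8 to ~1 is a huge relative shift.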




&lt;h3&gt;
  
  
  3. Metadata filters can mask retrieval bugs
&lt;/h3&gt;

&lt;p&gt;When recall dropped, our re-ranking + metadata filtering kept returning "reasonable" results, which made debugging harder. The top-3 looked OK—until we noticed they were all from the same location tag.&lt;/p&gt;

&lt;p&gt;Moral: if you're using metadata filters (which you should), &lt;strong&gt;test recall both with and without filters&lt;/strong&gt;. Otherwise, you’re debugging the wrong component.&lt;/p&gt;
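&lt;p&gt;In practice that just means computing recall@k twice, once unfiltered and once with the metadata filter applied; a large gap between the two points at the filter logic, not the index:&lt;/p&gt;

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]).intersection(relevant_ids)
    return len(hits) / len(relevant_ids)

# Same query, scored twice. The ids here are made up for illustration.
unfiltered = recall_at_k(["a", "b", "c"], {"a", "c", "d"}, k=3)
filtered = recall_at_k(["a", "x", "y"], {"a", "c", "d"}, k=3)
print(round(unfiltered, 2), round(filtered, 2))  # 0.67 0.33
```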




&lt;h2&gt;
  
  
  Final Notes
&lt;/h2&gt;

&lt;p&gt;No, this wasn’t a massive failure. It was one of those slow, silent bugs that creep into production pipelines when different teams train models, build retrievers, and wire up search logic. Nothing crashed—but the user experience got worse.&lt;/p&gt;

&lt;p&gt;I’m sharing this mostly to remind myself (and maybe you) that &lt;strong&gt;ANN infrastructure is only as good as the vectors you feed it&lt;/strong&gt;. And the most boring parts—like normalization—still bite you the hardest.&lt;/p&gt;




&lt;p&gt;If you’ve run into similar issues with mixed-modality embeddings or have better ways to track embedding drift, I’m all ears. Thinking of adding some lightweight checksums or vector histograms to our monitoring pipeline next.&lt;/p&gt;




</description>
    </item>
    <item>
      <title>Evaluating Vector Databases in Real Systems</title>
      <dc:creator>Andrew Kennon</dc:creator>
      <pubDate>Thu, 03 Jul 2025 09:14:24 +0000</pubDate>
      <link>https://dev.to/andrewkennon/evaluating-vector-databases-in-real-systems-p1i</link>
      <guid>https://dev.to/andrewkennon/evaluating-vector-databases-in-real-systems-p1i</guid>
      <description>&lt;p&gt;Over the past couple years, I’ve integrated vector databases into several autonomous driving and LLM-based perception workflows — think multimodal RAG pipelines, scene retrieval across lidar+camera streams, or sensor signature matching. These workloads aren’t your average chatbot demos; they demand high recall, stable latency, and the ability to filter by time, location, or sensor type.&lt;/p&gt;

&lt;p&gt;So I’ve been watching the vector DB landscape pretty closely. Below is a benchmark-grounded evaluation of what’s out there, with some firsthand perspective on what actually works when you’re pushing real data through these systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Evaluate (And Why It Matters)
&lt;/h2&gt;

&lt;p&gt;I care about five things when selecting a vector DB for production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query performance&lt;/strong&gt; — Low latency matters, especially when you’re feeding LLMs in real time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recall&lt;/strong&gt; — In robotics, wrong retrievals mean wrong plans. I need to trust the top-k.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insert + index time&lt;/strong&gt; — If I’m syncing sensor data every few seconds, I don’t want indexing to be the bottleneck.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; — Millions of vectors from real-world scenes pile up fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functionality&lt;/strong&gt; — Filtering, hybrid queries, stability under load — all must-haves.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I use results from &lt;strong&gt;ANN-Benchmark&lt;/strong&gt; and &lt;strong&gt;VectorDBBench&lt;/strong&gt;, with embeddings ranging from 960 to 1536 dimensions — roughly what you’d expect from vision transformers or OpenAI’s text models.&lt;/p&gt;
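&lt;p&gt;For context, recall in these benchmarks is scored against exact brute-force neighbors; a naive ground-truth pass looks like this (fine for small corpora, hopeless at production scale, which is the whole point of ANN):&lt;/p&gt;

```python
import numpy as np

def ground_truth_topk(queries, corpus, k=10):
    """Exact top-k neighbors by L2 distance: the benchmark ground truth."""
    dists = np.linalg.norm(queries[:, None, :] - corpus[None, :, :], axis=2)
    return np.argsort(dists, axis=1)[:, :k]
```

An ANN index’s recall@k is then the overlap between its top-k ids and these exact ids, averaged over queries.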




&lt;h2&gt;
  
  
  &lt;a href="https://zilliz.com/" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; / Milvus
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I’ve Seen
&lt;/h3&gt;

&lt;p&gt;This one consistently performs well on both recall and throughput, especially with disk-based indexing and hybrid search. I’ve used Milvus in a lidar-tagged object retrieval task (think “find similar scenes where the ego vehicle was overtaken”) — and the filtering capabilities really helped.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High QPS and recall in real-world tests (VectorDBBench)&lt;/li&gt;
&lt;li&gt;Supports hybrid queries (e.g., timestamp &amp;gt; t &amp;amp;&amp;amp; similarity &amp;gt; x)&lt;/li&gt;
&lt;li&gt;Scales well for large workloads&lt;/li&gt;
&lt;li&gt;Milvus (open-source) for flexibility, Zilliz (cloud) for lower ops&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Self-hosted setup is non-trivial (you’re running Pulsar and etcd)&lt;/li&gt;
&lt;li&gt;Zilliz Cloud simplifies deployment but limits deep tuning (e.g., IVF params)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://weaviate.io/" rel="noopener noreferrer"&gt;Weaviate&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I’ve Seen
&lt;/h3&gt;

&lt;p&gt;I tested Weaviate in a cross-modal search prototype — matching dashcam clips to driving logs using metadata filters. It handled hybrid queries well, though indexing took longer than I expected during rapid ingestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Good recall/QPS balance in most benchmarks&lt;/li&gt;
&lt;li&gt;First-class support for metadata filtering and multimodal modules&lt;/li&gt;
&lt;li&gt;Friendly APIs (GraphQL, REST), quick to start&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Index construction slower on large or dynamic datasets&lt;/li&gt;
&lt;li&gt;Memory usage can spike during concurrent reads&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I’ve Seen
&lt;/h3&gt;

&lt;p&gt;Pinecone feels like a SaaS-native solution. I tried it in a project where fast prototyping mattered more than custom indexing. It worked — but I hit walls when I wanted to tune performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Easy to deploy, zero ops&lt;/li&gt;
&lt;li&gt;Solid QPS under moderate load&lt;/li&gt;
&lt;li&gt;Strong ecosystem integration (LangChain, OpenAI)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Indexing parameters not exposed — you get what you get&lt;/li&gt;
&lt;li&gt;Recall underperforms slightly on larger or high-dimensional datasets&lt;/li&gt;
&lt;li&gt;Cost becomes a concern once you scale past toy projects&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I’ve Seen
&lt;/h3&gt;

&lt;p&gt;This one surprised me. I ran a batch object similarity task on CPU (no GPU) and it held up pretty well. Still rough around the edges on some features, but promising for edge use or CPU-bound systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Efficient insert/search on commodity hardware (Rust backend)&lt;/li&gt;
&lt;li&gt;REST/gRPC APIs, filtering supported&lt;/li&gt;
&lt;li&gt;Lightweight and open-source&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Limited support for hybrid search with complex schemas&lt;/li&gt;
&lt;li&gt;Ecosystem not as mature as Milvus or Weaviate (yet)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FAISS
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I’ve Seen
&lt;/h3&gt;

&lt;p&gt;I still use FAISS when I want to test indexing strategies in isolation. But I wouldn’t use it in a full production loop — no filtering, no persistence, no service layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Excellent for raw ANN algorithm comparisons&lt;/li&gt;
&lt;li&gt;GPU support, fast brute-force testing&lt;/li&gt;
&lt;li&gt;Customizable index combinations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Not a database: no filters, no auth, no scaling&lt;/li&gt;
&lt;li&gt;Can’t be used standalone in production workflows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Chroma
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What I’ve Seen
&lt;/h3&gt;

&lt;p&gt;Nice for LangChain demos. I once used it in a hackathon to build a doc-based LLM assistant. Fast to start, but hit scaling limits quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strengths
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Minimal setup, beginner-friendly&lt;/li&gt;
&lt;li&gt;Works well for prototypes or one-off use&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Weak on indexing and recall with larger datasets&lt;/li&gt;
&lt;li&gt;Missing core DB features like hybrid filters and distributed execution&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vector DB&lt;/th&gt;
&lt;th&gt;Recall&lt;/th&gt;
&lt;th&gt;QPS&lt;/th&gt;
&lt;th&gt;Indexing&lt;/th&gt;
&lt;th&gt;Hybrid Search&lt;/th&gt;
&lt;th&gt;Deployment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Milvus / Zilliz&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;OSS + Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weaviate&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;OSS + Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pinecone&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Cloud only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qdrant&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;OSS + Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAISS&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Local only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chroma&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Local / Prototypes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Notes
&lt;/h2&gt;

&lt;p&gt;For robotics, perception, or any RAG-heavy pipeline, my current picks look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Milvus/Zilliz&lt;/strong&gt; if I need indexing performance + hybrid filtering at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate&lt;/strong&gt; if schema flexibility and metadata filtering are key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt; if I’m deploying on the edge or working CPU-only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinecone&lt;/strong&gt; if I want managed infrastructure and don’t mind tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That said, nothing beats testing with your own embeddings and real query patterns. Benchmarks help, but your workload always tells the truth.&lt;/p&gt;

&lt;p&gt;Let me know if you want a breakdown by use case — like fraud detection vs. vision search vs. conversational RAG. I’ve tested across a few and the performance shifts depending on the shape of your vectors and latency needs.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Autonomous Driving Tech: Who’s Actually Winning in 2025?</title>
      <dc:creator>Andrew Kennon</dc:creator>
      <pubDate>Mon, 30 Jun 2025 09:19:51 +0000</pubDate>
      <link>https://dev.to/andrewkennon/autonomous-driving-tech-whos-actually-winning-in-2025-4ch</link>
      <guid>https://dev.to/andrewkennon/autonomous-driving-tech-whos-actually-winning-in-2025-4ch</guid>
      <description>&lt;h1&gt;
  
  
  2025 Autonomous Driving Leaderboard — A Practical, Technical Ranking
&lt;/h1&gt;

&lt;p&gt;I’ve spent the past few years deep in the weeds of AI system integration for autonomous vehicles — mostly working on sensor fusion and neural planning stacks. And while the headlines keep swinging between “self-driving is dead” and “AI will solve it all,” the truth is way more nuanced.&lt;/p&gt;

&lt;p&gt;So here’s a real ranking — not based on hype or stock price, but on what’s actually deployed, how the tech works, and how well it scales.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I’m Ranking — and Why
&lt;/h2&gt;

&lt;p&gt;Forget “Is it Level 4 or 5?” These are the five technical criteria that actually matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Perception&lt;/strong&gt;: Sensor fusion, occlusion handling, extreme conditions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision &amp;amp; Control&lt;/strong&gt;: Driving policy intelligence, smoothness, human-likeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Architecture&lt;/strong&gt;: Rule-based vs. end-to-end; data flywheel maturity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational Scale&lt;/strong&gt;: Real-world deployment footprint — not just test demos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Can it generalize to new cities, new cars, new situations?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2025 Leaderboard (Narrative Style)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #1 — Waymo
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Gold standard in safety, smoothness, and robustness.&lt;/li&gt;
&lt;li&gt;Mature sensor fusion, great occlusion handling.&lt;/li&gt;
&lt;li&gt;Slow, expensive expansion is their biggest weakness.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  #2 — Tesla FSD v12+
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pure end-to-end transformer stack — no lidar, no HD maps.&lt;/li&gt;
&lt;li&gt;Unmatched improvement rate due to fleet-scale data.&lt;/li&gt;
&lt;li&gt;Still brittle with weird edge cases, pedestrians, and turns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  #3 — Cruise (Post-Reset)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Strong planning stack, especially in dense urban areas.&lt;/li&gt;
&lt;li&gt;Setback after 2023 incident, public trust/reputation damaged.&lt;/li&gt;
&lt;li&gt;Rebuilding mode, but core tech still solid.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  #4 — XPeng XNGP
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Strong BEV-based perception and memory-style planning.&lt;/li&gt;
&lt;li&gt;OTA updates frequent; impressive highway+city integration.&lt;/li&gt;
&lt;li&gt;Still too rule-heavy and less robust in unmapped zones.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  #5 — Huawei ADS 2.0
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"Too engineered" — great in well-mapped areas.&lt;/li&gt;
&lt;li&gt;Relies heavily on lidar + HD maps.&lt;/li&gt;
&lt;li&gt;Lacks flexibility outside coverage zones.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  #6 — Baidu Apollo Go
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cost-efficient, city-scaled robotaxi service.&lt;/li&gt;
&lt;li&gt;Rule-based, HD map-heavy planning.&lt;/li&gt;
&lt;li&gt;Less adaptable than Tesla/XPeng in novel situations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  #7 — Mobileye SuperVision
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;More ADAS than AV, but worth mentioning.&lt;/li&gt;
&lt;li&gt;Plug-and-play scale with global OEMs.&lt;/li&gt;
&lt;li&gt;Perception stack is world-class; autonomy is limited.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Who’s Actually Doing End-to-End Neural Driving?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Planning Type&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tesla&lt;/td&gt;
&lt;td&gt;End-to-end transformer&lt;/td&gt;
&lt;td&gt;Outputs control tokens directly from video + vehicle state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wayve&lt;/td&gt;
&lt;td&gt;End-to-end + LLM&lt;/td&gt;
&lt;td&gt;Explains decisions in natural language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Others&lt;/td&gt;
&lt;td&gt;Classical stack&lt;/td&gt;
&lt;td&gt;Perception → planning → control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Personal Testing Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tesla FSD v12.3.6 (Bay Area)&lt;/strong&gt;: Smooth suburban driving, but struggles with weird U-turns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Waymo (SF)&lt;/strong&gt;: Still the smoothest and most confident rides.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XPeng G9 (Guangzhou)&lt;/strong&gt;: Great in mapped zones; fragile in new areas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cruise (Austin, pre-incident)&lt;/strong&gt;: Polished, but sometimes too cautious (e.g., freezes at crosswalks).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where This Is Headed (2025–2026 Bets)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multimodal BEV + LLM Fusion&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Spatial reasoning + language-based policy → more explainable driving logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Closed-Loop Training Pipelines&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Simulation + auto-labeling at fleet scale. Tesla is miles ahead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero-Map Urban Generalization&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Whoever nails robust, map-free city driving wins the long game.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How many miles until takeover?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We should be asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can this system outperform average human drivers in daily driving — and fail gracefully when it can’t?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Waymo = Safest and most refined&lt;/li&gt;
&lt;li&gt;Tesla = Boldest and fastest evolving&lt;/li&gt;
&lt;li&gt;XPeng/Huawei/Baidu = Scaling fast in China, each with unique trade-offs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve tested these systems or want a deeper dive into any specific stack — like LLM planners, BEV fusion, or how Tesla tokenizes control — let me know. Always down to go deeper.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
