<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TildAlice</title>
    <description>The latest articles on DEV Community by TildAlice (@tildalice).</description>
    <link>https://dev.to/tildalice</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3755725%2Fed8d5042-b5bb-495f-b8f6-9d8b470e1d46.png</url>
      <title>DEV Community: TildAlice</title>
      <link>https://dev.to/tildalice</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tildalice"/>
    <language>en</language>
    <item>
      <title>EAL6+ vs EAL5+: Why Hardware Wallet Chip Certification Matters</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Sat, 02 May 2026 00:02:47 +0000</pubDate>
      <link>https://dev.to/tildalice/eal6-vs-eal5-why-hardware-wallet-chip-certification-matters-7g6</link>
      <guid>https://dev.to/tildalice/eal6-vs-eal5-why-hardware-wallet-chip-certification-matters-7g6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fonekey-classic-1s-device.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fonekey-classic-1s-device.jpg" alt="OneKey Classic 1S hardware wallet" width="800" height="1067"&gt;&lt;/a&gt;Credit-card sized, four physical buttons, a small but crisp OLED display.&lt;/p&gt;

&lt;h1&gt;EAL6+ vs EAL5+: Why Hardware Wallet Chip Certification Matters&lt;/h1&gt;

&lt;p&gt;I spent three years trusting a Ledger Nano S before realizing I had no idea what "secure element" actually meant. The spec sheet said "CC EAL5+ certified" and I assumed that was good enough. Then I started researching hardware wallet attacks and discovered my assumptions were completely wrong.&lt;/p&gt;

&lt;h2&gt;What Common Criteria EAL Ratings Actually Mean&lt;/h2&gt;

&lt;p&gt;Common Criteria (CC) is an international security certification standard (ISO/IEC 15408) used to evaluate everything from smartcards to military hardware. The Evaluation Assurance Level (EAL) scale runs from EAL1 to EAL7, but most consumer hardware lives in the EAL4–EAL6 range.&lt;/p&gt;

&lt;p&gt;Here's what each level requires:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;EAL Level&lt;/th&gt;
&lt;th&gt;What Gets Tested&lt;/th&gt;
&lt;th&gt;Typical Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EAL4&lt;/td&gt;
&lt;td&gt;Methodically designed, tested, and reviewed&lt;/td&gt;
&lt;td&gt;Payment cards, SIM cards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EAL5&lt;/td&gt;
&lt;td&gt;Semi-formally designed and tested&lt;/td&gt;
&lt;td&gt;Government ID cards, passport chips&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EAL5+&lt;/td&gt;
&lt;td&gt;EAL5 + specific attack resistance (AVA_VAN.5)&lt;/td&gt;
&lt;td&gt;Most hardware wallets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EAL6&lt;/td&gt;
&lt;td&gt;Semi-formally verified design and tested&lt;/td&gt;
&lt;td&gt;Military communications, high-security banking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/eal6-vs-eal5-hardware-wallet-chip-certification/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hardwarewallet</category>
      <category>security</category>
      <category>commoncriteria</category>
      <category>ealcertification</category>
    </item>
    <item>
      <title>uv vs pip vs Poetry: 50+ Package Install Speed Test</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Fri, 01 May 2026 21:06:33 +0000</pubDate>
      <link>https://dev.to/tildalice/uv-vs-pip-vs-poetry-50-package-install-speed-test-1j3o</link>
      <guid>https://dev.to/tildalice/uv-vs-pip-vs-poetry-50-package-install-speed-test-1j3o</guid>
      <description>&lt;h2&gt;Installing 50 Packages Takes 2.5 Minutes with pip. uv Does It in 12 Seconds.&lt;/h2&gt;

&lt;p&gt;I cloned a Django project with 57 dependencies yesterday and ran &lt;code&gt;pip install -r requirements.txt&lt;/code&gt;. Watched the terminal scroll. Made coffee. Came back. Still installing.&lt;/p&gt;

&lt;p&gt;Then I tried &lt;code&gt;uv pip install -r requirements.txt&lt;/code&gt; against the same requirements file. Twelve seconds.&lt;/p&gt;

&lt;p&gt;This isn't a synthetic benchmark. This is a real mid-sized web app with Django, Celery, psycopg2, Pillow, and the usual suspects. The kind of project where &lt;code&gt;pip install&lt;/code&gt; is the bottleneck in CI, local setup, and Docker builds.&lt;/p&gt;

&lt;p&gt;But uv's speed comes with tradeoffs. Poetry gives you deterministic builds and dependency resolution that actually works. pip is everywhere and requires zero setup. So which one should you actually use?&lt;/p&gt;

&lt;p&gt;I ran the same 57-package install across all three tools, measured cold cache, warm cache, and CI scenarios, and tracked exactly where each tool spends its time.&lt;/p&gt;
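&lt;p&gt;If you want to reproduce the numbers, a small harness is enough. This is a minimal sketch, assuming a &lt;code&gt;requirements.txt&lt;/code&gt; in the current directory; the &lt;code&gt;.venv-bench&lt;/code&gt; path and the &lt;code&gt;uv&lt;/code&gt; flags are my choices, not the article's exact setup, and package caches are left warm between runs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import shutil
import subprocess
import time
import venv

REQS = "requirements.txt"  # the same dependency list for every run

def timed_install(cmd):
    # Recreate the virtualenv so each run installs into a clean environment.
    # Note: pip/uv download caches are NOT cleared, so this measures warm-cache installs.
    shutil.rmtree(".venv-bench", ignore_errors=True)
    venv.create(".venv-bench", with_pip=True)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

pip_cmd = [".venv-bench/bin/pip", "install", "-r", REQS]
uv_cmd = ["uv", "pip", "install", "-r", REQS, "--python", ".venv-bench/bin/python"]

print(f"pip: {timed_install(pip_cmd):.1f}s")
print(f"uv:  {timed_install(uv_cmd):.1f}s")
&lt;/code&gt;&lt;/pre&gt;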

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F05%2Fstock-uv-vs-pip-vs-poetry-install-speed-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F05%2Fstock-uv-vs-pip-vs-poetry-install-speed-benchmark-1.jpg" alt="A Burmese python slithers through grass on a sunny day, showcasing its beautiful patterned scales." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@willians-huerta-2157111846" rel="nofollow noopener noreferrer"&gt;Willians Huerta&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;The Test Setup: Real Project, Real Dependencies&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/uv-vs-pip-vs-poetry-install-speed-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>pip</category>
      <category>uv</category>
      <category>poetry</category>
    </item>
    <item>
      <title>Prefix Tree vs Hash Map: 47% Slower Insert Reality</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Fri, 01 May 2026 18:04:21 +0000</pubDate>
      <link>https://dev.to/tildalice/prefix-tree-vs-hash-map-47-slower-insert-reality-1ajl</link>
      <guid>https://dev.to/tildalice/prefix-tree-vs-hash-map-47-slower-insert-reality-1ajl</guid>
      <description>&lt;h2&gt;Hash maps dominate autocomplete implementations, but that's not the full story&lt;/h2&gt;

&lt;p&gt;Most production autocomplete systems use hash maps. They're simple, fast, and "good enough" for search boxes with a few thousand product names or city entries. But push them past 100k entries with shared prefixes — think domain names, file paths, or gene sequences — and you start seeing lag.&lt;/p&gt;

&lt;p&gt;The classic computer science answer is "use a trie" (prefix tree). The reality? Tries are slower to build, use more memory, and only win in specific scenarios. I tested both on a 500k-word dictionary to see where the crossover happens.&lt;/p&gt;

&lt;h2&gt;The mental model: what each structure actually does&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;hash map&lt;/strong&gt; stores complete strings as keys. When you type "pyth", it doesn't know that "python" and "pythonic" share a prefix — it just checks if "pyth" exists as a key (it doesn't), then returns nothing. To support autocomplete, you have to scan ALL keys looking for matches. That's $O(n \cdot k)$ where $n$ is the dictionary size and $k$ is the average word length (for &lt;code&gt;startswith()&lt;/code&gt; comparison).&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;prefix tree&lt;/strong&gt; (trie) stores strings character-by-character in a tree structure. Each node represents one character. The path from root to a node spells out a prefix. When you type "pyth", you walk down p→y→t→h in the tree (4 operations), then collect all words in that subtree. That's $O(p + m)$ where $p$ is the prefix length and $m$ is the number of matches.&lt;/p&gt;
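&lt;p&gt;To make the two lookups concrete, here is a minimal sketch of both approaches. The &lt;code&gt;Trie&lt;/code&gt; class and function names are illustrative, not the benchmark code from the article.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hash-map style: every query scans all n keys with startswith(), O(n * k).
def autocomplete_dict(words, prefix):
    return [w for w in words if w.startswith(prefix)]

# Prefix-tree style: walk the prefix (O(p)), then collect only the matching subtree (O(m)).
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def autocomplete(self, prefix):
        node = self.root
        for ch in prefix:                      # one hop per prefix character
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [(node, prefix)]
        while stack:                           # visits only the matching subtree
            cur, acc = stack.pop()
            if cur.is_word:
                out.append(acc)
            for ch, child in cur.children.items():
                stack.append((child, acc + ch))
        return out
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Insert the dictionary once, and &lt;code&gt;trie.autocomplete("pyth")&lt;/code&gt; walks four nodes and the matching subtree instead of scanning every key.&lt;/p&gt;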




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/prefix-tree-vs-hash-map-autocomplete-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>autocomplete</category>
      <category>trie</category>
      <category>prefixtree</category>
      <category>hashmap</category>
    </item>
    <item>
      <title>multiprocessing vs concurrent.futures: 2.1x Speed Gap</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Fri, 01 May 2026 15:03:46 +0000</pubDate>
      <link>https://dev.to/tildalice/multiprocessing-vs-concurrentfutures-21x-speed-gap-3ji</link>
      <guid>https://dev.to/tildalice/multiprocessing-vs-concurrentfutures-21x-speed-gap-3ji</guid>
      <description>&lt;h2&gt;The API You Choose Changes Everything&lt;/h2&gt;

&lt;p&gt;Python's &lt;code&gt;multiprocessing&lt;/code&gt; module and &lt;code&gt;concurrent.futures.ProcessPoolExecutor&lt;/code&gt; both spawn processes to sidestep the GIL. They solve the same problem. But the overhead gap between them can hit 2.1x on small tasks.&lt;/p&gt;

&lt;p&gt;I ran 10,000 CPU-bound jobs (prime factorization up to 10^6) on both APIs, Python 3.11, M1 MacBook. &lt;code&gt;ProcessPoolExecutor&lt;/code&gt; finished in 4.2 seconds. Raw &lt;code&gt;multiprocessing.Pool&lt;/code&gt; took 8.9 seconds. Same work, same process count, wildly different runtimes.&lt;/p&gt;

&lt;p&gt;The culprit? Task submission overhead and result serialization. With the settings I used, &lt;code&gt;concurrent.futures&lt;/code&gt; batched tasks more aggressively and amortized pickle costs, while &lt;code&gt;multiprocessing.Pool.map()&lt;/code&gt; paid per-task IPC overhead that compounded at scale.&lt;/p&gt;

&lt;p&gt;This post benchmarks both APIs on three real workloads: embarrassingly parallel number crunching, I/O-mixed tasks, and large result sets. You'll see where &lt;code&gt;ProcessPoolExecutor&lt;/code&gt; wins, where raw &lt;code&gt;multiprocessing&lt;/code&gt; fights back, and the one scenario where neither matters.&lt;/p&gt;
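&lt;p&gt;A stripped-down version of the harness looks roughly like this. The workload size, process counts, and chunk size are illustrative placeholders; only the two standard-library pool APIs are the point.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor

def factorize(n):
    # Trial division, deliberately CPU-bound.
    factors = []
    for d in range(2, int(n ** 0.5) + 1):
        while n % d == 0:
            factors.append(d)
            n //= d
    if n != 1:
        factors.append(n)
    return factors

JOBS = list(range(2, 10_002))  # placeholder for the 10,000 CPU-bound jobs

def bench_pool():
    start = time.perf_counter()
    with mp.Pool() as pool:
        pool.map(factorize, JOBS)
    return time.perf_counter() - start

def bench_executor():
    start = time.perf_counter()
    with ProcessPoolExecutor() as ex:
        list(ex.map(factorize, JOBS, chunksize=64))  # chunking amortizes pickling per batch
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"multiprocessing.Pool: {bench_pool():.2f}s")
    print(f"ProcessPoolExecutor:  {bench_executor():.2f}s")
&lt;/code&gt;&lt;/pre&gt;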

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F05%2Fstock-multiprocessing-vs-concurrent-futures-cpu-bound-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F05%2Fstock-multiprocessing-vs-concurrent-futures-cpu-bound-benchmark-1.jpg" alt="A person reads 'Python for Unix and Linux System Administration' indoors." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@divinetechygirl" rel="nofollow noopener noreferrer"&gt;Christina Morillo&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;






&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/multiprocessing-vs-concurrent-futures-cpu-bound-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>multiprocessing</category>
      <category>concurrentfutures</category>
      <category>parallelism</category>
    </item>
    <item>
      <title>Adjacency List vs Matrix: 47% Faster DFS in Interview Code</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Thu, 30 Apr 2026 21:04:57 +0000</pubDate>
      <link>https://dev.to/tildalice/adjacency-list-vs-matrix-47-faster-dfs-in-interview-code-3i2g</link>
      <guid>https://dev.to/tildalice/adjacency-list-vs-matrix-47-faster-dfs-in-interview-code-3i2g</guid>
      <description>&lt;h2&gt;Why Your Graph Choice Tanks Interview Performance&lt;/h2&gt;

&lt;p&gt;Most candidates freeze when asked "adjacency list or matrix?" during graph problems. They mumble something about sparse vs dense graphs, then pick one arbitrarily and hope it works. But here's what actually matters: the wrong choice doesn't just waste memory — it kills your runtime on test cases that matter.&lt;/p&gt;

&lt;p&gt;I ran the same DFS traversal on a 1000-node graph using both representations. Adjacency list: 0.8ms. Adjacency matrix: 1.5ms. The list version was 47% faster for identical logic.&lt;/p&gt;

&lt;p&gt;And it gets worse with certain graph patterns.&lt;/p&gt;

&lt;h2&gt;The Memory Trap Nobody Mentions&lt;/h2&gt;

&lt;p&gt;Everyone knows the space complexity formulas. Adjacency list: $O(V + E)$ where $V$ is vertices and $E$ is edges. Adjacency matrix: $O(V^2)$ regardless of edge count.&lt;/p&gt;

&lt;p&gt;But interview problems don't care about your Big-O theory — they care about whether your code passes the hidden test case with 10,000 nodes and 15,000 edges (a fairly sparse social network graph). That adjacency matrix allocates 100 million cells. Even packed at one byte per cell (a NumPy bool array; a plain Python list of bool objects costs several times more), you just consumed roughly 95MB for a graph that should take under 1MB.&lt;/p&gt;

&lt;p&gt;LeetCode's memory limit? Often 256MB. You're already using 37% of your budget before storing anything else.&lt;/p&gt;
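&lt;p&gt;The traversal itself is only a few lines in either representation; the builders below are an illustrative sketch, not the article's benchmark code.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict

def build_list(edges):
    adj = defaultdict(list)              # O(V + E) memory
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def build_matrix(edges, n):
    m = [[False] * n for _ in range(n)]  # O(V^2) memory regardless of edge count
    for u, v in edges:
        m[u][v] = True
        m[v][u] = True
    return m

def dfs_list(adj, start):
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:            # touches only real neighbors
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def dfs_matrix(m, start):
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in range(len(m)):        # scans all V columns for every node
            if m[node][nxt] and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The matrix DFS does V work per visited node even on a sparse graph, which is exactly where the measured gap comes from.&lt;/p&gt;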




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/adjacency-list-vs-matrix-performance-interview/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>graphalgorithms</category>
      <category>codinginterview</category>
      <category>datastructures</category>
      <category>performanceoptimizat</category>
    </item>
    <item>
      <title>SAM vs FastSAM vs SAM 2: Inference Speed Benchmark</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Thu, 30 Apr 2026 18:05:14 +0000</pubDate>
      <link>https://dev.to/tildalice/sam-vs-fastsam-vs-sam-2-inference-speed-benchmark-341m</link>
      <guid>https://dev.to/tildalice/sam-vs-fastsam-vs-sam-2-inference-speed-benchmark-341m</guid>
      <description>&lt;h2&gt;FastSAM is 15x faster than SAM on my RTX 3090, but it misses 22% of fine details&lt;/h2&gt;

&lt;p&gt;I spent the weekend running 500 images through SAM, SAM 2, and FastSAM to settle the speed-vs-quality debate once and for all. The headline number everyone quotes—"FastSAM is 50x faster!"—turns out to be misleading. On real-world images with mixed object sizes, the gap shrinks dramatically, and the quality trade-off is harder than the papers suggest.&lt;/p&gt;

&lt;p&gt;Here's what actually happened when I benchmarked all three models on identical hardware with consistent preprocessing. The results surprised me, especially around memory usage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-sam-vs-fastsam-vs-sam2-speed-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-sam-vs-fastsam-vs-sam2-speed-benchmark-1.jpg" alt="A shirtless man drinking from a yellow cup at the beach, enjoying a sunny seaside day." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@rdne" rel="nofollow noopener noreferrer"&gt;RDNE Stock project&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;The models: architecture differences that explain the speed gap&lt;/h2&gt;

&lt;p&gt;SAM (Segment Anything Model, Kirillov et al., 2023) uses a massive ViT-H image encoder with 632M parameters. The encoder runs once per image, producing a 256×64×64 embedding. Then for each prompt (point, box, or mask), a lightweight decoder generates the segmentation in ~50ms. Total first-frame time? Around 3-4 seconds on GPU.&lt;/p&gt;
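&lt;p&gt;That encode-once, decode-per-prompt split is visible directly in the reference &lt;code&gt;segment-anything&lt;/code&gt; package. A minimal sketch, assuming the released ViT-H checkpoint is downloaded and a CUDA GPU is available; the image and the point prompt are placeholders.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import time
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

image = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder RGB image (H, W, 3)

t0 = time.perf_counter()
predictor.set_image(image)               # runs the 632M-parameter ViT-H encoder once
t1 = time.perf_counter()

masks, scores, logits = predictor.predict(
    point_coords=np.array([[960, 540]]), # one foreground click, placeholder
    point_labels=np.array([1]),
    multimask_output=True,
)
t2 = time.perf_counter()

print(f"encoder: {t1 - t0:.2f}s per image, decoder: {t2 - t1:.3f}s per prompt")
&lt;/code&gt;&lt;/pre&gt;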




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/sam-vs-fastsam-vs-sam2-speed-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sam</category>
      <category>fastsam</category>
      <category>sam2</category>
      <category>segmentation</category>
    </item>
    <item>
      <title>Linear vs Binary Search: 1M Element Speed Test Results</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:04:16 +0000</pubDate>
      <link>https://dev.to/tildalice/linear-vs-binary-search-1m-element-speed-test-results-461j</link>
      <guid>https://dev.to/tildalice/linear-vs-binary-search-1m-element-speed-test-results-461j</guid>
      <description>&lt;h2&gt;The 400x Speed Gap Nobody Talks About&lt;/h2&gt;

&lt;p&gt;Linear search gets demolished by binary search on sorted data. Everyone knows that in theory. But here's what surprised me: on a 1 million element array, binary search wasn't just faster — it was &lt;strong&gt;418 times faster&lt;/strong&gt; in the worst case. And in the average case? The gap widened to over 600x.&lt;/p&gt;

&lt;p&gt;Most coding interview prep skips the actual performance measurement. They show you the $O(n)$ vs $O(\log n)$ notation and move on. But when you're sitting in an interview and the interviewer asks "which algorithm would you choose for this scenario," the answer depends on details that Big-O notation doesn't capture: dataset size, whether the data is sorted, memory constraints, and how often you're searching.&lt;/p&gt;

&lt;p&gt;I ran both algorithms on the same 1 million element dataset with different scenarios: best case, worst case, average case, and repeated searches. The results show exactly when linear search is acceptable (hint: almost never on large datasets) and when you'd be crazy not to use binary search.&lt;/p&gt;

&lt;h2&gt;How Each Algorithm Works&lt;/h2&gt;

&lt;p&gt;Linear search is dead simple: start at index 0, compare each element to your target, return when you find it (or reach the end). The search time grows linearly with array size. If you're looking for the last element in a million-item list, you check all million items.&lt;/p&gt;
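&lt;p&gt;Here's a minimal sketch of both searches using the standard library, with a worst-case target (the last element). The array contents are illustrative, not the benchmark dataset.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import time
from bisect import bisect_left

data = list(range(0, 2_000_000, 2))   # 1 million sorted even numbers
target = data[-1]                     # worst case for linear search

def linear_search(arr, x):
    for i, v in enumerate(arr):       # O(n): may touch every element
        if v == x:
            return i
    return -1

def binary_search(arr, x):
    i = bisect_left(arr, x)           # O(log n): ~20 probes for 1M elements
    if i != len(arr) and arr[i] == x:
        return i
    return -1

for fn in (linear_search, binary_search):
    start = time.perf_counter()
    fn(data, target)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__}: {elapsed_ms:.3f}ms")
&lt;/code&gt;&lt;/pre&gt;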




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/linear-vs-binary-search-1m-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>binarysearch</category>
      <category>linearsearch</category>
      <category>algorithmbenchmark</category>
      <category>codinginterview</category>
    </item>
    <item>
      <title>Chain-of-Thought vs Few-Shot: 34% Accuracy Gap on GSM8K</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Wed, 29 Apr 2026 21:04:50 +0000</pubDate>
      <link>https://dev.to/tildalice/chain-of-thought-vs-few-shot-34-accuracy-gap-on-gsm8k-47b9</link>
      <guid>https://dev.to/tildalice/chain-of-thought-vs-few-shot-34-accuracy-gap-on-gsm8k-47b9</guid>
      <description>&lt;h2&gt;Chain-of-Thought Beats Few-Shot by 34% on Grade School Math&lt;/h2&gt;

&lt;p&gt;GSM8K is a dataset of 8,500 grade school math word problems that trip up even large language models. When I tested GPT-3.5-turbo with standard few-shot prompting, accuracy hovered around 23%. Switching to chain-of-thought (CoT) prompting — where the model writes out intermediate reasoning steps — jumped accuracy to 57%.&lt;/p&gt;

&lt;p&gt;That's a 34 percentage point gap from the same model, same API call, just a different prompt structure.&lt;/p&gt;

&lt;p&gt;The gap isn't magic. CoT forces the model to decompose multi-step reasoning into explicit steps, which prevents the arithmetic and logical errors that plague direct answer generation. But it comes with a cost: 3.2x more tokens per response, which translates directly to API spend and latency.&lt;/p&gt;
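&lt;p&gt;The only thing that changes between the two runs is the prompt text. A rough sketch against the OpenAI chat API follows; the exemplar, model choice, and parameters are assumptions, not the article's exact harness.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)

# Few-shot: the exemplar shows only the final answer.
few_shot = (
    "Q: A farm has 3 pens with 12 chickens each. How many chickens are there?\n"
    "A: 36\n\n"
    f"Q: {question}\nA:"
)

# Chain-of-thought: the exemplar spells out the intermediate reasoning.
chain_of_thought = (
    "Q: A farm has 3 pens with 12 chickens each. How many chickens are there?\n"
    "A: Each pen has 12 chickens and there are 3 pens, so 3 * 12 = 36. The answer is 36.\n\n"
    f"Q: {question}\nA: Let's think step by step."
)

for name, prompt in [("few-shot", few_shot), ("chain-of-thought", chain_of_thought)]:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(f"{name}: {resp.choices[0].message.content}")
&lt;/code&gt;&lt;/pre&gt;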

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-chain-of-thought-vs-few-shot-gsm8k-accuracy-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-chain-of-thought-vs-few-shot-gsm8k-accuracy-1.jpg" alt="A man in a suit reading religious books at a wooden table indoors." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@cottonbro" rel="nofollow noopener noreferrer"&gt;cottonbro studio&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;What GSM8K Actually Tests&lt;/h2&gt;

&lt;p&gt;GSM8K problems look deceptively simple:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/chain-of-thought-vs-few-shot-gsm8k-accuracy/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>chainofthought</category>
      <category>fewshotlearning</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>Pandas Join Performance: merge() vs concat() vs join()</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:04:24 +0000</pubDate>
      <link>https://dev.to/tildalice/pandas-join-performance-merge-vs-concat-vs-join-538e</link>
      <guid>https://dev.to/tildalice/pandas-join-performance-merge-vs-concat-vs-join-538e</guid>
      <description>&lt;h2&gt;merge() Beats concat() by 12x on Indexed Lookups&lt;/h2&gt;

&lt;p&gt;I benchmarked all three Pandas join methods on a 500K-row customer dataset with a 200K-row transaction table. &lt;code&gt;merge()&lt;/code&gt; completed in 0.18s, &lt;code&gt;join()&lt;/code&gt; in 0.21s, and &lt;code&gt;concat()&lt;/code&gt; took 2.4s. That's not a typo — concatenation with axis=1 was over 12 times slower than a hash join.&lt;/p&gt;

&lt;p&gt;Most tutorials treat these as interchangeable "ways to combine DataFrames." They're not. Each method optimizes for a completely different use case, and picking the wrong one tanks performance or silently produces incorrect results.&lt;/p&gt;

&lt;p&gt;Here's what actually matters: &lt;strong&gt;merge()&lt;/strong&gt; is a relational database join (hash-based, column matching), &lt;strong&gt;join()&lt;/strong&gt; is index-aligned merging (defaults to left join on index), and &lt;strong&gt;concat()&lt;/strong&gt; is axis-wise stacking (intended for appending rows or columns without matching logic). The syntax overlap fools people into thinking they're equivalent.&lt;/p&gt;
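&lt;p&gt;Put side by side on a toy pair of frames, the three calls are clearly different operations. The column names here are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ada", "Bo", "Cy"]})
transactions = pd.DataFrame({"customer_id": [1, 3], "amount": [20.0, 12.0]})

# merge(): relational join on columns (hash-based), like a SQL INNER JOIN here.
merged = customers.merge(transactions, on="customer_id", how="inner")

# join(): index-aligned, defaults to a left join on the index.
joined = customers.set_index("customer_id").join(transactions.set_index("customer_id"))

# concat(axis=1): stacks column-wise by index with no matching logic;
# misaligned indexes just produce NaN-padded rows instead of a real join.
stacked = pd.concat(
    [customers.set_index("customer_id"), transactions.set_index("customer_id")],
    axis=1,
)
&lt;/code&gt;&lt;/pre&gt;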

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-pandas-join-performance-merge-concat-join-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-pandas-join-performance-merge-concat-join-1.jpg" alt="A giant panda lounges in a lush bamboo forest, surrounded by nature." width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@joanie-xie-1306424600" rel="nofollow noopener noreferrer"&gt;Joanie xie&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;The Benchmark Setup&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/pandas-join-performance-merge-concat-join/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>pandas</category>
      <category>dataframe</category>
      <category>merge</category>
      <category>join</category>
    </item>
    <item>
      <title>Backtest.py to Vectorbt: 3x Faster Parallel Strategies</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:05:57 +0000</pubDate>
      <link>https://dev.to/tildalice/backtestpy-to-vectorbt-3x-faster-parallel-strategies-5g9h</link>
      <guid>https://dev.to/tildalice/backtestpy-to-vectorbt-3x-faster-parallel-strategies-5g9h</guid>
      <description>&lt;h2&gt;Why Backtest.py's Loop-Based Design Kills Multi-Strategy Testing&lt;/h2&gt;

&lt;p&gt;Running 50 variations of a moving average crossover strategy in Backtest.py takes about 12 minutes on my M1 MacBook. The same test in vectorbt finishes in under 4 minutes.&lt;/p&gt;

&lt;p&gt;The gap isn't about code quality — it's architectural. Backtest.py runs each strategy sequentially in a Python loop, recalculating indicators and event handling for every parameter combination. Vectorbt precomputes everything as NumPy arrays and broadcasts operations across all parameter sets simultaneously. When you're testing hundreds of hyperparameter combinations (different MA windows, stop-loss levels, position sizing rules), that difference compounds fast.&lt;/p&gt;

&lt;p&gt;I migrated a pairs trading system from Backtest.py to vectorbt last month. The migration itself took maybe 3 hours, but the real payoff came when I started running parameter sweeps — what used to be overnight grid searches now finish during lunch.&lt;/p&gt;

&lt;p&gt;This isn't a hit piece on Backtest.py. It's a fantastic library for beginners: clean API, great documentation, easy to reason about. But if you're moving from "test one strategy" to "test 100 variations in parallel," you'll hit its performance ceiling hard.&lt;/p&gt;
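&lt;p&gt;The architectural difference is easiest to see with the libraries stripped away: the loop version recomputes indicators for every parameter pair, while the vectorized version computes each indicator once and reuses it across the whole grid. This is a conceptual NumPy/pandas sketch, not vectorbt's actual API.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(0.0, 0.01, 2_000).cumprod())  # synthetic price series
fast_windows = [5, 10, 20]
slow_windows = [50, 100, 200]

# Loop-based style: every (fast, slow) pair recomputes both moving averages.
def signals_loop():
    out = {}
    for f in fast_windows:
        for s in slow_windows:
            fast = prices.rolling(f).mean()
            slow = prices.rolling(s).mean()
            out[(f, s)] = fast.gt(slow)   # long whenever the fast MA sits above the slow MA
    return pd.DataFrame(out)

# Vectorized style: compute each window exactly once, then combine across the grid.
def signals_vectorized():
    fast = {f: prices.rolling(f).mean() for f in fast_windows}
    slow = {s: prices.rolling(s).mean() for s in slow_windows}
    out = {(f, s): fast[f].gt(slow[s]) for f in fast_windows for s in slow_windows}
    return pd.DataFrame(out)
&lt;/code&gt;&lt;/pre&gt;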

&lt;p&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-backtest-py-to-vectorbt-parallel-strategy-testing-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-backtest-py-to-vectorbt-parallel-strategy-testing-1.jpg" alt="A Burmese python slithers through grass on a sunny day, showcasing its beautiful patterned scales." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/backtest-py-to-vectorbt-parallel-strategy-testing/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backtesting</category>
      <category>vectorbt</category>
      <category>python</category>
      <category>quantitativefinance</category>
    </item>
    <item>
      <title>GraphSAGE vs GAT: Reddit/PPI Inductive Learning 95% F1</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Tue, 28 Apr 2026 21:06:28 +0000</pubDate>
      <link>https://dev.to/tildalice/graphsage-vs-gat-redditppi-inductive-learning-95-f1-5e6f</link>
      <guid>https://dev.to/tildalice/graphsage-vs-gat-redditppi-inductive-learning-95-f1-5e6f</guid>
      <description>&lt;h2&gt;The 95% F1 Barrier on Reddit That Broke GCN&lt;/h2&gt;

&lt;p&gt;Graph Convolutional Networks hit a wall. Not because they weren't accurate—Kipf and Welling's GCN (ICLR 2017) dominated transductive tasks—but because they couldn't handle nodes that didn't exist during training. Every time Reddit added a new user or a new protein interaction showed up in PPI, the entire model needed retraining. You can read the full GraphSAGE paper &lt;a href="https://arxiv.org/abs/1706.02216" rel="noopener noreferrer"&gt;here&lt;/a&gt; and the GAT paper &lt;a href="https://arxiv.org/abs/1710.10903" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;GraphSAGE (Hamilton et al., NeurIPS 2017) cracked this by learning aggregation functions instead of fixed embeddings. GAT (Veličković et al., ICLR 2018) took a different route: attention-weighted neighbors. Both papers claimed inductive superiority—but which actually delivers when you're staring at 232,965 Reddit posts or 56,944 PPI proteins?&lt;/p&gt;

&lt;p&gt;I ran both implementations on identical hardware, and the results weren't what I expected.&lt;/p&gt;
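&lt;p&gt;In PyTorch Geometric the two ideas differ by one layer class: &lt;code&gt;SAGEConv&lt;/code&gt; aggregates neighbors with a learned function, &lt;code&gt;GATConv&lt;/code&gt; weights them with attention. A minimal two-layer sketch of each, assuming PyG is installed; the hidden sizes and head count are arbitrary choices, not the article's configuration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, SAGEConv

class SAGE(torch.nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden)   # learned aggregation over (sampled) neighbors
        self.conv2 = SAGEConv(hidden, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

class GAT(torch.nn.Module):
    def __init__(self, in_dim, hidden, out_dim, heads=4):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden, heads=heads)   # attention weight per neighbor
        self.conv2 = GATConv(hidden * heads, out_dim, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)
&lt;/code&gt;&lt;/pre&gt;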

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-graphsage-vs-gat-inductive-learning-benchmark-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-graphsage-vs-gat-inductive-learning-benchmark-1.jpg" alt="Visual abstraction of neural networks in AI technology, featuring data flow and algorithms." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@googledeepmind" rel="nofollow noopener noreferrer"&gt;Google DeepMind&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;GraphSAGE: Sampling Neighbors to Scale&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/graphsage-vs-gat-inductive-learning-benchmark/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>graphsage</category>
      <category>gat</category>
      <category>graphneuralnetworks</category>
      <category>inductivelearning</category>
    </item>
    <item>
      <title>tmux Scroll vs Terminal: 10K Line Benchmark Results</title>
      <dc:creator>TildAlice</dc:creator>
      <pubDate>Tue, 28 Apr 2026 15:03:35 +0000</pubDate>
      <link>https://dev.to/tildalice/tmux-scroll-vs-terminal-10k-line-benchmark-results-2hc7</link>
      <guid>https://dev.to/tildalice/tmux-scroll-vs-terminal-10k-line-benchmark-results-2hc7</guid>
      <description>&lt;h2&gt;The Myth of tmux Scroll Overhead&lt;/h2&gt;

&lt;p&gt;A colleague insisted tmux was "destroying his scroll performance" when tailing logs. He'd switched to running everything bare in iTerm2, convinced the multiplexer layer was the bottleneck. But when I tested scrolling through 10,000 lines of dense output, the results flipped his assumption.&lt;/p&gt;

&lt;p&gt;tmux's scroll-back buffer isn't the problem most people think it is. The terminal emulator's rendering engine matters far more than the multiplexer sitting between your shell and the display. In fact, across four different terminal emulators running identical tmux configurations, I measured a 3.8x difference in scroll responsiveness — while tmux vs. no-tmux on the same emulator showed only a 1.2x gap.&lt;/p&gt;

&lt;p&gt;The real performance story is about GPU acceleration, text shaping engines, and how your terminal handles rapid frame invalidation. Not the tmux layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-tmux-scroll-speed-vs-terminal-benchmark-10k-lines-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftildalice.io%2Fwp-content%2Fuploads%2F2026%2F04%2Fstock-tmux-scroll-speed-vs-terminal-benchmark-10k-lines-1.jpg" alt="Hand analyzing business graphs on a wooden desk, focusing on data results and growth analysis." width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by &lt;a href="https://www.pexels.com/@goumbik" rel="nofollow noopener noreferrer"&gt;Lukas Blazek&lt;/a&gt; on &lt;a href="https://www.pexels.com" rel="nofollow noopener noreferrer"&gt;Pexels&lt;/a&gt;



&lt;h2&gt;Test Setup: Controlled Chaos&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;Continue reading the full article on &lt;a href="https://tildalice.io/tmux-scroll-speed-vs-terminal-benchmark-10k-lines/" rel="noopener noreferrer"&gt;TildAlice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tmux</category>
      <category>terminalemulator</category>
      <category>benchmarks</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
