DEV Community: Ian Cowley

Data-Oriented Design in C#: Why Objects Are Slowing You Down

Ian Cowley — Tue, 23 Jun 2026 12:54:16 +0000

In my previous article, we talked about starving the Garbage Collector by moving away from heap-allocated class types and leaning heavily into struct, Span<T>, and ArrayPool<T>.

That’s a critical first step, but it only solves half the problem. You’ve stopped the GC from pausing your app, but you might still be leaving massive amounts of CPU performance on the table. Why? Because of how your data is structured.

It’s time to talk about Data-Oriented Design (DoD).

The Object-Oriented Trap

We are taught from day one to model our code after the real world. If you are building a social network graph, you might write something like this:

public class UserNode
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Edge> Connections { get; set; }
}

public class Edge
{
    public UserNode Target { get; set; }
    public int Weight { get; set; }
}

This makes perfect logical sense. A user has connections, and those connections point to other users.

But modern CPUs don't care about your logical models. A CPU only cares about reading data from memory into its L1/L2 caches as fast as possible. When a CPU reads a byte from RAM, it doesn't just read that one byte; it pulls a whole 64-byte "cache line" under the assumption that you will probably want the neighboring bytes next.

When you loop through a List<UserNode>, the list itself is only a contiguous array of references. Every UserNode, every Connections list, and every Edge is a separate heap object that can live anywhere. The CPU pulls a cache line, reads your data, and then has to go fetch a completely different block of RAM for the next node. This is called pointer chasing, and the resulting cache misses are devastating to performance.

Enter Data-Oriented Design: Struct of Arrays (SoA)

Data-Oriented Design says: Stop modeling the real world. Model the data the way the hardware wants to consume it.

Instead of an Array of Structs (AoS) (or an array of objects), we invert the architecture to a Struct of Arrays (SoA).

If we look at how the native DataFrame engine Glacier.Polaris or the graph engine Glacier.Graph operates, there are no Node or Edge classes. Instead, we use flat, primitive arrays.

To represent a graph, Glacier.Graph uses the Compressed Sparse Row (CSR) format. The entire graph structure is flattened into a few dense integer arrays:

public class CsrGraph
{
    // The index in the _to array where a node's edges begin.
    // Holds one extra trailing entry, so _head[n + 1] marks where node n's edges end.
    private readonly int[] _head; 

    // The target node IDs
    private readonly int[] _to;   

    // The relationship types or weights
    private readonly int[] _relation; 
}

The Cache-Friendly Loop

Let's say we want to find all connections for Node 5. In the OOP world, we follow pointers on the heap. In the CSR world, we do this:

int startEdgeIndex = _head[5];
int endEdgeIndex = _head[6];

// Look at how perfectly sequential this is!
for (int i = startEdgeIndex; i < endEdgeIndex; i++)
{
    int targetNode = _to[i];
    int relationWeight = _relation[i];

    // Process connection...
}

Why is this so much faster?
Because _to and _relation are dense, contiguous int arrays. As we loop through i, the CPU's pre-fetcher easily predicts our access pattern. It loads the cache lines ahead of time. By the time our loop needs _to[i+1], it is already sitting in the blazing fast L1 cache.
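To make the layout concrete, here is a minimal sketch of how the three CSR arrays could be filled from a plain edge list. The class name, constructor, and the Neighbours/Weights helpers are hypothetical and are not taken from Glacier.Graph; the only thing assumed from the article is the meaning of _head, _to, and _relation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical builder for illustration; not Glacier.Graph's actual API.
public sealed class CsrGraphSketch
{
    private readonly int[] _head;     // nodeCount + 1 entries; _head[n + 1] closes node n's range
    private readonly int[] _to;       // target node IDs, grouped by source node
    private readonly int[] _relation; // one weight per edge, parallel to _to

    public CsrGraphSketch(int nodeCount, IReadOnlyList<(int From, int To, int Weight)> edges)
    {
        _head = new int[nodeCount + 1];
        _to = new int[edges.Count];
        _relation = new int[edges.Count];

        // Pass 1: count outgoing edges per node, stored one slot to the right.
        foreach (var e in edges) _head[e.From + 1]++;

        // A prefix sum turns the counts into start offsets.
        for (int n = 0; n < nodeCount; n++) _head[n + 1] += _head[n];

        // Pass 2: drop each edge into the next free slot of its source node.
        var next = new int[nodeCount];
        Array.Copy(_head, next, nodeCount);
        foreach (var e in edges)
        {
            int slot = next[e.From]++;
            _to[slot] = e.To;
            _relation[slot] = e.Weight;
        }
    }

    // A node's neighbours are one contiguous slice: no pointers to chase.
    public ReadOnlySpan<int> Neighbours(int node) =>
        _to.AsSpan(_head[node], _head[node + 1] - _head[node]);

    public ReadOnlySpan<int> Weights(int node) =>
        _relation.AsSpan(_head[node], _head[node + 1] - _head[node]);
}
```

The extra entry at the end of _head is what lets the loop above read _head[node + 1] without a bounds special case for the last node.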

The Secret Weapon: SIMD

But cache lines are only the beginning. Once your data is sitting in a flat primitive array, you unlock the real superpower of modern .NET: SIMD (Single Instruction, Multiple Data).

You cannot pass an array of UserNode objects into an AVX2 or AVX-512 vector register to evaluate them simultaneously. But you can load 8 consecutive integers from that _relation array into a Vector256<int> (or 16 into a Vector512<int> on AVX-512 hardware) and evaluate them all with a single instruction.
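As a rough illustration of what "evaluate them all at once" can look like, the sketch below counts how many entries of a relation array equal a given value, eight ints per 256-bit vector. RelationScan and CountEqual are made-up names for this example; the Vector256 calls are the standard System.Runtime.Intrinsics APIs.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class RelationScan
{
    // Counts entries equal to `wanted`, eight ints per 256-bit vector.
    public static int CountEqual(ReadOnlySpan<int> relation, int wanted)
    {
        int count = 0;
        int i = 0;

        if (Vector256.IsHardwareAccelerated && relation.Length >= Vector256<int>.Count)
        {
            ref int start = ref MemoryMarshal.GetReference(relation);
            Vector256<int> target = Vector256.Create(wanted);

            for (; i <= relation.Length - Vector256<int>.Count; i += Vector256<int>.Count)
            {
                Vector256<int> lanes = Vector256.LoadUnsafe(ref start, (nuint)i);

                // Equals sets every bit of a matching lane; the mask keeps one bit per lane.
                uint mask = Vector256.Equals(lanes, target).ExtractMostSignificantBits();
                count += BitOperations.PopCount(mask);
            }
        }

        // Scalar tail for whatever did not fill a whole vector.
        for (; i < relation.Length; i++)
        {
            if (relation[i] == wanted) count++;
        }

        return count;
    }
}
```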

When you align your memory this way, C# allows you to drop right down to the metal. This is exactly how Glacier.Chrono hits over 2 Billion operations per second without breaking a sweat.

The Bottom Line

Object-Oriented Programming is fantastic for UI components and high-level business logic. But when you drop down into the engine room—when you are building dataframes, graph databases, or processing millions of records a second—you have to think like the CPU.

Flatten your objects. Separate your properties into contiguous arrays. Design for cache hits, and your code will run orders of magnitude faster.

Replacing Electron with .NET 10: Writing a zero-latency, SIMD-accelerated IDE

Ian Cowley — Sat, 20 Jun 2026 17:28:18 +0000

Modern development tools have normalized lag. As an industry, we’ve somehow accepted that opening a text editor should consume 2GB of RAM, and that typing rapidly should occasionally stutter while an extension parses a JSON tree on the UI thread.

The standard playbook for building cross-platform editors today is to use Web Technologies (Electron/Tauri) or to drag along a decades-old monolithic C++ architecture.

But when it comes to high-performance, memory-intensive data processing, C# and .NET 10 are absolute bare-metal contenders. We don't need to default to browser engines or Rust to get sub-millisecond startup times and zero-allocation execution.

To prove it, I wrote SpanCoder: a cutting-edge, extensibility-first IDE built entirely on .NET 10 and Avalonia UI. It features a custom Skia-rendered canvas, a zero-allocation piece-table text buffer, hardware-accelerated SIMD parsing, and a heavily decoupled process architecture.

Here is how you build a premium, out-of-process IDE with C# without waking up the Garbage Collector.

1. The Death of the Monolith: Out-of-Process Resiliency

If a single third-party extension crashes, your IDE should not freeze.

Instead of a monolithic architecture where a heavy UI thread handles rendering, buffer parsing, and extensions, SpanCoder isolates everything into dedicated processes communicating over low-latency binary-serialized TCP channels and standard Stdio redirects:

SpanCoder.App (Shell UI): Handles only UI layout, custom canvas rendering, and input events on the main thread.
SpanCoder.Engine: A completely isolated background process handling the heavy text data structures and file I/O.
Language Servers (LSP) & Debuggers (DAP): Spawned entirely out-of-process via standard JSON-RPC.
Plugin Hosts: Third-party code runs in external sandboxes.

Because the UI is isolated, we maintain an Active Replay Ledger. If the background text engine crashes due to an out-of-memory error from a massive file, the UI instantly respawns it, replays the transaction ledger over the socket, and restores the exact state. You lose zero keystrokes, and the UI never locks up.

2. The Text Engine: Piece Tables and Zero Allocations

String concatenation is the enemy of performance. If you open a 2GB log file and type a single character in the middle, re-allocating and copying the whole buffer will bring the system to its knees.

SpanCoder’s engine uses a Piece Table Buffer. Text modifications are stored as a sequence of span descriptors pointing either back to the immutable original file on disk or to a thread-safe, pre-allocated ArrayPool<char>.

The cost of an edit scales with the number of pieces (roughly, the number of edits made so far) rather than with the document length.

When the custom Skia UI canvas needs to render a line, the engine checks if the segment lies contiguously in memory. If it does, it yields a ReadOnlySpan<char> directly back to the UI. The bytes go from the buffer to the screen with exactly zero heap allocations.
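For readers who have not met a piece table before, here is a deliberately small sketch of the idea. It is not SpanCoder's engine: the type and member names are invented, the original text is held in a string rather than on disk, and the add buffer is an ArrayBufferWriter<char> rather than a pooled array.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

public sealed class PieceTableSketch
{
    // A piece is a descriptor: which buffer, where it starts, how long it is.
    private readonly record struct Piece(bool FromAdd, int Start, int Length);

    private readonly string _original;                         // never modified
    private readonly ArrayBufferWriter<char> _add = new();     // append-only edit buffer
    private readonly List<Piece> _pieces = new();

    public PieceTableSketch(string original)
    {
        _original = original;
        if (original.Length > 0) _pieces.Add(new Piece(false, 0, original.Length));
    }

    public void Insert(int position, string text)
    {
        if (text.Length == 0) return;
        var inserted = new Piece(true, _add.WrittenCount, text.Length);
        _add.Write(text.AsSpan());

        int offset = 0;
        for (int i = 0; i < _pieces.Count; i++)
        {
            Piece p = _pieces[i];
            if (position <= offset + p.Length)
            {
                int split = position - offset;

                // Replace one piece with up to three: left part, new text, right part.
                _pieces.RemoveAt(i);
                if (split < p.Length) _pieces.Insert(i, new Piece(p.FromAdd, p.Start + split, p.Length - split));
                _pieces.Insert(i, inserted);
                if (split > 0) _pieces.Insert(i, new Piece(p.FromAdd, p.Start, split));
                return;
            }
            offset += p.Length;
        }

        _pieces.Add(inserted); // empty document, or a position past the end
    }

    // Materialises the document; a renderer would read piece slices directly instead.
    public override string ToString()
    {
        var sb = new StringBuilder();
        foreach (Piece p in _pieces)
        {
            ReadOnlySpan<char> source = p.FromAdd ? _add.WrittenSpan : _original.AsSpan();
            sb.Append(source.Slice(p.Start, p.Length));
        }
        return sb.ToString();
    }
}
```

Note that Insert never touches the characters of the document; it only splits one descriptor, which is why the cost follows the piece count rather than the file size.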

3. SIMD-Accelerated Line Indexing

When you insert a new line at the top of a 100,000-line document, the byte offsets for the remaining 99,999 lines all shift. Updating them one at a time in a scalar loop costs one add per line on every edit.

To fix this, SpanCoder stores absolute byte offsets in a flat array and updates them with SIMD. When an edit occurs, we use Vector<long> from System.Numerics to add the adjustment to the line array in parallel blocks. With 512-bit vectors, that is 8 line offsets per instruction; with 256-bit vectors, 4.
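A minimal sketch of that kind of bulk offset shift, assuming the line starts are kept in a flat long array. LineIndex and ShiftOffsets are hypothetical names, not SpanCoder's code; Vector<long> picks its width from the hardware and runtime configuration, so the lane count is read from Vector<long>.Count rather than hard-coded.

```csharp
using System;
using System.Numerics;

public static class LineIndex
{
    // Adds `delta` to every line offset from `firstLine` onward.
    public static void ShiftOffsets(Span<long> lineStarts, int firstLine, long delta)
    {
        Span<long> tail = lineStarts.Slice(firstLine);
        var deltaVec = new Vector<long>(delta);
        int i = 0;

        if (Vector.IsHardwareAccelerated)
        {
            for (; i <= tail.Length - Vector<long>.Count; i += Vector<long>.Count)
            {
                // Load a block of offsets, add the delta to every lane, store it back.
                Span<long> block = tail.Slice(i, Vector<long>.Count);
                (new Vector<long>(block) + deltaVec).CopyTo(block);
            }
        }

        // Scalar tail for the offsets that did not fill a whole vector.
        for (; i < tail.Length; i++) tail[i] += delta;
    }
}
```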

C# is routinely underestimated in the data analytics and processing arena, but when you leverage memory-mapped files and SIMD instructions, it performs like a native, unmanaged language.

4. Sub-Millisecond Startup via Native AOT

Modern C# isn't just fast at runtime; it can compile down to a single, zero-dependency native executable.

However, to survive Native AOT compilation, you have to eliminate Reflection entirely. Most IDEs use reflection at startup to scan for commands, menus, and plugins, which destroys cold-start times.

SpanCoder utilizes C# Source Generators. When you tag a method with [Command] or [MenuItem], Roslyn analyzes it at compile-time and injects a static routing dictionary directly into the build.

The result? Zero reflection, complete trimmer safety, and an IDE that opens virtually the moment you click the icon.
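To give a feel for the end result, here is a hand-written stand-in for the kind of routing table a generator could emit. Everything in it is hypothetical: the attribute shape, the command IDs, and the GeneratedCommandRouter class are assumptions for illustration, and the generator itself is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical attribute; SpanCoder's real [Command] may look different.
[AttributeUsage(AttributeTargets.Method)]
public sealed class CommandAttribute : Attribute
{
    public CommandAttribute(string id) => Id = id;
    public string Id { get; }
}

public static class EditorCommands
{
    [Command("file.save")]
    public static void Save() => Console.WriteLine("saved");

    [Command("edit.undo")]
    public static void Undo() => Console.WriteLine("undone");
}

// Written by hand here; in a real build a source generator would emit this file.
public static class GeneratedCommandRouter
{
    private static readonly Dictionary<string, Action> Routes = new()
    {
        ["file.save"] = EditorCommands.Save,
        ["edit.undo"] = EditorCommands.Undo,
    };

    // A plain dictionary lookup: no reflection, nothing for the trimmer to lose.
    public static bool TryInvoke(string id)
    {
        if (!Routes.TryGetValue(id, out var handler)) return false;
        handler();
        return true;
    }
}
```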

5. Built for the Modern Workflow

While performance is the foundation, an IDE has to actually be useful. SpanCoder wraps this high-performance core in a premium developer experience:

Zero-Config Local AI (Ghost Text): Auto-detects local Ollama instances and pulls the qwen2.5-coder model in the background. It uses Fill-in-the-Middle (FIM) to render sub-100ms inline autocompletions (Ghost Text) completely offline, with zero API costs.
Embedded AI Agent (YOLO Dev): A built-in terminal and chat agent that coordinates LLM reasoning and workspace tool executions in the background engine without blocking the UI.
Hardware Debugging: Built-in graphical target managers for flashing RP2040s or STM32s, complete with an embedded PTY terminal and Microcontroller DAP stepping.
Real-Time Collaboration: Synchronizes editor buffers between developers using a low-latency Conflict-free Replicated Data Type (CRDT) protocol over WebSockets.

The Takeaway

It is incredibly easy to look at the current software landscape and assume that everything has to be built on web technologies to be cross-platform, or built in C++/Rust to be fast.

SpanCoder is proof that .NET 10, combined with Avalonia UI and a healthy dose of mechanical sympathy, provides an incredible sweet spot. C# allows you to build safe, highly abstracted code where you want it, and terrifyingly fast, pointer-manipulating, zero-allocation data engines where you need it.

SpanCoder is currently under heavy development.

👉 Check out the architecture here: github.com/ian-cowley/SpanCoder

I built a zero-allocation C# Time-Series Engine to replace Postgres (and it hits 2 Billion values/sec)

Ian Cowley — Tue, 16 Jun 2026 13:47:14 +0000

I’ve been building systems for a while now, and if there’s one trend in modern software engineering that drives me crazy, it’s the default reaction to "we have data."

Need to log AI agent telemetry, financial ticks, or server metrics? The modern playbook says: spin up a massive Docker container, deploy PostgreSQL, install the TimescaleDB extension, configure connection pools, and pull in a heavy ORM.

TimescaleDB is an incredible piece of engineering—it bridges the gap between fast row-based ingestion and compressed columnar analytics. But why do we have to cross a network boundary, suffer IPC overhead, and serialize data just to do it?

C# and .NET 10 are absolute weapons for data analysis. We don't need to default to heavy database servers or Python for this.

So, I built Glacier.Chrono: an embedded, zero-allocation, in-process time-series database in pure C#. It mirrors the hybrid row-to-columnar architecture of TimescaleDB, implements Facebook's Gorilla compression, and hits over 2 Billion values per second with exactly zero heap allocations.

Here is how I bypassed the bloat and built it to run on the metal.

The Architecture: Row-to-Columnar Hybridity

To build a time-series engine, you have to solve two conflicting problems:

Ingestion needs to be row-based (Array of Structs) so you can blast data into memory instantly.
Analytics needs to be columnar (Struct of Arrays) so you can run SIMD instructions across contiguous blocks of a single data type without thrashing the CPU cache.

Here is how Glacier.Chrono handles it:

1. The Hot Ingest (Zero-Allocation)

Data is ingested into a pre-allocated, lock-free HotRingBuffer<T>. We restrict T to unmanaged C# structs (using [StructLayout(LayoutKind.Sequential)]). Writing to the database is literally just claiming the next slot with Interlocked.Increment and writing raw bytes. Multiple threads can blast telemetry at it simultaneously with zero lock contention and zero garbage collection.
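The core of that write path can be sketched in a few lines. This is not the real HotRingBuffer<T>: the name, the power-of-two capacity, and the indexer are assumptions, and the sketch leaves out everything a production buffer needs around it (hand-off to the compactor, protection against a reader seeing a half-written slot, and back-pressure when writers lap the buffer).

```csharp
using System;
using System.Threading;

public sealed class HotRingBufferSketch<T> where T : unmanaged
{
    private readonly T[] _slots;   // allocated once, reused forever
    private readonly int _mask;
    private long _next = -1;       // last claimed sequence number

    public HotRingBufferSketch(int capacityPowerOfTwo)
    {
        if (capacityPowerOfTwo <= 0 || (capacityPowerOfTwo & (capacityPowerOfTwo - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of two.", nameof(capacityPowerOfTwo));

        _slots = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    // Each writer atomically claims a unique sequence number, then fills its own slot.
    public long Write(in T value)
    {
        long seq = Interlocked.Increment(ref _next);
        _slots[(int)(seq & _mask)] = value;
        return seq;
    }

    // Total number of writes so far (older entries may already be overwritten).
    public long Count => Volatile.Read(ref _next) + 1;

    public T this[long seq] => _slots[(int)(seq & _mask)];
}
```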

2. The Pivot

When a chunk hits 10,000 rows, a background thread performs a matrix transpose. It takes the row-based data and slices it into columnar Span<T> buffers.
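In code, the pivot is a plain row-to-column copy. The sketch below uses a made-up Sample struct and a hypothetical Pivot.ToColumns helper; the caller supplies the destination spans, which in a zero-allocation setup would come from pooled or pre-allocated buffers.

```csharp
using System;

// Illustrative row type; the field names are assumptions for this example.
public struct Sample
{
    public long Time;
    public float CpuTemp;
    public int ServerId;
}

public static class Pivot
{
    // Array of Structs in, Struct of Arrays out. Each destination must hold rows.Length items.
    public static void ToColumns(
        ReadOnlySpan<Sample> rows,
        Span<long> times,
        Span<float> temps,
        Span<int> serverIds)
    {
        for (int i = 0; i < rows.Length; i++)
        {
            ref readonly Sample row = ref rows[i]; // read in place, no struct copy
            times[i] = row.Time;
            temps[i] = row.CpuTemp;
            serverIds[i] = row.ServerId;
        }
    }
}
```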

3. The Compression Engine

This is where the magic happens. We apply data-type-specific algorithms directly to the Span<T> buffers:

Timestamps: Delta-of-Delta (DoD). If you log every second perfectly, the DoD is 0, and a zero costs a single bit. Thousands of regular timestamps shrink to roughly one bit each.
Floats: Facebook Gorilla XOR. We XOR consecutive floats and strip the leading and trailing zero bits, compressing slowly changing metrics like CPU usage by 90%+.
State / Enums: Run-Length Encoding (RLE). 5,000 consecutive "Running" states become a tiny [Value: 1, Count: 5000] tuple.
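The three transforms above are easier to see on real numbers. This sketch shows only the arithmetic that makes the data compressible, not the bit-packing that follows, and it allocates freely for readability, unlike the engine described here. ChronoTransforms and its method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ChronoTransforms
{
    // result[0] = first timestamp, result[1] = first delta, then delta-of-delta.
    public static long[] DeltaOfDelta(ReadOnlySpan<long> timestamps)
    {
        var result = new long[timestamps.Length];
        if (timestamps.Length > 0) result[0] = timestamps[0];
        if (timestamps.Length > 1) result[1] = timestamps[1] - timestamps[0];
        for (int i = 2; i < timestamps.Length; i++)
        {
            long delta = timestamps[i] - timestamps[i - 1];
            long previousDelta = timestamps[i - 1] - timestamps[i - 2];
            result[i] = delta - previousDelta; // 0 whenever the interval is steady
        }
        return result;
    }

    // XOR of each float's raw bits with the previous value's bits.
    public static uint[] XorWithPrevious(ReadOnlySpan<float> values)
    {
        var result = new uint[values.Length];
        uint previous = 0;
        for (int i = 0; i < values.Length; i++)
        {
            uint bits = BitConverter.SingleToUInt32Bits(values[i]);
            result[i] = bits ^ previous; // 0 when the value did not change
            previous = bits;
        }
        return result;
    }

    // Collapses runs of equal values into (Value, Count) pairs.
    public static List<(int Value, int Count)> RunLength(ReadOnlySpan<int> values)
    {
        var runs = new List<(int Value, int Count)>();
        foreach (int v in values)
        {
            if (runs.Count > 0 && runs[^1].Value == v)
                runs[^1] = (v, runs[^1].Count + 1);
            else
                runs.Add((v, 1));
        }
        return runs;
    }
}
```

For example, the timestamps 1000, 1001, 1002, 1003, 1005 become 1000, 1, 0, 0, 1: after the first two entries, every steady interval turns into a zero.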

The 64-Bit Accumulator Trick

To get these algorithms to scream, I couldn't use standard bit-by-bit while loops.

Instead, Glacier.Chrono uses a 64-bit CPU register accumulator trick. We accumulate bits directly into a ulong register and only write bytes to the managed memory array when the register is full. This single mechanical sympathy optimization provided a 19x speedup over standard bit-packing logic.
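Here is one way such an accumulator can be written. It is a sketch under my own assumptions (most-significant-bit-first packing, big-endian flushes, a caller-supplied output span that is large enough), not the Glacier.Chrono implementation, and BitWriterSketch is a made-up name.

```csharp
using System;
using System.Buffers.Binary;

public ref struct BitWriterSketch
{
    private readonly Span<byte> _output;
    private ulong _acc;     // pending bits, packed from the most significant end
    private int _bitsUsed;  // number of occupied bits in _acc (0..63)
    private int _bytePos;   // next free byte in _output

    public BitWriterSketch(Span<byte> output)
    {
        _output = output;
        _acc = 0;
        _bitsUsed = 0;
        _bytePos = 0;
    }

    // Appends the low `count` bits of `value` (1 <= count <= 64).
    public void Write(ulong value, int count)
    {
        if (count < 64) value &= (1UL << count) - 1;
        int free = 64 - _bitsUsed;

        if (count < free)
        {
            // Common case: the bits fit, so this is just a shift and an OR in a register.
            _acc |= value << (free - count);
            _bitsUsed += count;
            return;
        }

        // The register is full: store 8 bytes at once and carry the overflow bits.
        int overflow = count - free;
        _acc |= value >> overflow;
        BinaryPrimitives.WriteUInt64BigEndian(_output.Slice(_bytePos), _acc);
        _bytePos += 8;
        _acc = overflow == 0 ? 0 : value << (64 - overflow);
        _bitsUsed = overflow;
    }

    // Writes pending bits, zero-padded to a byte boundary; returns total bytes written.
    public int Flush()
    {
        int pendingBytes = (_bitsUsed + 7) / 8;
        for (int i = 0; i < pendingBytes; i++)
            _output[_bytePos++] = (byte)(_acc >> (56 - 8 * i));
        _acc = 0;
        _bitsUsed = 0;
        return _bytePos;
    }
}
```

The point of the design is that memory is touched once per 64 bits instead of once per bit; everything in between stays in a register.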

The Raw Numbers

I ran BenchmarkDotNet on a modern CPU with AVX-512 extensions using .NET 10. The results are frankly absurd for a managed language.

Notice the "Allocated" column.

Phase	Mean Execution	Allocated Memory	Throughput
Hot Ingest (10k rows)	138.83 μs	0 B (Steady State)	~72.0M writes/sec
Gorilla Float Compress	33.27 μs	0 B	300.5M values/sec
Delta-of-Delta Compress	4.95 μs	0 B	2.01B values/sec
SIMD Query Engine	325.96 μs	577 B (OS Handles)	30.7M records/sec

We are compressing over 2 Billion timestamps per second without touching the Garbage Collector once.

The "No-ORM" Query Engine (Powered by Source Generators)

In a traditional database, you'd send a dynamic query string like SELECT AVG(CpuTemp) FROM metrics WHERE ServerId = 1. But parsing a SQL string, building an Abstract Syntax Tree, and dynamically matching columns at runtime requires memory allocations and branching logic—which completely violates our zero-allocation philosophy.

So instead of a dynamic query engine, Glacier.Chrono uses C# Source Generators.

You define your custom telemetry schema, mark it with [ChronoTable], and specify how you want each field compressed:

[ChronoTable]  
[StructLayout(LayoutKind.Sequential, Pack = 1)]  
public struct SystemMetric : IComparable<SystemMetric>  
{  
    [Timestamp] // Delta-of-Delta  
    public long Time;

    [Metric] // Gorilla XOR  
    public float CpuTemp;

    [Category] // Run-Length Encoding  
    public int ServerId;

    public int CompareTo(SystemMetric other) => Time.CompareTo(other.Time);  
}

At compile time, Roslyn analyzes your struct and automatically emits a dedicated, zero-allocation compactor and a custom SIMD query engine directly into your project.

When you want to query the data, you use the strongly-typed, auto-generated methods. Glacier.Chrono maps only the specific columns you need directly into the OS virtual address space via MemoryMappedFile projection, ignoring the timestamp and unrelated columns completely.

// The Source Generator automatically created 'GetAverageCpuTempForServerId'  
double avgCpuTemp = SystemMetricQueryEngine.GetAverageCpuTempForServerId(  
    chunkFilePath: "./data/chunk_0.glacier",   
    targetServerId: 1,   
    queryBuffers  
);

You get a beautiful, strongly-typed developer experience with autocomplete, but under the hood, the compiler has emitted the exact, bare-metal AVX-512 vector loops for those specific columns.

The Takeaway

The data analysis and AI telemetry space has been entirely dominated by heavy database servers, Python wrappers, and massive C++ frameworks.

But .NET 10, combined with C# Source Generators, unmanaged memory, and hardware intrinsics, proves that C# is an absolute heavyweight contender. You don't need a heavy Docker-bound stack to get world-class, column-compressed time-series analytics. You just need good, native, mechanically sympathetic engineering.

Glacier.Chrono is open-source and part of the Glacier high-performance storage suite.

👉 Check out the repo here: github.com/ian-cowley/Glacier.Chrono

Starving the Garbage Collector: A Pragmatic Guide to Zero-Allocation C#

Ian Cowley — Sun, 14 Jun 2026 10:06:29 +0000

Over the last few weeks, I’ve open-sourced a suite of high-performance, zero-dependency C# engines. This includes a native DataFrame library (Glacier.Polaris), a blistering fast text searcher (Glacier.Grep), and a semantic Markdown parser for RAG contexts (Glacier.DocTree). You can find the source code for all of these on my GitHub.

A recurring question I’m getting from other devs looking at these repositories is simple: How exactly are you bypassing the Garbage Collector to get these speeds?

I’ve never hidden my distaste for heavy, magic-filled frameworks. Whether it's an unwieldy data access library or a bloated client-side framework, they all share a common flaw: they wrap your code in layers of hidden allocations that murder your CPU caches and force the .NET Garbage Collector (GC) into overdrive.

When you want to build systems that process millions of rows a second or rival native C/C++ in raw compute speed, you have to take control of your memory. To give you a fighting chance at writing your own high-performance engines, let's break down how memory allocation actually works in C#, using the architecture of the Glacier repositories as our guide.

Level 1: The Heap, the Stack, and Cache Locality

In C#, every time you use the new keyword on a class (a reference type), you are asking the runtime to find a contiguous block of free memory on the Managed Heap.

The heap is a messy place. When it gets full, the GC steps in. It pauses your application threads, traverses the object graph to see what you are still using, compacts the memory, and cleans up the garbage. For standard CRUD apps, this pause is negligible. For a DataFrame engine like Glacier.Polaris processing millions of rows, a GC pause is a catastrophic event. It's a heavy tax on your CPU cycles.

The alternative is the Stack. The stack is a tightly managed, incredibly fast area of memory assigned exclusively to the current thread. When you declare a struct (a value type) as a local variable, it lives on the stack (or in a register). When the method finishes, the stack unwinds, and the memory is instantly freed. No GC involved. Zero tax. A struct stored in an array or as a field of a class lives inline inside that heap object instead, which still avoids a separate allocation per element.

But dropping classes for structs isn't just about dodging the GC; it's about mechanical sympathy. Modern CPUs don't read bytes from RAM one at a time; they pull 64-byte "cache lines." By using struct with [StructLayout(LayoutKind.Sequential)] (the default layout for C# structs) and keeping the fields small and sensibly ordered, you ensure that when the CPU grabs a cache line, it receives highly relevant, tightly packed data, drastically reducing cache misses.

The Golden Rule: If you want to go fast in a tight loop, favor struct over class.

Level 2: Slicing Without Allocating & The Async Trap

Value types are great, but what about arrays and strings? Historically, if you wanted a subset of an array or a string, you called .Substring() or .Skip().Take().ToArray(). These operations allocate new objects on the heap, copying the data over.

If you look at the source for Glacier.DocTree or Glacier.Grep, you'll notice we rarely allocate new strings when reading text. Instead, we use Span<T> and ReadOnlySpan<T>.

A Span<T> is a ref struct. It is essentially a pointer to a block of memory and a length, meaning it must live on the stack. You can slice a massive buffer into smaller chunks, and it costs absolutely nothing. Zero allocations. Zero copying.

// The old, bloated way that triggers the GC
string line = "Error: Connection Timeout";
string message = line.Substring(7); // Allocates a new string on the heap

// The Glacier way (Zero-Allocation)
ReadOnlySpan<char> lineSpan = "Error: Connection Timeout".AsSpan();
ReadOnlySpan<char> messageSpan = lineSpan.Slice(7); // Just a view into memory!

The Async Trap

Because Span<T> is tied to the stack, the compiler will stop you if you try to use it across an await boundary (like asynchronously reading a file stream). State machines generated by async/await cannot preserve stack-only references.

To bridge this gap, we use Memory<T>. Memory<T> can safely live on the heap and travel through async pipelines. Once the I/O operation yields and you are ready to do synchronous, CPU-bound processing, you simply call .Span on your Memory<T> and begin slicing at zero cost.
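A small sketch of that hand-off, assuming a stream of UTF-8 text and a task as simple as counting newlines. LineCounter and its methods are hypothetical names; the pattern to notice is that Memory<byte> crosses the await and the Span only appears in the synchronous helper call.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

public static class LineCounter
{
    public static async Task<int> CountNewlinesAsync(Stream stream)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(64 * 1024);
        try
        {
            Memory<byte> buffer = rented; // heap-friendly, allowed to live across awaits
            int total = 0;
            int read;
            while ((read = await stream.ReadAsync(buffer)) > 0)
            {
                // Back in synchronous code: take a Span over just the bytes that were read.
                total += CountNewlines(buffer.Span.Slice(0, read));
            }
            return total;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // Synchronous, span-based, and free to use stack-only types.
    private static int CountNewlines(ReadOnlySpan<byte> chunk) => chunk.Count((byte)'\n');
}
```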

Level 3: Custom Allocators and the Rental Market

The stack is fast, but it's small (typically around 1MB). If you try to put a massive DataFrame column there, you will crash your app with a StackOverflowException.

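In Glacier.Polaris, we are mapping primitive types directly to dense arrays to avoid the overhead of boxing. But allocating massive arrays in a tight loop with new int[100000] is expensive: at 400,000 bytes, each one is above the 85,000-byte threshold and lands on the Large Object Heap, which is only reclaimed during costly Gen 2 collections.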

Instead of relying on standard arrays, Polaris uses custom allocators and structures like MemoryOwnerColumn and ValidityMask. This allows us to maintain C-like memory control while remaining safe within the .NET ecosystem. When we need temporary buffers, we rent them from System.Buffers.ArrayPool<T>.Shared:

// Rent an array of AT LEAST the requested size
int[] buffer = ArrayPool<int>.Shared.Rent(100000);
try 
{
    // Wrap it in a span for safe, fast access
    Span<int> workSpan = buffer.AsSpan(0, 100000);
    // Do heavy data processing...
}
finally 
{
    // Always return it! Once the pool is warm, the GC never sees a new allocation.
    ArrayPool<int>.Shared.Return(buffer);
}

Level 4: Unleashing Compute (SIMD & MemoryMarshal)

Once your memory is flat, contiguous, and not bothering the Garbage Collector, you can unleash the CPU.

In Glacier.Polaris, the math isn't done row-by-row in a simple for loop. We process data in chunks using SIMD (Single Instruction, Multiple Data) CPU vector instructions.

In older .NET versions, this meant hardcoding explicit intrinsics and pinning arrays with fixed, which added slight overhead. Modern .NET abstracts this beautifully. We use MemoryMarshal.GetReference to grab a lightweight ref to our data without pinning it, and feed it into cross-platform Vector256 logic that works efficiently on both x64 and ARM64 processors.

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static int SimdSum(ReadOnlySpan<int> data)
{
    int sum = 0;
    int i = 0;

    // Grab a fast, unpinned reference to the underlying data
    ref int current = ref MemoryMarshal.GetReference(data);

    // Process 8 integers at a time (if hardware supports 256-bit vectors)
    if (Vector256.IsHardwareAccelerated && data.Length >= Vector256<int>.Count)
    {
        Vector256<int> vSum = Vector256<int>.Zero;

        for (; i <= data.Length - Vector256<int>.Count; i += Vector256<int>.Count)
        {
            // Load 8 contiguous integers directly into the CPU register
            Vector256<int> vData = Vector256.LoadUnsafe(ref current, (nuint)i);

            // Add them in parallel
            vSum += vData; 
        }

        // Horizontal add to collapse the vector lanes into a final scalar sum
        sum += Vector256.Sum(vSum); 
    }

    // Process any remaining elements normally
    for (; i < data.Length; i++) 
    {
        sum += Unsafe.Add(ref current, i);
    }

    return sum;
}

This is where the magic happens. We've bypassed the GC to keep our memory clean, extracted an unpinned reference, and fed it directly into the CPU's vector lanes.

Level 5: Show the Receipts

It’s easy to talk about zero-allocation theory, but advanced developers deal in metrics. When you strip away the frameworks and embrace the mechanics outlined above, the results in BenchmarkDotNet look like this:

Method	Mean	Allocated
Standard Substring	18.45 ns	32 B
Glacier Span Slice	0.02 ns	0 B

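Seeing that 0 B under the allocated column is the entire point. (A mean of 0.02 ns is far below a single clock cycle, so treat it as "too cheap to measure" rather than as a precise timing.)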

The Bottom Line

Getting away from the Garbage Collector isn't about rewriting every line of business logic you have. It's about surgical precision. Identify the hot paths—the tight loops where data flows by the gigabyte—and strip away the abstractions.

Drop the heavy frameworks. Stop calling new in a loop. Embrace struct, slice with Span<T>, use the ArrayPool, and build custom column allocators when the scale demands it. Take a look through the Glacier repositories to see these patterns in action. This is how you build engines that don't just participate in the .NET ecosystem, but actually push it to its absolute limits.

I built a native C# Grep engine that's holding its own with ripgrep (with zero allocations)

Ian Cowley — Tue, 09 Jun 2026 15:26:28 +0000

Let’s be honest: the golden rule of modern software engineering is "never rewrite grep." Tools like ripgrep are written in native Rust, compiled straight to the metal, and aggressively optimized. For 99% of use cases, wrapping a CLI tool or pulling in a massive external dependency is what people do.

But when you are building an ultra-low-latency AI context layer where sub-20ms execution is the hard ceiling, spinning up external CLI processes, handling inter-process communication, and parsing standard output strings back onto the managed heap destroys performance. The Garbage Collector pressure alone kills your agentic execution loop.

So, I decided to see how close I could get with pure, unadulterated .NET 10.

The result? Glacier.Grep.

On a typical developer workload scanning a 257 MB workspace (590 files), it doesn't just rival ripgrep—it actually beats it on case-sensitive paths.

The Raw Benchmarks (Warmed)

Target: 590 files, 257.83 MB text data
OS: Windows (x64)

Engine	Query	Execution Time	Performance Ratio
Ripgrep (Rust)	`"public class"` (Sensitive)	134.9 ms	1.12x
Glacier.Grep (.NET 10)	`"public class"` (Sensitive)	120.4 ms	1.00x (FASTER)
Ripgrep (Rust)	`"public class"` (Insensitive)	210.4 ms	1.52x
Ripgrep (Rust)	`"ThreadIndependentReaderWriterLock"`	142.3 ms	1.23x

Going toe-to-toe with optimized Rust in a managed language requires treating memory like hot lava. Here is exactly how it's engineered under the hood.

1. Zero-Allocation Stack Traversal

The first place search engines lose time is evaluating filesystem metadata and parsing .gitignore rules. If you instantiate DirectoryInfo or materialize file paths as managed strings just to skip a hidden directory, you've already lost.

Glacier.Grep uses System.IO.Enumeration.FileSystemEnumerable<T>. We intercept the OS file handles and evaluate exclusion criteria entirely on the stack using custom ref struct rules before a single path string is allocated.

The .gitignore hierarchy is compiled at startup into a lightweight prefix-tree (Trie). Path matching becomes an O(L) operation, where L is the length of the path, pruning folders like bin/, obj/, and node_modules before they ever touch the processing queue.
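The prefix-tree idea can be sketched as follows. This is a simplified stand-in, not Glacier.Grep's matcher: IgnoreTrieSketch is a made-up name, patterns are treated as root-anchored directory paths (real .gitignore rules also cover wildcards, negation, and matching at any depth), and Split allocates strings, which the real engine is described as avoiding.

```csharp
using System;
using System.Collections.Generic;

public sealed class IgnoreTrieSketch
{
    private sealed class Node
    {
        public readonly Dictionary<string, Node> Children = new(StringComparer.Ordinal);
        public bool Ignored;
    }

    private readonly Node _root = new();

    // Registers a root-anchored directory such as "bin" or "src/obj".
    public void Add(string directoryPattern)
    {
        Node node = _root;
        foreach (string segment in directoryPattern.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            if (!node.Children.TryGetValue(segment, out var child))
            {
                child = new Node();
                node.Children[segment] = child;
            }
            node = child;
        }
        node.Ignored = true;
    }

    // Walks the path one segment at a time; the work is bounded by the path length.
    public bool IsIgnored(string relativePath)
    {
        Node node = _root;
        foreach (string segment in relativePath.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            if (!node.Children.TryGetValue(segment, out var child)) return false;
            if (child.Ignored) return true; // prune: everything below is skipped
            node = child;
        }
        return false;
    }
}
```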

2. The Hybrid I/O Dispatcher

There is no "one-size-fits-all" for disk I/O. Memory-mapped files are amazing for massive datasets, but forcing the OS to map page tables for thousands of tiny 4KB source files introduces major kernel overhead.

We use a dynamic dispatcher that checks the file length directly from the stack-allocated filesystem entry:

Small files (< 1MB): Handled via RandomAccess.Read straight into a chunk of memory rented from ArrayPool<byte>.Shared. This keeps data purely in the buffer pool and avoids virtual memory mapping overhead.
Large files (1MB and above): Handled via MemoryMappedFile.CreateFromFile. We grab an unsafe byte* pointer directly to the OS page cache, wrap it in a ReadOnlySpan<byte>, and feed it to the execution engine.
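The two paths above can be sketched like this. It is an approximation under several assumptions: HybridReader, ScanAction, and the threshold constant are invented names, the length comes from FileInfo rather than a stack-allocated filesystem entry, files larger than 2 GB are not handled (a single span cannot cover them), and the project must allow unsafe code.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.MemoryMappedFiles;
using Microsoft.Win32.SafeHandles;

// A custom delegate, because the callback takes a span.
public delegate void ScanAction(ReadOnlySpan<byte> data);

public static class HybridReader
{
    private const long MmapThreshold = 1024 * 1024; // the 1MB cutoff from the list above

    public static unsafe void Scan(string path, ScanAction scan)
    {
        long length = new FileInfo(path).Length;
        if (length == 0) { scan(ReadOnlySpan<byte>.Empty); return; }

        if (length < MmapThreshold)
        {
            // Small file: one positional read into a pooled buffer.
            byte[] rented = ArrayPool<byte>.Shared.Rent((int)length);
            try
            {
                using SafeFileHandle handle = File.OpenHandle(path);
                int total = 0;
                while (total < length)
                {
                    int read = RandomAccess.Read(handle, rented.AsSpan(total, (int)length - total), total);
                    if (read == 0) break;
                    total += read;
                }
                scan(rented.AsSpan(0, total));
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(rented);
            }
            return;
        }

        // Large file: map it and hand the engine a span over the mapped pages.
        using var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read);
        byte* ptr = null;
        view.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
        try
        {
            // PointerOffset covers the case where the view does not start on a page boundary.
            scan(new ReadOnlySpan<byte>(ptr + view.PointerOffset, checked((int)length)));
        }
        finally
        {
            view.SafeMemoryMappedViewHandle.ReleasePointer();
        }
    }
}
```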

3. Hardware Acceleration via .NET 10 SearchValues<byte>

We don't convert bytes to characters, and we never split text into an array of strings. The entire search happens on raw UTF-8 bytes.

To find the needle in the haystack, we lean heavily on .NET 10's upgraded SearchValues<byte>. The JIT compiler automatically emits vectorized instructions, using AVX2 or AVX-512 depending on the hardware, so each vector instruction compares 32 or 64 bytes of text at once.

When a match byte sequence is triggered, the engine avoids line-splitting allocations by scanning backwards and forwards to the nearest \n byte boundaries, producing a ReadOnlySpan<byte> slice of the line instantly.

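// The .NET 10 hot loop.
// Assumes _searchValues holds the pattern's first byte(s) and _needle (a hypothetical
// byte[] field) holds the full, non-empty UTF-8 pattern.
while (offset < fileData.Length)
{
    // Hardware-accelerated SIMD scan for a candidate first byte
    int matchIndex = fileData.Slice(offset).IndexOfAny(_searchValues);
    if (matchIndex < 0) break;

    offset += matchIndex;

    // IndexOfAny matches single bytes, so confirm the whole sequence
    if (!fileData.Slice(offset).StartsWith(_needle))
    {
        offset++;
        continue;
    }

    // Zero-allocation line slicing via byte boundaries
    int lineStart = fileData.Slice(0, offset).LastIndexOf((byte)'\n') + 1;
    int newline = fileData.Slice(offset).IndexOf((byte)'\n');
    int lineEnd = newline < 0 ? fileData.Length : offset + newline; // last line may lack '\n'

    // Process match slice: fileData[lineStart..lineEnd]
    offset += _needle.Length;
}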

4. Built for Agentic Loops (MCP Native)

This isn’t just a CLI utility. Glacier.Grep is built specifically to serve as a Model Context Protocol (MCP) server for AI coding agents.

When an LLM agent needs to find code patterns, it can invoke the tool directly over standard I/O JSON-RPC. Because the engine runs continuously as a persistent process, there's zero process-spawning penalty. Matches are streamed immediately to the agent's context window via a zero-allocation System.IO.Pipelines stream using Utf8JsonWriter.
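As a sketch of the output side, the helper below writes one match as JSON straight into an IBufferWriter<byte>, which is the interface that PipeWriter from System.IO.Pipelines implements. MatchStreamer, WriteMatch, and the property names are assumptions for illustration, not Glacier.Grep's wire format, and a hot path would likely reuse one Utf8JsonWriter via Reset instead of creating one per match.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

public static class MatchStreamer
{
    // Serialises a single match directly into the output buffer as UTF-8 JSON.
    public static void WriteMatch(
        IBufferWriter<byte> output,
        string file,
        int lineNumber,
        ReadOnlySpan<byte> utf8Line)
    {
        using var json = new Utf8JsonWriter(output);
        json.WriteStartObject();
        json.WriteString("file", file);
        json.WriteNumber("line", lineNumber);
        json.WriteString("text", utf8Line); // already UTF-8, so no string is created for the line
        json.WriteEndObject();
        json.Flush();
    }
}
```

In a test or a console tool, an ArrayBufferWriter<byte> can stand in for the pipe.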

Conclusion

Managed languages aren't slow—heavy, framework-obsessed architectures are. When you drop the abstractions, bypass the heap, and write code with mechanical sympathy for the underlying CPU registers, .NET 10 is an absolute speed demon.

Glacier.Grep is open-source and part of the Glacier high-performance storage suite.

👉 Check out the repo here: github.com/ian-cowley/Glacier.Grep

I built a C# Knowledge Layer to solve the AI Agent Memory Crisis

Ian Cowley — Mon, 18 May 2026 17:34:40 +0000

If you are building AI Agents today, you’ve probably noticed a glaring problem: The Memory Wall.

When you give an agent a task (like resolving a complex customer support escalation), it doesn't just need to read a single document. It needs to look at CRM data, trace fraud relationships, read strict legal policies, and review historical tickets.

The industry's current solution is to give the Agent a bunch of separate tools (APIs) and let it figure it out. The result? The Agent burns up 80% of your token budget and 5 seconds of latency just looping through databases, trying to assemble the context itself. Often, it gets confused and hallucinates.

As a software engineer of 40 years, I prefer deterministic logic and bare-metal performance. Agents shouldn't assemble data; our backend architecture should assemble it for them.

Over the past few weeks, I’ve built a suite of zero-dependency, hyper-optimized C# storage engines to handle the four distinct "shapes" of enterprise data. Today, I am releasing the final piece: Glacier.Bundle.

It is a unified semantic orchestration engine that compiles relational, hierarchical, tabular, and vector data into a single, high-density prompt bundle in under 20 milliseconds.

The Four Pillars of AI Memory

Before building the orchestrator, I had to build the engines. If you missed the previous articles, Glacier.Bundle sits on top of these four native C# libraries:

📊 The Tabular Layer: I built a native C# DataFrame engine to rival Python Polars (Glacier.Polaris / PolarsPlus) for structured CRM and metric data.
🧠 The Semantic Layer: I built a zero-dependency C# Vector Database that saturates DDR5 RAM (Glacier.Vector) for fuzzy, historical similarity matching using AVX-512 SIMD.
🌐 The Relational Layer: I built a zero-allocation C# Knowledge Graph (Glacier.Graph) to instantly map fraud rings and entity neighborhoods.
📂 The Hierarchical Layer: I built a Semantic Tree Parser because Naive RAG is dead (Glacier.DocTree) to extract deterministic document policies without chunking them into oblivion.

The Problem: Bridging the Islands

Having four incredibly fast databases is useless if your AI Agent has to make four separate HTTP REST calls to query them.

Glacier.Bundle acts as the single central broker. It runs in-memory alongside your Agent. You register your tabular data, your vector indices, your graph store, and your parsed markdown documents into a thread-safe BundleContext.

Then, instead of asking the AI to "go find the data," you use the BundleBuilder to programmatically compile the absolute truth.

The Code: Compiling Context in C#

Here is how you compile a massive, multi-dimensional prompt for a customer escalation:

var queryVector = GetEmbedding("Customer requested API rate expansion.");

// The fluent, zero-allocation context compiler
string contextPrompt = new BundleBuilder(bundleCtx)
    .BeginBundle("Customer Escalation Resolution")

    // 1. Instantly pull structured CRM data (Polaris)
    .AppendTabularRow("Client CRM Profile", sb => sb
        .AppendLine("  ID: User_9021")
        .AppendLine("  Name: DeltaCorp Ltd")
        .AppendLine("  Tier: Enterprise"))

    // 2. Traverse the Knowledge Graph for suspicious IP links (Graph)
    .AppendGraphTopology("Graph_Topology", "User_9021", maxHops: 2, label: "Relational Fraud Mapping")

    // 3. Extract the exact SLA section without chunking errors (DocTree)
    .AppendDocumentTreeSection("DocTree_SLA", "Enterprise SLAs", label: "Guaranteed SLA Policies")

    // 4. Do a SIMD-accelerated math search for past tickets (Vector)
    .AppendVectorContext("Vector_Tickets", queryVector, topK: 1, label: "Semantic Ticket History")

    .Build();

The Output: 17 Milliseconds

When we run this code, the C# runtime queries all four engines—extracting CRM details, tracing 2-hop relational fraud paths, pulling strict SLA Markdown structures, and executing a brute-force vector search.

Here is the benchmark output from the console:

[5] Executing high-speed Bundle compilation...

Bundle compiled in 17.4316 ms!

======================================================================
 SYSTEM RESOLVED CONTEXT BUNDLE: CUSTOMER ESCALATION: DELTACORP LTD
 GENERATED AT: 2026-05-18 17:14:55 UTC
======================================================================

--- [DATA LAYER] CLIENT CRM PROFILE ---
  ID: User_9021
  Name: DeltaCorp Ltd
  Tier: Enterprise
  Value: £85,200.00
  Status: Active

--- [RELATIONAL LAYER] RELATIONAL FRAUD MAPPING (2 HOPS FROM User_9021) ---
Target Entity: User_9021
Connected Network Entities (3):
  SubAccount_B, SubAccount_A, Suspicious_IP_88

--- [HIERARCHICAL LAYER] GUARANTEED SLA POLICIES ---
Document Path: Document Root > Service Level Agreements > Enterprise SLAs
Content Frame:
Enterprise SLAs
Enterprise accounts enjoy a guaranteed 99.99% uptime with immediate 15-minute response times.
No hardware throttling is applied.

--- [SEMANTIC LAYER] SEMANTIC TICKET HISTORY ---
[Rank 1 | Match Score: 1.0000] Historical Ticket #4823: Customer requested custom API access rate expansion. Granted temporary bypass.

The Result: Zero Hallucinations

Look at that output block. If you hand that string to an LLM, it cannot hallucinate.

It doesn't have to guess what tier the customer is on. It doesn't have to guess which SLA applies to them. It doesn't have to guess if they are connected to a suspicious IP. The deterministic C# application code proved it all before the LLM was even invoked.

And it did it in 17.4 milliseconds, faster than a typical web API can complete a TLS handshake.

Try the Full Suite

If you are a .NET developer building AI infrastructure, and you are tired of relying on bloated JVM containers and Python scripts, you can now run the entire AI data stack natively in C#.

GitHub: ian-cowley/Glacier.Bundle

NuGet: dotnet add package Glacier.Bundle

This completes the Glacier High-Performance Storage Suite. Let's prove C# is the ultimate backend language for the AI era!

Why Naive RAG is Dead: I built a zero-dependency C# Semantic Tree Parser

Ian Cowley — Mon, 18 May 2026 17:07:05 +0000

If you have built an AI Agent or a RAG (Retrieval-Augmented Generation) pipeline in the last year, you’ve almost certainly run into the exact same problem: Hallucinations caused by Naive Chunking.

The standard industry advice for feeding documents to an AI is to take a 50-page API manual, blindly chop it up into 500-token chunks, vectorize them, and throw them into a database.

This completely destroys the document's structure.

If you have a paragraph that says, "If initiated after 30 days, a 15% fee will be deducted", and it gets chunked away from its ## Enterprise Cancellations header, the AI has no idea who that rule applies to. When a user asks about Free Tier cancellations, the Vector DB might return that enterprise paragraph just because the words matched. Boom. Hallucination.

As a software engineer who hates bloat and relies on deterministic logic, I needed a better way. I didn't want to use massive Python libraries to solve this.

So, I built Glacier.DocTree.

It is a zero-dependency, bare-metal C# library that parses documents into a Semantic Tree instead of flattening them into dumb chunks.

The Fix: Hierarchical Parsing

Instead of chopping text blindly by character count, Glacier.DocTree reads Markdown and builds a strongly-typed parent-child object graph.

# (H1) becomes a root node.
## (H2) becomes a child of the last active H1.
Standard text and code blocks become children of their most recent header.
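To make the three rules above concrete, here is a minimal sketch of a header-driven tree builder. It is not Glacier.DocTree's actual code: `DocNode`, `HeaderTreeSketch`, `Parse`, `FindHeader`, and `PathTo` are hypothetical names, and the sketch ignores fenced code blocks, so a line starting with `# ` inside a code sample would be mistaken for a header.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical node type for illustration only.
public sealed class DocNode
{
    public int Level;                 // 0 = root, 1 = "#", 2 = "##", ...; -1 = body text
    public string Text = "";
    public DocNode? Parent;
    public List<DocNode> Children = new();
}

public static class HeaderTreeSketch
{
    public static DocNode Parse(string markdown)
    {
        var root = new DocNode { Level = 0, Text = "Document Root" };
        var current = root;           // most recent header, or the root

        foreach (var raw in markdown.Split('\n'))
        {
            var line = raw.Trim();
            if (line.Length == 0) continue;

            int hashes = 0;
            while (hashes < line.Length && line[hashes] == '#') hashes++;
            bool isHeader = hashes > 0 && hashes < line.Length && line[hashes] == ' ';

            if (isHeader)
            {
                // Climb until we reach a header (or the root) with a lower level.
                var parent = current;
                while (parent.Level >= hashes) parent = parent.Parent!;
                var header = new DocNode { Level = hashes, Text = line[(hashes + 1)..].Trim(), Parent = parent };
                parent.Children.Add(header);
                current = header;
            }
            else
            {
                // Body text hangs off the most recent header.
                current.Children.Add(new DocNode { Level = -1, Text = line, Parent = current });
            }
        }
        return root;
    }

    // Depth-first search for a header by its exact title.
    public static DocNode? FindHeader(DocNode node, string title)
    {
        if (node.Level > 0 && node.Text == title) return node;
        foreach (var child in node.Children)
        {
            var hit = FindHeader(child, title);
            if (hit != null) return hit;
        }
        return null;
    }

    // Walks parent links to build a "Root > Section > Subsection" breadcrumb.
    public static string PathTo(DocNode node)
    {
        var parts = new List<string>();
        for (var n = node; n != null; n = n.Parent) parts.Add(n.Text);
        parts.Reverse();
        return string.Join(" > ", parts);
    }
}
```

Because every node keeps a link to its parent, the breadcrumb for any paragraph can be rebuilt without searching the document again.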

When you feed it a messy API document, it instantly compiles this beautiful, queryable hierarchy in memory:

==========================================
 Glacier.DocTree | Semantic Parser Engine
==========================================

[1] Parsing Markdown into Semantic Tree...

[2] Visualizing the Document Structure:
└─ [Root] Document Root
  └─ [Header1] Glacier Enterprise API
    └─ [Paragraph] Welcome to the Glacier API. This docu...
    └─ [Header2] Authentication
      └─ [Paragraph] All requests to the API must be crypt...
      └─ [Header3] OAuth 2.0
        └─ [Paragraph] To authenticate via OAuth2, you must ...
        └─ [CodeBlock] ```json {    "Authorization": "Bearer...
    └─ [Header2] Usage Policies
      └─ [Paragraph] Please adhere to the following usage ...
      └─ [Header3] Rate Limits
        └─ [Paragraph] Free tier users are limited to 100 re...
      └─ [Header3] Acceptable Use
        └─ [Paragraph] Do not use the API to train competing...

[3] Simulating Agent Query: 'Extract Rate Limits context'

--- SEMANTIC CONTEXT ---
LOCATION: Document Root > Glacier Enterprise API > Usage Policies > Rate Limits
--- BEGIN TEXT ---
Rate Limits
Free tier users are limited to 100 requests per minute.
Enterprise users have unlimited access.
If you exceed the limit, you will receive an HTTP 429 status code.
--- END TEXT ---

Look at that LOCATION string.
If an LLM reads that, it cannot hallucinate. It knows exactly what document it's looking at, what section it is in, and who the policy applies to. The structure is the meaning.

Try it out

If you are a .NET developer building AI infrastructure, and you are tired of your Vector DB returning paragraphs completely devoid of context, you need a Semantic Layer.

GitHub: ian-cowley/Glacier.DocTree

It is purely native C#. No heavy frameworks, no Python interop, no API keys required. It just parses documents at blistering speeds and gives your agents the context they actually need.

Let's prove C# belongs in the modern AI ecosystem!

I built a Zero-Allocation C# Knowledge Graph (because JVM graphs are too bloated)

Ian Cowley — Mon, 18 May 2026 11:53:55 +0000

If you are building AI agents, you eventually hit the "Memory Wall".

Your agent doesn't just need semantic text chunks (Vector Search) or structured tables (SQL). It often needs to trace relationships. For example: Find all suppliers connected to this failing part, or Find the common connection between User_A and User_B.

To solve this, the industry tells you to spin up a massive Java-based Graph Database container.

I've been writing C# and SQL for decades, and I despise unnecessary bloat. I didn't want to run a 2GB JVM container locally just to traverse a few hundred thousand relationships.

So, I built Glacier.Graph.

It is a zero-dependency, bare-metal C# Knowledge Graph. It completely bypasses the .NET Garbage Collector and can persist 230,000 edges to disk in 30 milliseconds.

Here is how I architected the engine to achieve memory-bandwidth speeds.

The Problem with "Object-Oriented" Graphs

When most developers try to build a graph in memory, they immediately reach for Object-Oriented Programming:

public class Node 
{
    public string Id { get; set; }
    public List<Edge> Edges { get; set; }
}

public class Edge 
{
    public Node Target { get; set; }
    public string RelationType { get; set; }
}

If you have 100,000 nodes and 500,000 relationships, the .NET Garbage Collector now has to track at least 600,000 objects (before counting the List<Edge> instances and strings) scattered randomly across the heap. Traversing this graph means chasing pointer references through RAM, missing the CPU cache again and again, and dealing with brutal GC pauses.

The Fix: The "Forward Star" Representation

Instead of objects, Glacier.Graph uses the Forward Star representation, an array-based adjacency structure closely related to Compressed Sparse Row (CSR).

All edges are stored in flat, primitive integer arrays. When you add a relationship, the engine doesn't allocate an object; it just increments an index and writes to int[] arrays.

// The core of the Graph Store
private int[] _head;      // Per node: index of its first edge
private int[] _to;        // Per edge: target node ID
private int[] _relation;  // Per edge: integer ID of the relation type (e.g. "KNOWS")
private int[] _next;      // Per edge: index of the next edge for the same source node

When you traverse the graph, you aren't doing heap allocations. You are just doing sequential array index lookups. The CPU cache prefetcher loves this, resulting in near-instantaneous traversal speeds.
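As a rough illustration of how those four arrays cooperate, here is a minimal sketch of adding an edge and walking a node's edge chain. The class name `ForwardStarSketch`, the `-1` end-of-chain sentinel, and the doubling growth policy are assumptions made for this example, not details taken from Glacier.Graph.

```csharp
using System;
using System.Collections.Generic;

public sealed class ForwardStarSketch
{
    private readonly int[] _head;   // per node: index of its most recently added edge, or -1
    private int[] _to;              // per edge: target node ID
    private int[] _relation;        // per edge: integer ID of the relation type
    private int[] _next;            // per edge: next edge of the same source node, or -1
    private int _edgeCount;

    public ForwardStarSketch(int nodeCount, int initialEdgeCapacity = 16)
    {
        _head = new int[nodeCount];
        Array.Fill(_head, -1);
        _to = new int[initialEdgeCapacity];
        _relation = new int[initialEdgeCapacity];
        _next = new int[initialEdgeCapacity];
    }

    public void AddEdge(int from, int to, int relation)
    {
        if (_edgeCount == _to.Length)
        {
            // Grow all edge arrays together (assumed policy: double the capacity).
            int newSize = _to.Length * 2;
            Array.Resize(ref _to, newSize);
            Array.Resize(ref _relation, newSize);
            Array.Resize(ref _next, newSize);
        }

        int e = _edgeCount++;
        _to[e] = to;
        _relation[e] = relation;
        _next[e] = _head[from];     // link the new edge in front of the node's existing chain
        _head[from] = e;
    }

    // Collects the targets of every edge leaving `node`, newest edge first.
    public List<int> Neighbors(int node)
    {
        var result = new List<int>();
        for (int e = _head[node]; e != -1; e = _next[e]) result.Add(_to[e]);
        return result;
    }
}
```

`Neighbors` returns a `List<int>` only to keep the sketch short; a version that cares about allocations would expose the `for` loop through a struct enumerator or a callback instead.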

The Benchmark: Traversing 150,000 Nodes

I generated a complex synthetic graph with 150,000 nodes and 233,000 edges (including chains, branches, and back-references).

I then ran a Breadth-First Search (BFS) to find the shortest path between User_1 and User_99999.

Here is the raw console output:

==========================================
 Glacier.Graph | High-Performance Graph DB
==========================================

[1] Initializing Graph Engine and generating data...
    Total Nodes: 150,001 | Total Edges: 233,333

[3] Executing BFS Shortest Path (User_1 -> User_99999)...
    Path found in 5.3066 ms!
    Hops: 25
    Route: User_1 -> ... (24 intermediate nodes) ... -> User_99999

[4] Executing 4-Hop Neighborhood Search around User_1...
    Neighborhood scanned in 0.4416 ms!
    Discovered 11 connected entities within 4 hops.

Finding a 25-hop path through 150,000 nodes took 5.3 milliseconds. Mapping an entire 4-hop neighborhood took 0.44 milliseconds.

You can't even complete the HTTP handshake to a standard database in the time it takes this engine to traverse the entire network.
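For readers who want to see what a BFS over flat arrays looks like, here is a minimal sketch. It assumes the `head`/`to`/`next` layout described earlier with `-1` marking the end of a chain; `BfsSketch` and `ShortestPath` are hypothetical names, and the real engine's traversal may differ.

```csharp
using System;
using System.Collections.Generic;

public static class BfsSketch
{
    // head[n] is node n's first edge (or -1); next[e] is the following edge (or -1).
    // Returns the node IDs along a shortest path, or an empty array if none exists.
    public static int[] ShortestPath(int[] head, int[] to, int[] next, int start, int goal)
    {
        var parent = new int[head.Length];
        Array.Fill(parent, -1);
        parent[start] = start;

        // A plain array works as the queue because each node is enqueued at most once.
        var queue = new int[head.Length];
        int qHead = 0, qTail = 0;
        queue[qTail++] = start;

        while (qHead < qTail)
        {
            int node = queue[qHead++];
            if (node == goal) break;

            for (int e = head[node]; e != -1; e = next[e])
            {
                int target = to[e];
                if (parent[target] != -1) continue;   // already visited
                parent[target] = node;
                queue[qTail++] = target;
            }
        }

        if (parent[goal] == -1) return Array.Empty<int>();

        // Walk the parent links backwards from the goal, then reverse.
        var path = new List<int>();
        for (int n = goal; n != start; n = parent[n]) path.Add(n);
        path.Add(start);
        path.Reverse();
        return path.ToArray();
    }
}
```

The visited set, the parent links, and the queue are all `int[]` arrays sized to the node count, so the search itself touches no per-node objects.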

Blazing Fast Persistence

Because the graph is just a collection of primitive int[] arrays, saving the graph to disk is incredibly fast. We don't use JSON or heavy serialization.

Using MemoryMarshal.AsBytes(), the engine grabs the raw bytes of the arrays directly out of RAM and blasts them to the SSD.

[2] Saving raw memory arrays to disk...
    Saved 27.46 MB to 'graph_database.bin' in 30 ms.

[3] Destroying graph in memory and reloading from disk...
    Graph revived from disk in 41 ms!

30 milliseconds to save. 41 milliseconds to revive.
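Here is a minimal sketch of that raw-bytes approach for a single `int[]`. The file layout (a 4-byte length followed by the payload) and the names `RawArrayPersistenceSketch`, `Save`, and `Load` are assumptions for illustration; the actual on-disk format of Glacier.Graph is not described in this article.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class RawArrayPersistenceSketch
{
    // Assumed layout: [int32 element count][raw int32 payload].
    public static void Save(string path, int[] data)
    {
        using var stream = File.Create(path);
        Span<byte> header = stackalloc byte[sizeof(int)];
        BitConverter.TryWriteBytes(header, data.Length);
        stream.Write(header);

        // Reinterpret the int[] as bytes and write it in one call; no per-element serialization.
        stream.Write(MemoryMarshal.AsBytes(data.AsSpan()));
    }

    public static int[] Load(string path)
    {
        using var stream = File.OpenRead(path);
        Span<byte> header = stackalloc byte[sizeof(int)];
        stream.ReadExactly(header);

        var data = new int[BitConverter.ToInt32(header)];
        // Read straight into the array's own memory.
        stream.ReadExactly(MemoryMarshal.AsBytes(data.AsSpan()));
        return data;
    }
}
```

One trade-off worth knowing: the bytes are written in the machine's native byte order, so a file produced this way is only portable between machines with the same endianness.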

Built for AI Agents

Like my other libraries, I didn't want to build a bloated REST API.

Glacier.Graph includes a built-in Model Context Protocol (MCP) server over standard I/O. You can point your AI agents (via AgentDevKit, Claude Desktop, or Cursor) directly at the .dll.

The AI instantly gets JSON-RPC access to add_node, add_edge, find_shortest_path, and find_neighborhood. It allows LLMs to autonomously traverse a high-speed Knowledge Graph without any Python dependencies.

Try it out

If you are tired of massive containerized databases and want bare-metal C# performance for your relational AI data, give it a shot.

GitHub: ian-cowley/Glacier.Graph

Let me know what your traversal times look like!

I built a zero-dependency C# Vector Database that saturates DDR5 RAM bandwidth

Ian Cowley — Sat, 16 May 2026 13:39:40 +0000

If you’re building AI apps today, you eventually need a Vector Database for RAG (Retrieval-Augmented Generation) or giving your agents long-term memory.

The current ecosystem’s answer to this problem usually involves one of three things:

Pay for a cloud service.
Spin up a massive Rust, Go, or Python Docker container locally (like Qdrant, Chroma, or Milvus).
Use a bloated wrapper library that pulls in 50MB of dependencies just to do math.

I’ve been a software engineer for 40 years. I despise bloat. I don't use heavy data-access frameworks or massive client-side libraries. At its core, a vector database is just a massive 2D array of floats and a tight math loop. I didn't want to spin up a Docker container or rely on Python interop just to do math.

So, I built Glacier.Vector.

It is a purely native, zero-dependency, hardware-accelerated Vector Database for .NET 10. And it is fast enough to literally hit the physical limits of my motherboard.

Here is how I squeezed every drop of performance out of the .NET runtime.

1. The Goal: Zero Allocations

If you have 100,000 documents, and each has a 1536-dimensional embedding (the OpenAI text-embedding-3-small standard), you are looking at about 153.6 million floats (~600 MB of RAM).

If you store this as a float[][] (an array of arrays), the runtime allocates 100,000 separate array objects on the heap, each with its own header, for the Garbage Collector to track. Your cache locality is ruined, and your GC pause times will be brutal.

Instead, Glacier.Vector uses a zero-copy memory model. It allocates flat arrays in massive chunks, or uses Memory-Mapped files, completely bypassing GC pauses.

When searching, the engine pins the memory and uses fixed pointers and ReadOnlySpan<float>. No bounds checking. No object allocations in the hot path.
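The flat layout can be sketched in a few lines. This is an illustration only: `FlatVectorStoreSketch` is a hypothetical class using a single managed `float[]`, and it leaves out the chunked allocation, memory-mapped files, and pinning mentioned above.

```csharp
using System;

public sealed class FlatVectorStoreSketch
{
    // One contiguous block, row-major: vector i lives at [i * Dimensions, (i + 1) * Dimensions).
    private readonly float[] _data;

    public int Dimensions { get; }
    public int Count { get; private set; }

    public FlatVectorStoreSketch(int dimensions, int capacity)
    {
        Dimensions = dimensions;
        _data = new float[checked(dimensions * capacity)];
    }

    // Copies the vector into the next free row and returns its row index.
    public int Add(ReadOnlySpan<float> vector)
    {
        if (vector.Length != Dimensions)
            throw new ArgumentException("Wrong dimension count.", nameof(vector));

        vector.CopyTo(_data.AsSpan(Count * Dimensions, Dimensions));
        return Count++;
    }

    // A view over one row: no copy and no per-vector object.
    public ReadOnlySpan<float> Get(int index) => _data.AsSpan(index * Dimensions, Dimensions);
}
```

With this shape the GC sees one large array instead of 100,000 small ones, and a linear scan reads memory strictly front to back.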

2. Pushing the CPU: 4-Way SIMD Unrolling

Searching a vector database requires comparing the user's query against every single document using Cosine Similarity (which, for normalized vectors, is just the Dot Product).

In standard C#, a scalar for loop over 150 million floats takes forever.

To fix this, I wrote a custom compute kernel using .NET hardware intrinsics (System.Runtime.Intrinsics). But just using Vector256 wasn't enough. I unrolled the loop 4-ways to feed the CPU's out-of-order execution pipeline with simultaneous Fused-Multiply-Add (FMA) instructions:

// Unrolled AVX2 fast path (processing 32 floats per loop iteration)
var acc0 = Vector256<float>.Zero;
var acc1 = Vector256<float>.Zero;
// ...
for (; i <= length - 32; i += 32)
{
    acc0 = Fma.MultiplyAdd(Vector256.Load(pTarget + i), Vector256.Load(pDb + i), acc0);
    acc1 = Fma.MultiplyAdd(Vector256.Load(pTarget + i + 8), Vector256.Load(pDb + i + 8), acc1);
    // ...
}
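The excerpt above elides the remaining accumulators and the final reduction. As a complete, runnable illustration of the same idea, here is a sketch that swaps in the portable `System.Numerics.Vector<float>` type and two accumulators instead of the `Fma` intrinsics and four accumulators used in the article. `DotProductSketch` is a hypothetical name, and this version is not expected to match the tuned kernel's speed.

```csharp
using System;
using System.Numerics;

public static class DotProductSketch
{
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Length mismatch.");

        int width = Vector<float>.Count;   // lanes per vector; depends on the hardware
        var acc0 = Vector<float>.Zero;
        var acc1 = Vector<float>.Zero;
        int i = 0;

        // Two independent accumulators so consecutive multiply-adds do not wait on each other.
        for (; i <= a.Length - 2 * width; i += 2 * width)
        {
            acc0 += new Vector<float>(a.Slice(i)) * new Vector<float>(b.Slice(i));
            acc1 += new Vector<float>(a.Slice(i + width)) * new Vector<float>(b.Slice(i + width));
        }

        // Horizontal add of all lanes.
        float sum = Vector.Sum(acc0 + acc1);

        // Scalar tail for elements that did not fill a whole vector pair.
        for (; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```

The structure is the same as the intrinsics version: independent accumulators in the hot loop, one horizontal reduction at the end, and a scalar loop for the leftovers.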

The Benchmark: Hitting the Memory Wall

I ran a benchmark pumping 100,000 vectors (1536 dimensions each) into the engine. Here is the raw console output:

==========================================
 Glacier.Vector | SIMD Performance Engine
==========================================

[1] Initializing In-Memory Storage...
    Dimensions: 1536
    Target Count: 100,000

[2] Generating and loading synthetic vectors...
    Done! Loaded 100,000 vectors in 547 ms.

[3] Preparing search query...
[4] Executing SIMD brute-force search...

==========================================
 SEARCH COMPLETED IN: 7.152 ms
 Vectors scanned:     100,000
 Operations/sec:      13,982,298
==========================================

Top 5 Results:
  Rank 1 | Score: 0.1056 | ID: 51030 | Meta: Document_Chunk_51030
  Rank 2 | Score: 0.1019 | ID: 87632 | Meta: Document_Chunk_87632
  Rank 3 | Score: 0.1003 | ID: 52591 | Meta: Document_Chunk_52591
  Rank 4 | Score: 0.0994 | ID: 96139 | Meta: Document_Chunk_96139
  Rank 5 | Score: 0.0990 | ID: 29879 | Meta: Document_Chunk_29879

Searching the entire database, all 153.6 million floats, took 7.15 milliseconds.

At that speed, the engine is pulling roughly 85 Gigabytes per second from memory (153.6 million floats × 4 bytes ≈ 614 MB, read in 7.15 ms). The dual-channel DDR5 RAM on my motherboard physically maxes out right around 85-90 GB/s.

I literally cannot make this C# code any faster without buying faster RAM. We hit the physical memory wall.
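The bandwidth figure is simple arithmetic on the numbers in the console output, and you can rerun it for your own hardware. `BandwidthEstimate` is a hypothetical helper written for this example; it counts only the bytes of the stored vectors and ignores the query vector, which stays in cache.

```csharp
using System;

public static class BandwidthEstimate
{
    // Bytes scanned divided by elapsed time, reported in GB/s (1 GB = 1e9 bytes).
    public static double GigabytesPerSecond(long vectors, int dimensions, double milliseconds)
    {
        double bytes = (double)vectors * dimensions * sizeof(float);
        return bytes / (milliseconds / 1000.0) / 1e9;
    }
}
```

Calling `BandwidthEstimate.GigabytesPerSecond(100_000, 1536, 7.152)` gives roughly 85.9, which lines up with the 85-90 GB/s ceiling quoted for the RAM.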

Built-in AI Integration (MCP)

A vector database isn't useful if AI agents can't talk to it.

Instead of building a bloated REST API or requiring gRPC, I built a native Model Context Protocol (MCP) server directly into the engine. It runs entirely over standard I/O (stdio).

You can configure Claude Desktop, Cursor, or your own autonomous agents to point directly at the Glacier.Vector.Host.dll. The AI instantly understands how to call the add_vector and search_vectors tools via JSON-RPC. Zero Python, zero external API keys, zero network latency.

Try it out

If you are a .NET developer building RAG pipelines, AI agents, or just dealing with heavy data, and you hate bloated frameworks as much as I do, try it out.

NuGet:

dotnet add package Glacier.Vector

GitHub: ian-cowley/Glacier.Vector

It pairs perfectly with my other recent project, AgentDevKit (a native C# LLM orchestration library).

Drop a star on the repo, throw a few million vectors at the memory storage, and let me know how many milliseconds it takes to saturate your RAM! Let's prove C# belongs in the AI ecosystem.

C# got left behind in the AI Agent hype. So I fixed it! AgentDevKit

Ian Cowley — Thu, 14 May 2026 17:14:04 +0000

If you’re a .NET developer watching the current AI landscape, you probably know the feeling.

A massive new paradigm drops—in this case, autonomous AI agents—and overnight, Python and TypeScript are flooded with official SDKs, shiny frameworks like LangChain or crewAI, and endless tutorials.

Meanwhile, us C# backend devs are left staring at the wall, waiting for an enterprise-grade port that will inevitably be bloated, overly abstracted, and arrive 18 months late.

I’ve been writing backend software and integrations for 40 years. I don’t like waiting, and I don't like bloated frameworks. I just wanted a native, high-performance way to build AI agents in C# that can actually do things—read files, query databases, and use the new Model Context Protocol (MCP) without a mountain of boilerplate.

Since it didn't exist, I built it.

Meet AgentDevKit (ADK).

What is it?

It’s a native C# Agent Development Kit designed to help you create "Agents"—AI personalities powered by Google Gemini that don't just talk, but think, plan, and act using real-world tools.

Here is a quick look at what I built into it to make it actually useful for backend environments.

1. The Brain & The Hands (Tools)

An LLM is just a brain. Without tools, it's just a chatbot. ADK lets you easily attach "hands" to your agent so it can interact with your systems.

When you ask the agent a question, it pauses, realizes it needs more info, executes your C# tool, and uses the result to finish its answer.

// Setup the connection
var llm = new GeminiService(apiKey);

// Create the Agent
var agent = new LlmAgent(
    name: "SysAdminBot",
    instructions: "You are an expert system administrator."
);

// Give it hands (a tool you define in C#)
agent.Tools.Add(new FileReadTool());

// Run it
string result = await agent.RunAsync("What is in the server.log file?", llm);

2. Model Context Protocol (MCP) Integration

This is the real game-changer. MCP is becoming the industry standard for letting AI securely plug into local environments.

Instead of writing a custom C# wrapper for every single database or file system you want your AI to touch, ADK supports MCP out of the box.

// Auto-connect to a local SQLite database using MCP
var mcp = new McpService();
var databaseTools = await mcp.InitializeFromConfigAsync(config);

agent.Tools.AddRange(databaseTools);

3. Delegation (Building Teams)

One agent is cool. A team of agents is dangerous.
ADK supports orchestration patterns like Pipelines and Parallel workflows. You can literally give a "Manager" agent a "Researcher" agent to use as a tool.

var researcher = new LlmAgent("Researcher", "Find facts...");
var manager = new LlmAgent("Manager", "Guide the project...");

// The Manager can now delegate tasks to the Researcher
manager.Tools.Add(new DelegationTool(researcher, llm));

4. Guardrails (Because I trust no one, especially AI)

If you've managed backend integrations as long as I have, you know that giving an AI blind access to tools is a terrible idea.

ADK includes Human-in-the-Loop (HITL) approvals and interceptors. If a tool is dangerous, wrap it in a SensitiveTool. The agent cannot execute it without an explicit green light from your IApprovalService (which you can hook up to a console prompt, a web UI, or an email).

// Block the AI from breaking out of directories
agent.BeforeToolCall = async (tool, args) => {
    if (args.Contains("..")) {
        throw new SecurityException("Path breakout attempt blocked!");
    }
    return args;
};
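A plain check for ".." is easy to understand, but it also rejects harmless names like "notes..txt" and says nothing about absolute paths. If you want a stricter interceptor, one option is to resolve the path and confirm it still sits under an allowed root. The sketch below is a standalone helper, not part of ADK: `PathGuardSketch` and `IsInsideRoot` are hypothetical names, and it does not resolve symbolic links.

```csharp
using System;
using System.IO;

public static class PathGuardSketch
{
    // Returns true only if `requested` resolves to a location inside `allowedRoot`.
    public static bool IsInsideRoot(string allowedRoot, string requested)
    {
        string root = Path.GetFullPath(allowedRoot);
        if (!root.EndsWith(Path.DirectorySeparatorChar)) root += Path.DirectorySeparatorChar;

        // GetFullPath collapses "." and ".." segments, so a breakout resolves outside the root.
        // If `requested` is absolute, Path.Combine returns it unchanged and the check below decides.
        string full = Path.GetFullPath(Path.Combine(root, requested));

        // Ordinal comparison is the strict choice; it may reject differently-cased paths on Windows.
        return full.StartsWith(root, StringComparison.Ordinal);
    }
}
```

Appending the directory separator to the root matters: without it, a sibling folder such as "sandbox2" would pass a prefix check against "sandbox".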

5. Resilient Parsing

Smaller local models (and even big ones sometimes) spit out malformed JSON when trying to call tools. ADK has a built-in Self-Correction Loop. If the LLM messes up the JSON structure, the SDK catches the error, feeds the error back to the model, and tells it to fix its formatting automatically based on a retry budget.
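The general shape of such a loop can be sketched with nothing but System.Text.Json. This is an illustration of the pattern, not ADK's implementation: `SelfCorrectionSketch`, `GetValidJsonAsync`, and the `askModel` delegate (a stand-in for whatever sends a prompt to the LLM) are all hypothetical.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

public static class SelfCorrectionSketch
{
    // Makes one initial attempt plus up to `retryBudget` retries.
    public static async Task<JsonDocument> GetValidJsonAsync(
        Func<string, Task<string>> askModel, string prompt, int retryBudget = 3)
    {
        string currentPrompt = prompt;

        for (int attempt = 0; attempt <= retryBudget; attempt++)
        {
            string reply = await askModel(currentPrompt);
            try
            {
                return JsonDocument.Parse(reply);
            }
            catch (JsonException ex)
            {
                // Feed the parser error back so the model can repair its own output.
                currentPrompt = $"{prompt}\n\nYour previous reply was not valid JSON ({ex.Message}). " +
                                "Reply again with valid JSON only.";
            }
        }

        throw new InvalidOperationException($"No valid JSON after {retryBudget} retries.");
    }
}
```

Capping the retries keeps a confused model from burning tokens forever; when the budget runs out, the caller gets an exception it can log or escalate.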

Try it out

I built this to fill a massive gap in the .NET ecosystem, keeping the architecture pragmatic and heavily focused on orchestration and safety.

If you are a C# dev wanting to mess around with autonomous agents, MCP, and Gemini without having to learn Python, check it out.

GitHub: ian-cowley/AgentDevKit

Drop a star, try hooking it up to your database via MCP, and let me know how it goes. If you find any bugs, open an issue—I'd love to hear how other C# devs are using it.

I built a minimal, zero-dependency PDF library for C# because I hate bloat

Ian Cowley — Wed, 13 May 2026 19:12:26 +0000

If you’ve been writing C# for a while, you know the drill. You need to generate a simple PDF—maybe an invoice, a quick report, or a receipt.

So, you open up NuGet, search for "PDF", and suddenly you are staring down the barrel of a 50MB dependency, a labyrinth of complex object models, and licensing terms that require a law degree to understand.

I’ve been a developer for 40 years. I mostly work on backend and integration software for the printing industry. I like keeping things close to the metal, and I generally despise pulling in massive frameworks when a pragmatic, lightweight solution will do the trick.

I didn't want a bloated library. I just wanted to draw some text, a few shapes, and maybe throw an image on a page.

So, I ported a minimal PDF generator to .NET 10. Meet tinypdf-csharp.

What is it?

It is a C# port of the original TypeScript tinypdf by Lulzx. But I didn't just translate it; I added a few quality-of-life features that make it significantly more useful for day-to-day backend development.

The entire library is just over 1,000 lines of code. It has zero external dependencies.

The "Batteries Included" Additions

While the original architecture was beautifully simple, I needed it to do a bit more heavy lifting for real-world tasks. Here is what I added over the original:

Markdown to PDF: This is my favorite addition. You can pass a raw Markdown string into the library, and it handles the layout and spits out a formatted PDF.
Flate (Deflate) Compression: Automatically compresses the PDF streams so your file sizes stay tiny. (You can toggle it off if you need raw streams).
Clickable Links: Full support for clickable hyperlinks, complete with optional underline styling.
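For the Flate item in the list above, the compression step itself can be done with the standard library. The sketch below shows only that step, as an assumption about how such a feature might work rather than tinypdf-csharp's actual code; `FlateSketch` is a hypothetical name, and a real PDF writer would also have to emit the stream dictionary with the /FlateDecode filter and the compressed length.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FlateSketch
{
    // PDF's /FlateDecode filter expects zlib-wrapped deflate data, which ZLibStream produces.
    public static byte[] Compress(byte[] content)
    {
        using var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            zlib.Write(content, 0, content.Length);
        }   // disposing the ZLibStream flushes the final block and checksum
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```

PDF content streams are highly repetitive text operators, which is why this step tends to shrink files noticeably.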

Plus, it still has all the basics: text rendering (Helvetica, Times, Courier), shapes (rectangles, circles, wedges, lines), and JPEG image embedding.

Show me the code

It is designed to be ridiculously simple to use. Here is how you generate a PDF with some text and a red box:

using TinyPdf;

var builder = TinyPdfCreate.Create();

builder.Page(ctx => {
    ctx.Text("Hello World", 50, 700, 24);
    ctx.Rect(50, 650, 100, 20, "#FF0000");
});

byte[] pdf = builder.Build();
File.WriteAllBytes("output.pdf", pdf);

Or, if you want to use the Markdown converter:

using TinyPdf;

string md = "# Header\n\nThis is a paragraph.\n\n- List item";
byte[] pdf = TinyPdfCreate.Markdown(md);
File.WriteAllBytes("markdown.pdf", pdf);

Is it fast?

Because there's no massive object tree or bloated memory footprint, it flies. I ran a benchmark generating 1,000 multi-page invoice PDFs concurrently, and it finished in ~270 milliseconds. That’s 0.27ms per PDF. If you are generating receipts or pie charts, it drops to around 0.03ms per PDF.

Try it out

If you are building a microservice, an AWS Lambda, or a simple backend integration and you just need to spit out PDFs without sacrificing your app's payload size, give it a try.

NuGet:

dotnet add package TinyPdf

GitHub: ian-cowley/tinypdf-csharp

Drop by the repo, take a look at the code (again, it's only ~1000 lines, so you can read the whole thing on your lunch break), and let me know what you think. If it saves you from installing a massive legacy PDF framework today, my job is done.

I built a native C# DataFrame engine to rival Python Polars (it's actually faster on some things)

Ian Cowley — Wed, 13 May 2026 10:32:44 +0000

If you work in data science or heavy data engineering, you already know about Polars. It’s the Rust-backed powerhouse that took the Python ecosystem by storm, leaving Pandas in the dust.

But if you’re a .NET developer, the data manipulation story has always been a bit… frustrating. We have Microsoft.Data.Analysis, but it lacks the expressive lazy API and raw speed we crave. We often end up exporting data to Python just to process it, only to bring it back to C#.

I got tired of waiting for a native .NET solution. So, I decided to build one from scratch.

Meet Glacier.Polaris.

It is a high-performance, strongly-typed DataFrame library for C# (.NET 10). It features SIMD-accelerated compute kernels, a lazy execution engine, native nullability (Kleene logic), and it currently passes 135/135 golden-file parity tests against Python Polars.

And after weeks of fighting the .NET JIT compiler and CPU caches, it is actually beating Polars in several key benchmarks.

Here is how I pushed C# to its physical limits to pull this off.

1. Zero-Allocation and SIMD String Filtering

In standard C#, string operations are heavy. If you filter a DataFrame with df.Filter(Expr.Col("Status") == "Completed"), checking materialized .NET string objects one by one will instantly ruin your performance due to pointer-chasing and heap allocations.

To beat Polars (which uses Arrow's contiguous memory format), I couldn't use C# strings.

Instead, Glacier.Polaris stores strings as flat UTF-8 byte arrays. When you execute an equality filter, the engine loads your target string into a Vector256<byte> register. As it scans the column, each AVX2 comparison (Vector256.Equals) checks 32 bytes at once against the target bytes.

The Result: String exact-match filtering in Glacier runs in 3.62 ms for 1 million rows, beating Polars (~4.2ms) with zero string allocations.
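To show the storage idea without hand-written intrinsics, here is a minimal sketch of a flat UTF-8 string column with an offsets array. It leans on `Span<byte>.SequenceEqual`, which the .NET runtime vectorizes internally, instead of an explicit `Vector256.Equals` kernel. `Utf8ColumnSketch` and `FilterEquals` are hypothetical names, not the Glacier.Polaris API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class Utf8ColumnSketch
{
    private readonly byte[] _bytes;    // all values back to back, UTF-8 encoded
    private readonly int[] _offsets;   // value i occupies _bytes[_offsets[i] .. _offsets[i + 1])

    public Utf8ColumnSketch(IReadOnlyList<string> values)
    {
        _offsets = new int[values.Count + 1];
        int total = 0;
        for (int i = 0; i < values.Count; i++)
        {
            _offsets[i] = total;
            total += Encoding.UTF8.GetByteCount(values[i]);
        }
        _offsets[values.Count] = total;

        _bytes = new byte[total];
        for (int i = 0; i < values.Count; i++)
            Encoding.UTF8.GetBytes(values[i].AsSpan(), _bytes.AsSpan(_offsets[i]));
    }

    // Returns the row indices whose value equals `target`.
    public List<int> FilterEquals(string target)
    {
        byte[] needle = Encoding.UTF8.GetBytes(target);   // encoded once, outside the loop
        var matches = new List<int>();

        for (int row = 0; row < _offsets.Length - 1; row++)
        {
            int start = _offsets[row];
            int length = _offsets[row + 1] - start;

            // Cheap length check first, then a byte-wise comparison with no string objects involved.
            if (length == needle.Length && _bytes.AsSpan(start, length).SequenceEqual(needle))
                matches.Add(row);
        }
        return matches;
    }
}
```

The scan never materializes a .NET string per row: the only string-to-bytes conversion is the single encode of the target.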

2. Breaking the 4ms Barrier: The Float64 Sorting War

The hardest fight I had was with ArgSort on Float64 data.

Initially, I wrote a highly optimized, single-threaded Radix sort. I managed to drop the sorting time for 1 million floats to 13.11 ms—a massive 5.4x speedup over the standard .NET Array.Sort.

But Polars was doing it in 4.21 ms.

At 13ms, my C# code had officially maxed out the physical capabilities of a single CPU core. Matching Polars' 4.21 ms would mean moving roughly 200MB of data (keys and indices) over 8 radix passes in that time, which requires about 47 GB/s of memory bandwidth. A single core physically taps out around 15-20 GB/s.
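For context on what those 8 passes are, here is a minimal single-threaded sketch of an LSD radix ArgSort over doubles: one pass per byte of a 64-bit key, with the keys and row indices moved together on every pass. It is a textbook version written for this example (`RadixArgSortSketch` is a hypothetical name), not the tuned kernel from Glacier.Polaris.

```csharp
using System;

public static class RadixArgSortSketch
{
    // Maps a double to a ulong whose unsigned order matches the double's numeric order.
    private static ulong ToSortableKey(double value)
    {
        ulong bits = (ulong)BitConverter.DoubleToInt64Bits(value);
        // Negative numbers: flip every bit. Non-negative numbers: flip only the sign bit.
        return (bits & 0x8000_0000_0000_0000UL) != 0 ? ~bits : bits ^ 0x8000_0000_0000_0000UL;
    }

    // Returns row indices ordered so that values[result[0]] <= values[result[1]] <= ...
    public static int[] ArgSort(double[] values)
    {
        int n = values.Length;
        var keys = new ulong[n];
        var idx = new int[n];
        for (int i = 0; i < n; i++) { keys[i] = ToSortableKey(values[i]); idx[i] = i; }

        var keysTmp = new ulong[n];
        var idxTmp = new int[n];
        var counts = new int[256];

        for (int shift = 0; shift < 64; shift += 8)   // 8 passes, one byte each
        {
            Array.Clear(counts, 0, counts.Length);
            for (int i = 0; i < n; i++) counts[(int)((keys[i] >> shift) & 0xFF)]++;

            // Exclusive prefix sum turns the histogram into starting offsets.
            int running = 0;
            for (int b = 0; b < 256; b++) { int c = counts[b]; counts[b] = running; running += c; }

            // Stable scatter of keys and indices into the temporary buffers.
            for (int i = 0; i < n; i++)
            {
                int dest = counts[(int)((keys[i] >> shift) & 0xFF)]++;
                keysTmp[dest] = keys[i];
                idxTmp[dest] = idx[i];
            }

            (keys, keysTmp) = (keysTmp, keys);
            (idx, idxTmp) = (idxTmp, idx);
        }
        return idx;
    }
}
```

Every pass reads and writes the full key and index arrays, which is where the memory traffic in the estimate above comes from. Note that this key mapping sorts -0.0 before 0.0 and places NaN values at the extremes depending on their sign bit.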

I needed to parallelize it. But .NET's Parallel.For has too much overhead; spinning up the ThreadPool state machine takes 1-2ms alone, which is a death sentence when your target is 4ms.

The Fix: The Parallel Block Tournament Merge

Instead of using standard .NET parallel loops, I built a custom generic parallel block merge engine:

The engine slices the 1M array into isolated chunks and hands them to raw Task objects.
Each core executes a single-threaded Radix sort on a chunk sized to fit inside its L2 Cache, so it rarely has to touch system RAM and avoids Translation Lookaside Buffer (TLB) thrashing.
The engine merges the sorted chunks using a stable, parallel pairwise tournament merge.
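The three steps above can be sketched as follows. To stay short, this version sorts plain values rather than producing an ArgSort, uses `Array.Sort` on each chunk instead of a radix sort, and merges neighbouring runs pairwise in rounds. `ParallelBlockSortSketch` is a hypothetical name and the chunking policy is an assumption, so treat it as an outline of the pattern rather than the engine's code.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelBlockSortSketch
{
    public static double[] Sort(double[] input, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException(nameof(chunkCount));

        int n = input.Length;
        var src = (double[])input.Clone();
        var dst = new double[n];
        int chunk = Math.Max(1, (n + chunkCount - 1) / chunkCount);

        // Phase 1: sort each chunk independently, one Task per chunk.
        var tasks = new Task[(n + chunk - 1) / chunk];
        for (int t = 0; t < tasks.Length; t++)
        {
            int start = t * chunk;
            int length = Math.Min(chunk, n - start);
            tasks[t] = Task.Run(() => Array.Sort(src, start, length));
        }
        Task.WaitAll(tasks);

        // Phase 2: merge neighbouring runs pairwise, doubling the run width each round.
        for (int width = chunk; width < n; width *= 2)
        {
            var from = src;
            var to = dst;
            int pairs = (n + 2 * width - 1) / (2 * width);
            var merges = new Task[pairs];
            for (int p = 0; p < pairs; p++)
            {
                int lo = p * 2 * width;
                int mid = Math.Min(lo + width, n);
                int hi = Math.Min(lo + 2 * width, n);
                merges[p] = Task.Run(() => Merge(from, to, lo, mid, hi));
            }
            Task.WaitAll(merges);
            (src, dst) = (dst, src);   // the merged output becomes the next round's input
        }
        return src;
    }

    // Stable two-way merge of from[lo..mid) and from[mid..hi) into to[lo..hi).
    private static void Merge(double[] from, double[] to, int lo, int mid, int hi)
    {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) to[k++] = from[j] < from[i] ? from[j++] : from[i++];
        while (i < mid) to[k++] = from[i++];
        while (j < hi) to[k++] = from[j++];
    }
}
```

Each merge in a round writes to a disjoint slice of the output array, so the tasks need no locks. The number of busy cores halves every round, which is the usual cost of a pairwise merge tree.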

The Result: Float64 sorting dropped to 12.05 ms for 1M rows, and successfully scaled to sort 10 Million rows in just 84.71 ms.

3. The Benchmarks (C# vs Polars)

I ran these benchmarks on the same machine, comparing Glacier.Polaris (.NET 10 Release build) against Polars 1.40.1.

(Note: Times are in milliseconds. Lower is better).

| Operation (1M Rows) | Glacier.Polaris (C#) | Python Polars | Winner |
| --- | --- | --- | --- |
| DataFrame Creation | 0.02 ms | 5.33 ms | 🟢 C# (~266x) |
| Sum (Int32) | 0.14 ms | 0.45 ms | 🟢 C# (3.2x) |
| Standard Deviation | 0.33 ms | 0.55 ms | 🟢 C# (1.7x) |
| GroupBy Sum (Int32) | 1.56 ms | 5.20 ms | 🟢 C# (3.3x) |
| Inner Join (Small Right) | 2.29 ms | 4.61 ms | 🟢 C# (2.0x) |
| Rolling StdDev | 3.15 ms | 12.92 ms | 🟢 C# (4.1x) |

By utilizing single-pass Welford algorithms for variance, contiguous memory, and a custom Fibonacci-hashing hash map for joins, C# absolutely flies.
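If you have not met Welford's algorithm before, here is a minimal scalar sketch of it. It computes the mean and the sample standard deviation (dividing by n - 1) in one pass without storing intermediate sums of squares. `WelfordSketch` is a hypothetical name, and the library's real kernel is presumably vectorized and null-aware, which this version is not.

```csharp
using System;

public static class WelfordSketch
{
    // Returns (mean, sample standard deviation) in a single pass over the data.
    public static (double Mean, double StdDev) MeanAndStdDev(ReadOnlySpan<double> values)
    {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0;   // running sum of squared deviations from the current mean

        foreach (double x in values)
        {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);   // uses both the old and the updated mean
        }

        double variance = count > 1 ? m2 / (count - 1) : 0.0;
        return (mean, Math.Sqrt(variance));
    }
}
```

Compared with the naive "sum of squares minus square of sum" formula, this update stays numerically stable when the values are large and close together.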

Try it out

Glacier.Polaris covers ~98% of the Python Polars core surface area, including LazyFrames, query optimization (predicate/projection pushdowns), and full temporal operations.

If you are building high-performance data pipelines, backtesting financial algorithms, or doing ML preprocessing in .NET, I’d love for you to try it out.

GitHub: https://github.com/ian-cowley/Glacier.Polaris

NuGet:

dotnet add package Glacier.Polaris

Star the repo, try to break the lazy execution engine, and let me know what features you want to see next! Let's bring world-class data engineering to .NET.


1. Zero-Allocation and SIMD String Filtering

2. Breaking the 4ms Barrier: The Float64 Sorting War

3. The Benchmarks (C# vs Polars)

Try it out

3. Hardware Acceleration via .NET 10 `SearchValues<byte>`