Though I'm an experienced Swift developer, I barely know Haskell. I vaguely understand what a monad is, and I once spent an afternoon fighting cabal before giving up. I had no business writing a serious program in Haskell.
And yet I built bit — a version-control tool for binary files, like Git but for large media, datasets, and anything too big for Git to handle sanely — almost entirely through AI-assisted programming in Haskell. It's a real CLI tool that I use daily. What I discovered runs directly against the conventional wisdom about which languages work best with AI.
Table of Contents
- The Conventional Wisdom Is Backwards
- What Happens When You Make AI Write Haskell?
- The Workflow
- AI Doesn't Just Write FP — It Discovers FP
- The Training Data Paradox Is Temporary
- It Works
- The Real Question
The Conventional Wisdom Is Backwards
Ask around and you'll hear: use Python, use JavaScript. Most training data, gentlest syntax.
That's true. AI generates Python fluently. The problem is what happens next. It always compiles, because almost everything compiles in Python. Three weeks later you discover a silently mutated dictionary broke your data pipeline. The AI wrote confident, fluent, completely buggy code. Yes, Python is forgiving, but in a forgiving language, AI's mistakes go undetected.
What Happens When You Make AI Write Haskell?
At first, nothing compiles. GHC (the Haskell compiler) rejects almost everything, but the errors are specific — not "something went wrong" but "this function returns IO String and you're using it where Either Error String is expected."
Cursor sees those errors right away and fixes them immediately. One or two rounds and it compiled. And when Haskell compiles, it usually works.
I'd gone from "fluent code that's silently broken" to "broken code that converges on correct code almost immediately."
The Workflow
Brainstorm → I describe what I want to Claude Opus 4.5 with extended thinking, plus research mode when it seems necessary. We brainstorm the idea, and once it's final, I ask Claude to give me a Cursor prompt. Claude has access to my code, so it writes a really good prompt for Cursor.
Write code and tests → Cursor writes the code and tests using Sonnet 4.5, a cheaper model that works fast and accurately because it starts from a killer prompt written by Claude.
Compile and fix → Cursor handles all of the dependency and GHC errors. Very smooth.
Test and fix → Cursor runs the test suite and fixes failures the same way. Meanwhile, I have time to write this article.
Reflect → After a feature is done, I ask Cursor this question:
You introduced some bugs while implementing this feature, right? Analyze why this happened structurally. What about the code's design made it easy to break [X] when touching [Y]? Suggest a refactor that would make this class of bug impossible or caught at compile time.
I expected generic advice. I got precise analysis of real design flaws that had caused real bugs minutes earlier. I took Cursor's suggestions back to Claude and asked what we should do about them. Claude took some of Sonnet's suggestions seriously and wrote me a refactor prompt for Cursor; others it declined. My code is now refactored to prevent the real bugs the AI had introduced, because the original design made them too easy to introduce.
The code didn't just get better. It got harder to break.
AI Doesn't Just Write FP — It Discovers FP
Here's what I didn't expect. I asked the AI whether functional programming had abstractions that could improve my code's structure. It searched the web, read FP articles, and came back with concepts I'd never heard of.
There's a Kleisli arrow composition in my codebase now — it's called Pipeline — and it elegantly chains pure transformations: scan → diff → plan, where the pure core has no IO at all and is fully property-testable. I don't understand the category theory. But I don't need to.
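I can sketch the shape of that idea, though: each stage is a function `a -> Either Error b`, and Kleisli composition (`>=>`) chains the stages so that the first failure short-circuits the rest. The stage names and error type below are illustrative stand-ins, not bit's actual code:

```haskell
import Control.Monad ((>=>))

-- Illustrative sketch of a Kleisli pipeline: each stage can fail,
-- and (>=>) chains them so the first Left short-circuits the rest.
type Stage a b = a -> Either String b

scanStage :: Stage FilePath [String]
scanStage dir
  | null dir  = Left "empty directory"
  | otherwise = Right [dir ++ "/a.bin", dir ++ "/b.bin"]

diffStage :: Stage [String] [String]
diffStage = Right . filter (not . null)

planStage :: Stage [String] [String]
planStage = Right . map ("copy " ++)

-- scan → diff → plan, as one composed pure function
pipeline :: Stage FilePath [String]
pipeline = scanStage >=> diffStage >=> planStage
```

Because every stage lives in `Either`, the whole chain stays pure and testable without any IO.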
Here's what it actually looks like in my code. The entire sync logic is a pure function — no network calls, no filesystem access, just data in and data out:
```haskell
-- The pure core: no IO, fully property-testable
diffAndPlan :: [FileEntry] -> [FileEntry] -> [RcloneAction]
diffAndPlan sourceFiles targetFiles =
  let sourceIndex = buildIndexFromFileEntries sourceFiles
      targetIndex = buildIndexFromFileEntries targetFiles
      diffs       = computeDiff sourceIndex targetIndex
  in  map planAction diffs
```
It goes further. Here's what AI built into my codebase — concepts I have some intuition for but do not fully understand:
Phantom Types
A phantom type is a type parameter that appears in a type's definition but isn't used in its data. It exists purely for the compiler to enforce constraints.
My hash type uses one so the compiler distinguishes MD5 from SHA256. Mixing hash algorithms is a compile error. One line of type machinery that eliminates an entire class of bugs forever:
```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

data HashAlgo = MD5 | SHA256

newtype Hash (a :: HashAlgo) = Hash Text
```
Now if a function expects Hash 'MD5 and you pass it a Hash 'SHA256, GHC stops you at compile time. No runtime check needed. I think it looks cool, and I couldn't have written it myself.
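A self-contained version of the same trick, with `String` standing in for a real digest and `md5Of` as a hypothetical helper, looks like this:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

-- Self-contained sketch of the phantom-type pattern: the algorithm lives
-- only in the type, so mixing MD5 and SHA256 hashes cannot compile.
data HashAlgo = MD5 | SHA256

newtype Hash (a :: HashAlgo) = Hash String deriving (Eq, Show)

-- Hypothetical helper; a real one would compute an actual digest.
md5Of :: String -> Hash 'MD5
md5Of s = Hash ("md5:" ++ s)

-- Only accepts MD5 hashes; passing a Hash 'SHA256 here is a type error.
sameMD5 :: Hash 'MD5 -> Hash 'MD5 -> Bool
sameMD5 = (==)
```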
Opaque Types with Smart Constructors
Remote exports its type but hides the constructor. You can only create one through mkRemote. Invalid remotes are unrepresentable:
```haskell
module Bit.Remote
  ( Remote     -- type exported, constructor hidden
  , mkRemote   -- the only way to create a Remote
  , remoteName -- read-only access
  ) where

data Remote = Remote
  { _remoteName :: String
  , _remoteUrl  :: String
  }

remoteName :: Remote -> String
remoteName = _remoteName

-- The smart constructor validates its inputs before building;
-- callers never see the raw Remote constructor.
mkRemote :: String -> String -> Maybe Remote
mkRemote name url
  | null name || null url = Nothing
  | otherwise             = Just (Remote name url)
```
This is a standard pattern in Haskell for maintaining invariants through the type system. Code outside this module literally cannot construct an invalid Remote.
ADTs for Every Domain Concept
An ADT (Algebraic Data Type) is a type defined by enumerating its possible variants. The compiler forces you to handle every variant: miss one and GHC warns you (with -Wall enabled; -Werror turns that warning into a hard error).
Every domain concept in my project is modeled this way:
```haskell
-- What changed between local and remote?
data GitDiff
  = Renamed LightFileEntry LightFileEntry
  | Added LightFileEntry
  | Deleted LightFileEntry
  | Modified LightFileEntry

-- What should rclone do about it?
data RcloneAction
  = Move Path Path
  | Copy Path Path
  | Delete Path
  | Swap Path Path Path

-- The planner: pure function, no IO
planAction :: GitDiff -> RcloneAction
planAction (Modified f)      = Copy f.filePath f.filePath
planAction (Renamed old new) = Move old.filePath new.filePath
planAction (Added f)         = Copy f.filePath f.filePath
planAction (Deleted f)       = Delete f.filePath
```
If I add a new variant to GitDiff tomorrow, the compiler immediately tells me every function that needs updating. In Python, that's a bug waiting to happen at runtime.
A Free Monad Effect System (That Got Removed)
AI built a free monad effect system — with a pure interpreter that simulates the entire program without touching IO, using a fake filesystem in memory. I used it, and then the AI itself analyzed the tradeoff and recommended removing it: the complexity wasn't justified since I had no pure tests yet.
It's documented in my spec under "What We Chose": ReaderT BitEnv IO, with the free monad explicitly rejected as premature.
AI didn't just apply FP — it applied it, evaluated the cost, and rolled it back when simpler was better.
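For contrast, the shape the spec settled on is simple enough to sketch in a few lines. The fields of `BitEnv` here are hypothetical; only the `ReaderT BitEnv IO` shape comes from the spec:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)

-- Hypothetical environment; the real BitEnv's fields are bit's own.
data BitEnv = BitEnv
  { envRepoPath :: FilePath
  , envVerbose  :: Bool
  }

-- Every action reads the shared environment; plain IO underneath.
type App a = ReaderT BitEnv IO a

repoFile :: FilePath -> App FilePath
repoFile rel = do
  root <- asks envRepoPath
  return (root ++ "/" ++ rel)

logLine :: String -> App ()
logLine msg = do
  verbose <- asks envVerbose
  lift (if verbose then putStrLn msg else return ())
```

One record threaded everywhere, no interpreter to maintain: the kind of simplicity the free monad couldn't justify displacing.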
A ConcurrentIO Newtype Without MonadIO
AI built a ConcurrentIO newtype that deliberately hides its constructor and omits MonadIO, so nobody can smuggle unsafe lazy IO into concurrent code. The comment in the source says:
```haskell
newtype ConcurrentIO a = UnsafeConcurrentIO { runConcurrentIO :: IO a }
  deriving (Functor, Applicative, Monad)
-- NOTE: No MonadIO instance! This is intentional.
-- Deriving MonadIO would allow 'liftIO' to smuggle arbitrary lazy IO.
```
I know enough to appreciate this. I don't know enough to have designed it.
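The way such a wrapper typically gets used is to export a handful of vetted, strict operations and nothing else. Here is a sketch with a hypothetical `readFileStrict`; only the newtype itself comes from bit:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import qualified Data.ByteString as BS

-- The wrapper from the article: no MonadIO instance, so arbitrary IO
-- cannot be lifted in from outside this module.
newtype ConcurrentIO a = UnsafeConcurrentIO { runConcurrentIO :: IO a }
  deriving (Functor, Applicative, Monad)

-- Hypothetical vetted operation: a strict ByteString read (no lazy IO),
-- the only kind of action the module chooses to expose.
readFileStrict :: FilePath -> ConcurrentIO BS.ByteString
readFileStrict = UnsafeConcurrentIO . BS.readFile
```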
What We Deliberately Don't Do
And then there's a "What We Deliberately Do NOT Do" section in my spec, where AI listed FP abstractions it considered and rejected: typed state machines, representable functors, group structures. It reasoned about the right level of abstraction for each problem.
I want to be clear: I am not familiar with most of these concepts. I have intuition: they feel right, they look elegant. But the AI found them, read the articles and the docs, and applied them to my code! I don't have to be afraid, because the compiler catches the errors, the tests find the bugs, and everything just magically works. No, really.
The Training Data Paradox Is Temporary
Supposedly AI is worse at Haskell today because there's less training data. From my experience, it matters less than you'd think. My AI needed two abilities: generate a reasonable first attempt, and respond intelligently to compiler errors. The first requires some training data. The second requires reasoning — and that is improving fast.
My codebase is proof. The AI didn't retrieve memorized Haskell patterns for phantom types or free monads. It reasoned about type relationships, searched for solutions, and applied concepts from articles it had never seen during training. As models get better at reasoning, the importance of training data volume shrinks. And the languages with the strictest compilers will have the biggest advantage, because they provide the richest feedback signal.
The conventional wisdom says: use the language AI knows best. I think the better advice is: use the language whose compiler teaches AI the most.
It Works
I should mention: bit isn't a toy project. I use it daily. I'm its only user so far (just pushed it to GitHub a week ago), but I'm also its active developer — building features and using them as I go, and... it works. It's a pleasure to write this way. I just tell Claude what I want, and we think together about how to do it. The test suite is comprehensive because AI wrote tests for every feature, so I know everything's good. The architecture is clean because AI audited its own mistakes and proposed structural fixes. And the code uses FP concepts I barely understand.
Think about it: I built this in a language I barely know, using concepts I can't fully explain, with an AI that learned those concepts on the fly. And the result is more robust than most codebases I've seen written by teams of experienced developers.
The Real Question
For decades we chose languages based on how easy they were for humans to write. Python won that race. But the question is shifting. It's no longer "what's easy to write" but "what's easy to write correctly, when AI is doing most of the writing?"
The next time you start a project with AI, consider reaching for the language that makes AI accountable, not just productive. You might be surprised how far you get — even in a language you barely know.
I was.