DEV Community

Christian Findlay

I built a Python AI agent and Pylance drove me to build a type checker and LSP

TL;DR

Basilisk has 98.5% PEP conformance. Got your attention? Read on...

A while back I started building Nimble Agent — a LangChain-based coding agent in Python. The idea was to fix the things that annoy me about other AI agents: smarter prompting, cheaper models, and acceptance criteria so the agent can't just hand-wave "done" and stop.

So I tried the standard Pylance extension and Pyright as a type checker. I was seriously underwhelmed.

Skip the hype. Install the extensions here:

More coming...

The Pylance wall

Python has a perfectly good type system, and almost nothing enforces it.

Pylance defaults to gradual typing. Untyped code sails straight through. A function with no annotations? Fine. An implicit Any swallowing an entire call chain? Fine. You opt into strictness, rule by rule, setting by setting — and even then half of it slides. My annotations weren't contracts. They were comments that happened to compile. For a tool whose whole job is generating Python, "your types are vibes" is a non-starter.
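To make the complaint concrete, here is a small sketch of the failure mode described above. The function names are made up for illustration, and what a given checker reports depends on its settings. The one fixed point is that `json.loads` is typed as returning `Any`, so everything derived from it is unchecked unless you opt into stricter rules.

```python
import json


def load_port(raw):
    # No annotations, and json.loads returns Any, so the result of this
    # function is Any as far as a gradual checker is concerned.
    return json.loads(raw)["port"]


def next_port(port: int) -> int:
    # Fully annotated, but the annotation only helps if callers are checked.
    return port + 1


def demo() -> str:
    try:
        # An Any value is accepted where an int is expected, so the
        # string "8080" only fails once the code actually runs.
        next_port(load_port('{"port": "8080"}'))
    except TypeError as exc:
        return f"runtime failure: {type(exc).__name__}"
    return "no failure"
```

Calling `demo()` shows the mistake surfacing at runtime rather than in the editor, which is the "comments that happened to compile" problem in miniature.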

Then there's the part that actually made me angry: Pylance is proprietary and welded to Microsoft's official VS Code build. I don't live in vanilla VS Code. I bounce between Cursor, Windsurf, and sometimes Zed. The moment you step outside the Microsoft walled garden, the good experience evaporates: different engine, missing features, or a license that says you're not allowed to be there at all.

On top of that, I couldn't figure out how to configure the type checker in VS Code to analyze every file in the project. No matter which settings I tried, it only showed errors and warnings for the file I was editing.

So I built Basilisk

Basilisk is an open-source Python language server, type checker, debugger, and profiler — written in Rust, shipped as a single binary, no Node runtime, no Python runtime, no Microsoft dependency.

The core stance is the whole pitch: strict by default, no permissive mode.

def greet(name):
    return "Hello " + name

error[BSK-E0001]: Missing parameter type annotation for `name`
error[BSK-E0002]: Missing return type annotation

There's no --basic, no --permissive, no global "relax everything" knob. Rust doesn't ship a flag to disable the borrow checker; TypeScript's strict: true is just expected. Basilisk takes that stance for Python. Escape hatches exist — but the burden is on you to justify the exception, not to remember to turn the rule on. 151 diagnostics, all on, all the time.
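For comparison, here is a fully annotated version of the `greet` function above. It is a reasonable guess that this satisfies both diagnostics, since each one names a missing annotation, but the result has not been verified against Basilisk itself.

```python
def greet(name: str) -> str:
    # Both the parameter and the return value now carry annotations,
    # which is what the two diagnostics above ask for.
    return "Hello " + name
```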

The IDE extension gives you the tools to ignore what you need to ignore when you start using Basilisk on untyped Python, and it also gives you plenty of tools for adding types gradually.

But "strict checker" isn't the actual goal

The checker is the wedge. The real aim is the thing Pylance almost is and won't let you have everywhere: one extension that's your complete Python dev experience.

Out of one Rust binary, Basilisk gives you:

  • 🧠 Type checking + inference (strict by default, targeting 100% PEP conformance)
  • 🐛 Debugging (embedded debugpy, zero-config)
  • 🔥 Profiling (py-spy, inline in the editor)
  • 🧪 A test explorer (pytest/unittest with coverage overlay)
  • 🔧 Real refactors — extract, inline, move-to-file, scope-aware rename, change signature
  • 💡 Autofixes and formatting (delegated to Ruff — we don't reinvent the wheel)

And — the part that started this whole thing — the same experience everywhere: VS Code, Cursor, and Windsurf (via Open VSX), plus Zed and Neovim. Because the LSP drives the functionality, not the IDE. The editor just reacts to it. Install once, get the full thing, wherever you write Python.
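The reason this can work across editors is that the Language Server Protocol is JSON-RPC with a `Content-Length` header, so any client that speaks it can render the same diagnostics. The sketch below frames a `textDocument/publishDiagnostics` notification by hand. The file URI, the `source` label, and the way a Basilisk code is placed in the `code` field are assumptions for illustration, not taken from Basilisk's source.

```python
import json


def frame_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload using the LSP base-protocol framing."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def publish_diagnostics(uri: str, message: str, code: str) -> bytes:
    """Build a textDocument/publishDiagnostics notification."""
    diagnostic = {
        # Zero-based positions covering `name` in `def greet(name):`.
        "range": {
            "start": {"line": 0, "character": 10},
            "end": {"line": 0, "character": 14},
        },
        "severity": 1,  # 1 means Error in the LSP specification
        "code": code,
        "source": "basilisk",  # assumed source label
        "message": message,
    }
    return frame_message(
        {
            "jsonrpc": "2.0",
            "method": "textDocument/publishDiagnostics",
            "params": {"uri": uri, "diagnostics": [diagnostic]},
        }
    )


def parse_message(raw: bytes) -> dict:
    """Decode a single framed message, as an editor client would."""
    header, body = raw.split(b"\r\n\r\n", 1)
    length = int(header.decode("ascii").split(":", 1)[1])
    return json.loads(body[:length].decode("utf-8"))
```

Nothing in the message is specific to one editor, which is why the same server can sit behind VS Code, Zed, or Neovim.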

Where it's at

Basilisk is open source (MIT), built on the same parser that powers Ruff, with Salsa for sub-10ms incremental checks. I started it because I wanted my AI agent's Python to be as honest as TypeScript, in whatever editor I happened to have open. It turned into something a lot bigger.
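The idea behind incremental checking can be sketched in a few lines: remember the result for each input and recompute only when that input changes. The class below is a deliberately simplified stand-in written for illustration, not Salsa's API and not Basilisk's implementation. Salsa also tracks dependencies between queries, which this per-file cache ignores, and `check_source` is a hypothetical placeholder for a real type-check pass.

```python
import hashlib
from typing import Callable


def check_source(source: str) -> list[str]:
    """Hypothetical stand-in for a type check: flag defs without '->'."""
    return [
        f"line {number}: missing return type annotation"
        for number, line in enumerate(source.splitlines(), start=1)
        if line.lstrip().startswith("def ") and "->" not in line
    ]


class IncrementalChecker:
    """Re-run the check for a file only when its content hash changes."""

    def __init__(self, check: Callable[[str], list[str]]) -> None:
        self._check = check
        self._cache: dict[str, tuple[str, list[str]]] = {}
        self.runs = 0  # counts real check executions, for demonstration

    def diagnostics(self, path: str, source: str) -> list[str]:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._cache.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged input: reuse the earlier result
        self.runs += 1
        result = self._check(source)
        self._cache[path] = (digest, result)
        return result
```

Asking for diagnostics twice on the same unchanged source runs the check once; editing the file invalidates only that file's entry.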

If you've ever wanted Python to just say it out loud when your code isn't typed — give it a try.

Check out Nimblesite's other tools

Top comments (5)

mote

Default-strict with 151 rules all enabled is refreshing. Pylance/Pyright's gradual typing means the type checker is only as useful as your discipline to annotate everything. For AI-generated code — where the agent decides whether to annotate — that's a real problem. The agent will skip annotations, Pyright shrugs, and you get runtime type errors in production.

Salsa incremental computation is a good call for LSP responsiveness. The Ruff team's type checker, ty, uses the same approach; sub-10ms incremental checks make real-time diagnostics possible without burning CPU.

One question: a single Rust binary with no Python/Node dependency is nice, but how do you handle Python runtime features like dynamic imports or sys.meta_path hooks? Those can't be type-checked without actually running the Python import system. Does Basilisk fall back to Any for those, or does it maintain a Python runtime model?

Christian Findlay

Thanks, yes. Caching is only partially implemented right now, but we're working on the full Salsa implementation. I think this will make a huge difference over the long run. Even so, the benchmarks show that Basilisk is already competitive with the other checkers.

Alex Shev

This is a good reminder that agent work quickly turns into tooling work. Once the agent starts touching real code, types, diagnostics, and editor feedback become part of the product.

Christian Findlay

You're spot on. I strongly believe the key to getting agents to not screw up your codebase is to have thick layers of quality gates: mutation testing, type checking, static code analysis, and, most critically, code duplication thresholds. We've been working like crazy on all of these. Here are some of the tools:

Deslop - code duplication for Python, Rust, C# and Dart
SharpLsp - F#/C# Tooling

So far, Deslop has been invaluable.

Alex Shev

That is exactly the layer I would want around agentic coding too. The duplication threshold point is especially practical because it catches a failure mode agents create quietly: they do not always break the app, they just add a second half-correct version of the same idea.

Quality gates are not there to make agents slower. They are there to make the agent's freedom survivable.