DEV Community

Which Programming Language Is Best for Claude Code?

Yusuke Endoh on March 05, 2026

TL;DR I had Claude Code implement a very simplified version of Git in 13 languages. Ruby, Python, and JavaScript were the fastest, cheap...
Bruce Hauman

Obviously I’m curious about my language of choice, Clojure. So I looked at Scheme as a possible proxy, and then I thought: if the test harness didn’t have hooks to fix paren errors, that would heavily skew the numbers. Any Lisp language without some kind of paren remediation will suffer under this test.
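To make "paren remediation" concrete, here is a minimal sketch (my own, not from the post) of the kind of balance check a test harness hook could run over LLM-generated Lisp source before compiling it. It deliberately ignores strings and comments, so it is only a first-pass heuristic:

```ruby
# Scan Scheme-like source and report whether parentheses balance.
# Returns :balanced, :missing_close, or :extra_close.
# Note: does not account for parens inside strings or comments.
def paren_balance(src)
  depth = 0
  src.each_char do |c|
    depth += 1 if c == "("
    depth -= 1 if c == ")"
    return :extra_close if depth.negative?
  end
  depth.zero? ? :balanced : :missing_close
end

paren_balance("(define (f x) (* x x))")  # => :balanced
paren_balance("(define (f x) (* x x)")   # => :missing_close
```

A real hook would go further and feed the error location back to the model for a retry, but even a check this crude would prevent the "one missing paren burns a whole run" failure mode.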

But the bigger point is that the users of most relatively niche languages have probably built prompts and tools to overcome trained-in disadvantages.

Ben Munat

No Elixir? :-(

Boris Barroso

It should be the best according to autocodebench.github.io/

Yusuke Endoh

I tried Elixir and Kotlin (and Ruby for comparison) just twice each, and both seemed clearly uncompetitive in this benchmark.

| | run1 total time | run1 total cost | run2 total time | run2 total cost |
|---|---|---|---|---|
| Elixir | 166.6s | $0.71 | 392.4s | $1.20 |
| Kotlin | 252.5s | $0.89 | 145.4s | $0.71 |
| Ruby | 77.6s | $0.38 | 80.9s | $0.38 |

When I get time, I'd like to add more languages and rerun the benchmark to get full results.

Just for reference :-) x.com/mametter/status/202978392925...

J.R. Hill

Any qualitative analysis?

A 2x speedup is potentially the difference between weeks and months... but if the code is 10x more convoluted, it will be 10x harder to change later (even with AI assistance), and on a less trivial project could mean a net productivity loss in the long run.

Also, I worry it might be naive to categorize Rust's and Haskell's failing tests as "bugs." One of the major selling points of more rigorous static analysis (including static typing) is that you catch errors early, instead of end users facing them in production. Shipping code fast and cheap is great, but if it's at the expense of customer experience, then everything of value is lost.

Bazyli Brzóska

The results are interesting, but have a large bias towards greenfield projects.
I think it would be worth benchmarking a task that modifies complex existing projects (e.g. feature or bug fix). That's the place where things like static typing can help significantly with discovery and validation.

Peter Marreck • Edited

You should have included Zig. I've had excellent results with it. Its semantics seem quite understandable by the LLM, and it has enough footgun prevention to avert many (but not all) classes of failure mode.

I understand why Elixir didn't work out: it's not great at math, and it's specifically weak at replicating the behavior you get from bit-constrained number types in lower-level languages.
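For readers unfamiliar with the issue: languages with arbitrary-precision integers have to emulate the fixed-width wraparound that hash functions in a Git-like project rely on. A hedged sketch in Ruby (which has the same arbitrary-precision integers, so the same emulation applies) of masking to 32 bits after each operation:

```ruby
MASK32 = 0xFFFF_FFFF

# Emulate C's uint32_t semantics on arbitrary-precision integers
# by masking the result of every operation back to 32 bits.
def add32(a, b)
  (a + b) & MASK32
end

# Rotate-left, a common primitive in hash functions like SHA-1.
def rotl32(x, n)
  ((x << n) | (x >> (32 - n))) & MASK32
end

add32(0xFFFF_FFFF, 1)   # => 0 (wraps around)
rotl32(0x8000_0000, 1)  # => 1 (high bit rotates to the bottom)
```

An LLM that forgets the masking step produces hashes that silently diverge from the reference implementation, which may be part of why the Elixir runs struggled.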

Payton Byrd

Where's C#? Seriously, one of the most popular languages and you just ignore it?

Ilia Gusev

And PHP

AI Agent Digest

This is exactly the kind of empirical work the AI coding space needs more of. The finding that dynamic languages are 1.4-2.6x faster and cheaper isn't surprising intuitively -- less boilerplate means fewer tokens -- but having the numbers across 600 runs with error bars is valuable.

david j • Edited

At 30 tokens a second, this experiment is predictive for about the first four minutes of an AI dev lifecycle.