Shrijal Acharya for Composio

Posted on • Originally published at composio.dev

πŸ• Is Kimi K2 Actually Better Than Claude Sonnet 4 for Coding? βœ¨πŸ€”

Moonshot AI, a Chinese artificial intelligence company, recently announced its new model, Kimi K2: an open-source model purpose-built for agentic tasks. Some even consider it an open-source alternative to Claude Sonnet 4.

While Claude Sonnet 4 costs $3 per million input tokens and $15 per million output tokens, Kimi K2 comes in at a fraction of that: $0.15 per million input tokens and $2.50 per million output tokens. Crazy, right?

According to the Moonshot AI team, Kimi K2 has outperformed almost all coding models on many benchmarks, but of course, the real test is how it performs in real-world scenarios.

Kimi K2 AI Model Benchmark

So, in this article, we'll see what Kimi K2 can do and how well it compares with Sonnet 4 in coding.

Angrybird interacting with a telescope


TL;DR

If you want to dive straight into the results and see how both models perform in coding, especially when testing them head-to-head, here's how the results turned out:

  • If we compare frontend coding, both are solid, but I slightly prefer Kimi K2’s responses.
  • Neither model got the MCP integration working out of the box, but Kimi K2’s implementation was closer to being usable.
  • For newer libraries and agentic workflows, both struggle. But Kimi K2 looks more promising.
  • We ran two code-heavy prompts with Claude Sonnet 4, resulting in approximately 300K token consumption and a cost of about $5.
  • For comparison, Kimi K2 used a similar number of tokens and cost only about $0.53, which is nearly 10 times cheaper.
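The cost gap follows directly from the per-million-token prices quoted earlier. Here's a quick sanity check in TypeScript; the 50/50 input/output split is my assumption (the article only reports a ~300K total), and real bills also include retries and cache reads, so treat this as a floor:

```typescript
// Price per million tokens (USD), as quoted for each model.
const pricing = {
  sonnet4: { input: 3.0, output: 15.0 },
  kimiK2: { input: 0.15, output: 2.5 },
};

// Cost of a run given input/output token counts and a price table.
function cost(
  inputTokens: number,
  outputTokens: number,
  p: { input: number; output: number }
): number {
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}

// Hypothetical 300K-token run, split evenly between input and output.
console.log(cost(150_000, 150_000, pricing.sonnet4).toFixed(2)); // "2.70"
console.log(cost(150_000, 150_000, pricing.kimiK2).toFixed(2));  // "0.40"
```

Even before retries pile up, the per-run gap is close to an order of magnitude.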

Kimi K2 has a very low output speed of around 34.1 tokens per second, significantly slower than Claude Sonnet 4's roughly 91.3. In exchange, you get slightly better coding responses, and there isn't much difference in code quality otherwise.

Kimi K2 is way cheaper, and when it works, it really works. So if you’re on a budget, the choice is kind of obvious. πŸ€·β€β™‚οΈ

Kimi K2 pricing, speed and intelligence chart

If I had to pick between these two, I'd definitely go with Kimi K2 for coding. But at times, it does feel like waiting an eternity for it to finish.

Now, decide for yourself which one is the better fit for you based on your workflow.


What's Covered?

In this blog post, we'll compare the open-source Kimi K2 and Claude Sonnet 4 on frontend and agentic coding to see which one comes out on top. We'll mainly look at:

  • Pricing and speed
  • How good each model is at frontend coding (we'll test with a chat app)
  • How well each handles recently launched libraries

If that sounds interesting to you, stick around, and you might just find out which one is the better fit for you.


How's the test done?

I’ll be using Claude Code as the coding agent for both models. Yes, both.

For Claude Sonnet 4, that’s straightforward.

Kimi K2, however, isn't available in Claude Code by default. But with a quick little hack, we can get it working through the Kimi K2 API directly.

So in the end, we’re using the Claude Code interface with Kimi K2 as the engine under the hood.

If you’re curious about how to set that up, check out this guide: Run Kimi K2 Inside Claude Code.
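The linked guide has the details, but the gist of the hack is pointing Claude Code's Anthropic environment variables at an Anthropic-compatible Kimi endpoint. A rough sketch (the endpoint URL and variable names are my understanding of Moonshot's setup; verify them against the guide):

```shell
# Point Claude Code at an Anthropic-compatible Kimi K2 endpoint
# (URL is an assumption -- check the linked guide for the current one).
export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your_moonshot_api_key>"
claude  # launches Claude Code, now backed by Kimi K2
```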


Coding Comparison

As said earlier, we'll do frontend and agentic coding tests for both models.

We'll test with a Next.js application: we'll ask each model to build a complete chat application with voice (and optionally image) support, and then make it work as an MCP client that can connect to any MCP server.

Initial Setup

Let's make things a bit easier for these models by setting up a Next.js application and adding all the environment variables before they start coding the implementation.

Initialize a new Next.js application with the following command:

npx create-next-app@latest agentic-chat-composio \
--typescript --tailwind --eslint --app --use-npm && \
cd agentic-chat-composio

Next, we'll need Composio's API key because we'll use it to access managed production-grade MCP servers in our chat application.

Go ahead and create an account on Composio, get your API key, and paste it into the .env file in the root of the project.

Composio Dashboard

COMPOSIO_API_KEY=<your_composio_api_key>
OPENAI_API_KEY=<your_openai_api_key>

And that's all the setup we need to do manually. Now we'll leave everything else to these two models.


Frontend Coding Comparison

πŸ’ Prompt: Build a real-time chat application in this Next.js application using WebSockets. Users should be able to connect and chat directly, without friend requests. Set up a basic backend (which can be API routes or a simple server) to manage WebSocket connections. Also, add voice support. Focus on a clean message flow between the users. Make the UI beautiful and modern using Tailwind CSS and ShadCN components. Keep the code modular and clean. Ensure everything is working.

Kimi K2

This took almost forever to implement. I used the Targon provider on OpenRouter, and it took over 5 minutes to generate this code. Initially, the model tried adding WebSocket support directly in Next.js, but after realizing Next.js lacks solid support for it, it reverted and ended up implementing a separate Node.js server for WebSockets instead.
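A separate Node.js relay is the standard workaround here, since Next.js API routes don't hold long-lived socket connections well. The heart of such a server is just a broadcast loop; here's a minimal, dependency-free sketch of that relay logic (names are mine, not from the generated code):

```typescript
// Minimal broadcast-relay logic: each connected client registers a send
// callback, and every incoming message is fanned out to the other clients.
type Send = (msg: string) => void;

class ChatRelay {
  private clients = new Map<string, Send>();

  join(id: string, send: Send): void {
    this.clients.set(id, send);
  }

  leave(id: string): void {
    this.clients.delete(id);
  }

  // Relay a message from `fromId` to every other connected client,
  // returning how many clients received it.
  broadcast(fromId: string, text: string): number {
    let delivered = 0;
    for (const [id, send] of this.clients) {
      if (id !== fromId) {
        send(JSON.stringify({ from: fromId, text }));
        delivered++;
      }
    }
    return delivered;
  }
}
```

In the real app, `join` would be called from the WebSocket server's connection handler, with `send` wrapping `socket.send`.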

Here's the output of the program:

But after all that, it did implement the chat app with voice and WebSocket support (spoiler alert: Claude Sonnet 4 failed to implement the voice support completely).

Claude Sonnet 4

This implementation took only 2-3 minutes, and voila, it was fairly similar to the Kimi K2 implementation, and everything works.

Oh, and I forgot to ask it to add voice support, so I asked it to implement voice and optional image support for the chat application.

Prompt to the Claude Sonnet 4 AI Model

The first thing that disappointed me was that even though I asked for both voice and image support, it skipped image support entirely. I had asked it to add image support only if it was 100% sure it would work, but it still should have explained why it left the feature out.

I guess it had some issue with following the prompt.

Also, I see that it implemented the functionality with the Web Speech API, which isn't supported in some browsers like Firefox. However, the app shows "not supported" in every browser, including Chrome, which supports the API fully. So there's some problem with its implementation.
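For reference, Chrome exposes this API under a `webkit` prefix, so a likely cause of the false "not supported" message is checking only the unprefixed name. A sketch of prefix-aware detection (simplified, with simulated environments since this doesn't run against a real browser `window`):

```typescript
// The Web Speech API ships unprefixed in some browsers and as
// `webkitSpeechRecognition` in Chrome, so feature detection must check
// both names; checking only the unprefixed one reports "not supported"
// even where the API exists.
function getSpeechRecognition(w: Record<string, unknown>): unknown {
  return w["SpeechRecognition"] ?? w["webkitSpeechRecognition"] ?? null;
}

// Simulated environments for illustration:
const chromeLike = { webkitSpeechRecognition: class {} };
const firefoxLike = {}; // no speech recognition at all

console.log(getSpeechRecognition(chromeLike) !== null);  // true
console.log(getSpeechRecognition(firefoxLike) !== null); // false
```

In the browser, you'd pass `window` and fall back to a disabled mic button only when both lookups fail.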

Browser not supported error message

If I compare just the frontend coding, it's good. In fact, it's really good: it understood everything and planned the implementation from a single prompt. However, for some browser-specific APIs, I'm not sure it's as reliable, as we've just seen.

Agentic Coding Comparison

The main purpose of this comparison is to see how well these two AI models can work with recent libraries like Composio's MCP server support.

πŸ’ Prompt: Extend the existing Next.js chat application to support agentic workflows via MCP (Model Context Protocol). Integrate Composio for tool calling and third-party service access. The app should be able to connect to any MCP server and handle tool calls based on user messages. For example, if a user says "send an email to XYZ," it should use the Gmail integration via Composio and pass the correct arguments automatically. Ensure the system handles tool responses gracefully in the chat UI.

Kimi K2

You can find the entire source code here: Kimi K2 Chat App with MCP Support Gist

To my surprise, it didn't work at all. Looking at the implementation, the code is quite close, but it doesn't run. With a few follow-up prompts, or by manually modifying the code ourselves, it wouldn't be tough to fix.

Here's the output of the program:

Claude Sonnet 4

You can find the entire source code here: Claude Sonnet 4 Chat App with MCP Support Gist

You can find the complete chat with Claude Code here: Claude Sonnet 4 Claude Code Raw Response Gist

Here's the output of the program:

This one took about 10 minutes, and during that time, it kept breaking and fixing things back and forth. The most time-consuming part was fixing TypeScript errors πŸ€¦β€β™‚οΈ, as you can see in the Gist link I've shared above.

Finally, after all that, I got the code, and it still does not work. Worse, it gives a false positive, saying tool calls were successful when they weren't. It also did not use Composio's current TypeScript SDK, @composio/core.

All in all, this is not what I expected from this model. Even a bare-minimum working product would have been a lot better than this.

Summary

In both frontend and agentic coding, Kimi K2 held its ground, slightly outperforming Claude Sonnet 4 in terms of implementation quality and code response accuracy. I used the Targon provider on OpenRouter to run Kimi K2. While neither model nailed the MCP integration perfectly, Kimi K2's output was a bit cleaner and closer to working code.

It's worth mentioning that all the back and forth from Claude Sonnet 4 racked up around 300K tokens, costing about $5, while Kimi K2’s equivalent output only cost about $0.53. Same token ballpark, vastly cheaper output.

Bravo GIF


Conclusion

Both models are pretty rock solid most of the time. But when it comes to piecing together working code with recently released libraries and techniques, they still fall short.

However, if I have to compare the code responses from the two, Kimi K2 seems very slightly better, though this may differ depending on the task; honestly, there's not much between them.

It's just that Kimi K2 is so much cheaper while performing about the same as, or at times even better than, Claude Sonnet 4, so I'd recommend picking Kimi K2 for your coding workflows.

Let me know what you think of these two models in the comments below! ✌️

Top comments (18)

Bhaskar Prajapati

Love this comparison. I wonder if this is an alternative to general gpt models?

Shrijal Acharya (Composio)

I can't say it yet with K2, but Sonnet 4 really is an alternative to general GPT models if you're referring to a model that's fairly good with most general tasks.

Shayne Villarin

No, none of it is the alternative to gpt models.

Shrijal Acharya (Composio)

Why do you feel so?

Shayne Villarin

Just my preference

Bhaskar Prajapati

Okay, that's fine

Shrijal Acharya (Composio)

Let me know your thoughts on Kimi K2 and how the experience has been with it so far.

Aayush Pokharel

πŸ‘πŸ»βœŒοΈ

Shrijal Acharya (Composio)

Thanks

Aavash Parajuli

Congratulations on the agentic funding! I love the work of this team. πŸ‘πŸΌ

Shrijal Acharya (Composio)

Thank you. Lot more to come in the future from Composio. Stay tuned!


Jim Parson

Really curious to see how Kimi K2 performs on multi-agent workflows over time

Shrijal Acharya (Composio)

Let's see how good it turns out to be in the long run

Robert Thomas

nice

Shrijal Acharya (Composio)

Thanks, Robert! ✌️

Best Codes (edited)

Great article! There's something I would like to point out about this:

Kimi K2 has a very low output of tokens per second, around 34.1, significantly slower than Claude Sonnet 4, which is around 91.3.

Kimi K2 is an open-source model, which means that there are multiple providers for it, not just one like Claude. You can get the same Kimi K2 model at over 200 tokens per second (for free!) on Groq and at increased speeds on other AI model providers as well.

Shrijal Acharya (Composio)

Yes, you're right. But that often comes with heavy model quantization, which speeds up the model at the cost of output quality.