Aryan Kumar

🚀 I Built an AI Code Conversion Benchmark Platform

Over the last few weeks I’ve been working on a project called CodexConvert.

It started as a simple idea:

What if we could convert entire codebases using multiple AI models — and automatically benchmark which one performs best?

So I built a tool that does exactly that.

🔁 Multi-Model Code Conversion

CodexConvert lets you run the same conversion task across multiple AI models at once.

For example:

Python → Rust
JavaScript → Go
Java → TypeScript

You can compare outputs side-by-side and immediately see how different models perform.
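The fan-out idea can be sketched in a few lines of TypeScript. Everything here (`ModelClient`, `convertAll`) is illustrative naming, not CodexConvert's actual API: run the same task against every configured model in parallel and collect the outputs keyed by model name.

```typescript
// Illustrative sketch: one conversion task, many models, results side by side.
type ModelClient = {
  name: string;
  // Returns the converted source for the given task.
  convert: (source: string, from: string, to: string) => Promise<string>;
};

async function convertAll(
  clients: ModelClient[],
  source: string,
  from: string,
  to: string,
): Promise<Record<string, string>> {
  // Run every model concurrently; allSettled so one failure
  // doesn't sink the other models' results.
  const results = await Promise.allSettled(
    clients.map((c) => c.convert(source, from, to)),
  );
  const out: Record<string, string> = {};
  results.forEach((r, i) => {
    out[clients[i].name] =
      r.status === "fulfilled" ? r.value : `error: ${r.reason}`;
  });
  return out;
}
```

Using `Promise.allSettled` rather than `Promise.all` matters here: a rate-limited or failing provider should show up as one bad cell in the comparison, not abort the whole run.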

📊 Automatic Benchmarking

Each model output is evaluated automatically using three metrics:

✔ Syntax Validity
✔ Structural Fidelity
✔ Token Efficiency

Scores are normalized to a 0–10 scale, making it easy to compare models.
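As a rough sketch of how three per-output metrics could collapse into one 0–10 score: assume each metric is already normalized to [0, 1], average them, and rescale. The metric names and the equal weighting are my assumptions, not CodexConvert's exact formula.

```typescript
// Hypothetical per-output metrics, each normalized to [0, 1].
type Metrics = {
  syntaxValid: number;        // 1 if the output parses, else 0
  structuralFidelity: number; // how closely structure matches the source
  tokenEfficiency: number;    // higher = fewer wasted tokens
};

function score(m: Metrics): number {
  // Equal-weight average, then rescale to the 0-10 leaderboard scale,
  // rounded to one decimal place.
  const avg = (m.syntaxValid + m.structuralFidelity + m.tokenEfficiency) / 3;
  return Math.round(avg * 100) / 10;
}
```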

🏆 Built-in Leaderboard

CodexConvert keeps a local benchmark dataset and generates rankings like:

| Rank | Model    | Avg Score |
|------|----------|-----------|
| 🥇   | GPT-4o   | 9.1       |
| 🥈   | DeepSeek | 8.8       |
| 🥉   | Mistral  | 8.4       |

You can also see which models perform best for specific language migrations.
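A leaderboard like the one above is essentially a group-by-average over stored runs. This is a minimal sketch of that rollup, with assumed names (`Run`, `leaderboard`), not the project's actual data model:

```typescript
// One benchmark run: a model and its 0-10 score for some conversion task.
type Run = { model: string; score: number };

function leaderboard(runs: Run[]): { model: string; avg: number }[] {
  // Accumulate total score and run count per model.
  const sums = new Map<string, { total: number; n: number }>();
  for (const r of runs) {
    const e = sums.get(r.model) ?? { total: 0, n: 0 };
    e.total += r.score;
    e.n += 1;
    sums.set(r.model, e);
  }
  // Average, round to one decimal, and rank highest first.
  return [...sums.entries()]
    .map(([model, e]) => ({ model, avg: Math.round((e.total / e.n) * 10) / 10 }))
    .sort((a, b) => b.avg - a.avg);
}
```

Filtering the `runs` array by language pair before calling this would give the per-migration rankings mentioned above.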

🧠 Modern Workspace UI

The interface works like a three-panel developer dashboard:
Inputs | Model Outputs | Benchmark Insights
You can upload an entire codebase, run conversions, and analyze results in one place.

🔒 Privacy-First Architecture

One important design decision:
CodexConvert has no backend server.

Everything happens in your browser:
• API keys stay in session storage
• Code is sent directly to the AI provider
• Nothing is stored remotely
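The client-only flow can be sketched like this. The storage abstraction, storage key, and function names are illustrative (in the browser you would back `KeyStore` with `window.sessionStorage`); the request shape is the standard OpenAI-compatible chat-completions format the post mentions.

```typescript
const KEY_NAME = "codexconvert.apiKey"; // hypothetical storage key

// Storage-like interface so the sketch runs anywhere;
// in the browser, window.sessionStorage satisfies it.
interface KeyStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function saveKey(store: KeyStore, key: string): void {
  // Session storage is cleared when the tab closes; the key never
  // touches a server owned by the tool.
  store.setItem(KEY_NAME, key);
}

async function convert(
  store: KeyStore,
  baseUrl: string,
  model: string,
  prompt: string,
): Promise<string> {
  const apiKey = store.getItem(KEY_NAME);
  if (!apiKey) throw new Error("no API key in this session");
  // Direct browser -> provider request; no intermediate backend sees the code.
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The trade-off of this design is that the provider must allow cross-origin browser requests, but in exchange there is no server to trust, breach, or pay for.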

🧩 Tech Stack
React + TypeScript
Vite
Tailwind CSS
JSZip
OpenAI-compatible API providers

💡 Why I Built This
Developers constantly ask questions like:

Which AI model is best for Python → Rust?
Which model produces cleaner TypeScript?
Which one is most token-efficient?

CodexConvert helps answer those questions.

🔗 GitHub

If you’d like to try it out or contribute:

👉 https://github.com/aryanjsx/Openclaude

Feedback is very welcome.

I’m especially interested in ideas for:

• better benchmarking metrics
• additional model providers
• new leaderboard visualizations

Thanks for reading 🙌
