A while ago, I wrote a post about which GitHub Copilot model was the absolute best. The TL;DR of that article? "It depends."
What exactly are you building? What are you trying to achieve? Having clear answers to these questions before you write a single line of code will help you determine the best model for the job.
How do you choose the best model for the job?
When you're building an application that is going to make use of AI tools like Large Language Models (LLMs), think about the application, the outcome, and the experience you want to create. Once you’ve nailed down what you are trying to achieve, it's time to figure out which model actually fits the bill.
Every AI model is trained differently. Because of this unique training, some models are naturally better at specific tasks than others. Think of it like asking a group of developers for their solution to a coding problem. Everyone has been taught slightly differently, everyone has different backgrounds, and everyone brings different experiences to the table. As a result, the answers you get will all have their own unique flavor.
In a similar way, different LLMs excel at different tasks. For example, GPT-5 is a multimodal model, meaning it can handle both text and images natively. Thanks to that capability, you can have GitHub Copilot literally convert a hand-drawn wireframe into a functional website.
There are plenty of AI analysis websites that compare models side by side, like Artificial Analysis, whose graph is shown above.
Understanding what each model brings to the table is the perfect place to start.
What models are good for what?
Doing a little bit of upfront research to find out which models are tailored for various tasks will save you a ton of headaches later. Here's a quick look at some of the most popular models right now and where they shine:
- GPT-5.2: A multimodal reasoning powerhouse designed for "professional knowledge work." It absolutely nails creating complex spreadsheets, writing clean code, building presentations, and parsing massive walls of context.
- Claude: Purpose-built for deep analysis, beautiful writing, coding, and seamlessly working across long documents.
- DeepSeek: Highly optimised for crushing complex coding challenges and breaking down intense mathematical problems.
It’s worth noting that a lot of the "big" flagship models are all chasing the same crown. But here is the secret: you don't always need the biggest, newest, or shiniest model. Larger models demand significantly more resources, and that means more money, higher token usage, heavier compute, and higher latency. Knowing when not to use a massive model is a superpower in modern development.
For example, the map above shows OpenAI's servers around the world. It was posted a year ago, so some of the server locations may no longer be accurate.
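One way to put "knowing when not to use a massive model" into practice is a simple router. It sends easy requests to a smaller model and escalates only when needed. This is a minimal sketch under stated assumptions. The model names are hypothetical placeholders. The escalation rules are illustrative heuristics, not a recommendation from any provider: an explicit reasoning flag, plus a character-count threshold standing in for context size.

```python
from dataclasses import dataclass

# Hypothetical placeholder names; substitute the models your provider offers.
SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-reasoning-model"


@dataclass
class Route:
    model: str
    reason: str


def route_request(prompt: str, needs_reasoning: bool, max_small_chars: int = 2000) -> Route:
    """Pick the cheapest model that should cope with the request.

    Assumption: prompt length in characters is a rough proxy for how much
    context the task needs; a real router might count tokens instead.
    """
    if needs_reasoning:
        return Route(LARGE_MODEL, "task flagged as needing multi-step reasoning")
    if len(prompt) > max_small_chars:
        return Route(LARGE_MODEL, "prompt is longer than the small-model threshold")
    return Route(SMALL_MODEL, "simple request: a cheaper, faster model is enough")
```

The point of the design is that the large model becomes the exception rather than the default. Every request it doesn't handle saves tokens, compute, and waiting time.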
Compare your resources
Once you know which models can actually handle your task, it’s time to look at the logistics. The main metrics you want to keep an eye on are:
- Cost per token: How much is this going to run you at scale?
- Token usage: How efficiently does the model handle your prompts and responses?
- Average compute time: Will your users be staring at a loading spinner?
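For the cost metrics, a back-of-the-envelope estimate is often enough to separate the candidates. This is a minimal sketch that assumes per-million-token pricing, which is a common way providers quote prices. The prices and traffic figures are hypothetical placeholders, not real quotes for any model.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    name: str
    input_per_million: float   # USD per 1M input tokens (hypothetical)
    output_per_million: float  # USD per 1M output tokens (hypothetical)


def cost_per_request(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call, given the average tokens sent and received."""
    return (input_tokens * p.input_per_million + output_tokens * p.output_per_million) / 1_000_000


def monthly_cost(p: Pricing, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Scale the per-request cost up to a month of traffic."""
    return cost_per_request(p, input_tokens, output_tokens) * requests_per_day * days


# Hypothetical numbers purely for illustration.
example = Pricing("example-model", input_per_million=1.00, output_per_million=4.00)
estimate = monthly_cost(example, requests_per_day=10_000, input_tokens=500, output_tokens=200)
```

With these placeholder numbers, each request costs $0.0013, which works out to roughly $390 a month. Running the same calculation for each candidate makes the cost gap concrete, especially if you use token counts measured from your own prompts.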
Don't forget to check where these providers host their infrastructure, too. For example, I’m based in Australia. Choosing a provider with servers located in the Asia-Pacific region is a massive priority for me because it drastically cuts down on network latency and keeps responses snappy.
So, how do you actually make the choice?
First, define your goal. Second, do your research. But ultimately? You just have to test them out.
See which one actually performs best for your specific workflow. Yes, experimenting might cost you a little bit more upfront in terms of your time, money, and tokens. But spending those resources early on will save you massive amounts of time and budget in the long run.
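A baseline test doesn't need to be elaborate. Here is a minimal sketch of a harness that sends the same prompt to each candidate several times. It records the median wall-clock latency alongside the outputs so you can review them side by side. The sketch assumes each candidate is wrapped in a plain function that takes a prompt and returns text. The wrappers shown are stand-ins; in practice, each one would call the relevant provider's SDK.

```python
import statistics
import time
from typing import Callable, Dict


def run_baseline(candidates: Dict[str, Callable[[str], str]], prompt: str,
                 runs: int = 3) -> Dict[str, dict]:
    """Send the same prompt to every candidate and record latency and output."""
    results: Dict[str, dict] = {}
    for name, call_model in candidates.items():
        timings = []
        outputs = []
        for _ in range(runs):
            start = time.perf_counter()
            outputs.append(call_model(prompt))
            timings.append(time.perf_counter() - start)
        results[name] = {
            # The median is less skewed by one slow network round trip than the mean.
            "median_seconds": statistics.median(timings),
            "outputs": outputs,
        }
    return results


# Stand-in wrappers for illustration; real ones would call each provider's API.
fake_candidates = {
    "model-a": lambda p: p.upper(),
    "model-b": lambda p: p[::-1],
}
report = run_baseline(fake_candidates, "turn this photo into anime", runs=2)
```

Latency alone won't tell you which output actually looks right, so review the outputs yourself before comparing the numbers.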
I ran into this exact scenario recently. I wanted to build an application that took a standard photo and transformed it into a fun, anime-style version of the image. Because our team already has access to the OpenAI APIs, I decided to limit my choices to their ecosystem to keep things simple.
Even with that constraint, there were still plenty of options. There are currently four distinct image models available through the GPT image API. (Keep in mind that DALL-E 2 and DALL-E 3 were deprecated in May, so keeping an eye on deprecation schedules is an absolute must if you're building production apps!)
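One low-effort way to act on that advice is to keep a small registry of announced shutdown dates for the models you depend on, and check it in CI or at startup. This is a minimal sketch. The model name and date are hypothetical placeholders, and you would fill the registry in by hand from your provider's published deprecation schedule.

```python
from datetime import date
from typing import Dict, List

# Hypothetical registry: fill in from your provider's deprecation announcements.
SHUTDOWN_DATES: Dict[str, date] = {
    "legacy-image-model": date(2025, 6, 1),
}


def models_needing_migration(in_use: List[str], shutdown_dates: Dict[str, date],
                             today: date, warn_days: int = 90) -> List[str]:
    """Return models that are already shut down or shut down within warn_days."""
    flagged = []
    for model in in_use:
        shutdown = shutdown_dates.get(model)
        # A negative day count means the shutdown date has already passed.
        if shutdown is not None and (shutdown - today).days <= warn_days:
            flagged.append(model)
    return flagged
```

Failing a build, or at least logging a warning, when this list is non-empty gives you a migration window instead of a surprise outage.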
Each of these models comes with a different price tag and performance profile:
It’s easy to fall into the trap of thinking "I'll just default to the highest-tier model." But I actually ended up choosing gpt-image-1.5 for my application. Here’s why:
- gpt-image-1-mini and gpt-image-1 didn't quite hit the mark for the specific artistic style I wanted.
- gpt-image-2 went too far the other way. It was way too hyper-realistic when I was aiming for a classic, stylized "anime" look.
- gpt-image-2 also took far too long to return a response. Because this was a live activation for people to interact with on the spot, I didn't want users sitting around waiting for a single image to load.
Oh, and did I mention gpt-image-2 was significantly more expensive? gpt-image-1.5 gave me the perfect balance of speed, cost, and style.
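For reference, here is roughly what the core call for an app like this could look like with the OpenAI Python SDK's image edit endpoint. This is a sketch with some assumptions. The prompt wording, file paths, and the animefy helper are hypothetical. It also assumes gpt-image-1.5 is available to your account through images.edit. GPT image models return the image as base64, which is why the result is decoded before saving.

```python
import base64
from pathlib import Path

# Hypothetical prompt; tune the wording for the style you want.
ANIME_PROMPT = (
    "Redraw this photo as a classic, stylized anime illustration. "
    "Keep the people and composition recognizable."
)


def animefy(client, photo_path: str, out_path: str,
            model: str = "gpt-image-1.5") -> Path:
    """Send a photo to the image edit endpoint and save the anime-style result.

    `client` is expected to be an openai.OpenAI instance (or anything that
    exposes the same images.edit method).
    """
    with open(photo_path, "rb") as photo:
        result = client.images.edit(model=model, image=photo, prompt=ANIME_PROMPT)
    # GPT image models return the generated image as base64-encoded text.
    image_bytes = base64.b64decode(result.data[0].b64_json)
    out = Path(out_path)
    out.write_bytes(image_bytes)
    return out
```

In practice, you would create the client with `from openai import OpenAI` and `client = OpenAI()`, which reads `OPENAI_API_KEY` from the environment. Then call `animefy(client, "photo.jpg", "photo_anime.png")`. Because the model is a parameter, re-running the comparison against gpt-image-1 or gpt-image-2 is a one-argument change.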
Wrap up: Your model selection checklist
The next time you're spinning up a new project and staring at a drop-down menu of AI models, don't just click the newest one. Run it through this quick checklist instead:
- Define the mission: What is the exact problem you are trying to solve?
- Scope the capabilities: Which models or ecosystem providers actually specialize in that exact task?
- Count the cost: Look at the token pricing, compute time, and server locations.
- Run the baseline test: Run the exact same prompt or input through your top contenders to see how they handle it.
- Embrace the trade-offs: Pick the model that solves your problem while matching your constraints for speed and budget.
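The last two steps of that checklist can be sketched as a filter-then-rank pass over your baseline results. First, drop any candidate that fails your quality bar or breaks a hard limit on latency or budget. Then take the cheapest of what's left. This is one reasonable way to encode the trade-off, not the only one. The candidate names and numbers are hypothetical, and "meets the quality bar" stands in for whatever judgment your baseline test produced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    meets_quality_bar: bool   # your verdict from the baseline test
    median_latency_s: float   # measured latency in seconds
    cost_per_request: float   # USD, from your pricing estimate


def pick_model(candidates: List[Candidate], max_latency_s: float,
               max_cost: float) -> Optional[Candidate]:
    """Return the cheapest candidate that passes every constraint, if any."""
    viable = [
        c for c in candidates
        if c.meets_quality_bar
        and c.median_latency_s <= max_latency_s
        and c.cost_per_request <= max_cost
    ]
    if not viable:
        return None
    # Cheapest first; lower latency breaks ties.
    return min(viable, key=lambda c: (c.cost_per_request, c.median_latency_s))


# Hypothetical results for illustration only.
results = [
    Candidate("model-a", meets_quality_bar=False, median_latency_s=4.0, cost_per_request=0.01),
    Candidate("model-b", meets_quality_bar=True, median_latency_s=30.0, cost_per_request=0.20),
    Candidate("model-c", meets_quality_bar=True, median_latency_s=9.0, cost_per_request=0.05),
]
choice = pick_model(results, max_latency_s=15.0, max_cost=0.10)
```

A None result is useful too. It means no candidate fits your constraints, so either the limits or the shortlist need revisiting.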
Building with AI isn't about using the most powerful model on the planet. It's about using the right tool for the experience you want to create.
Have you had to make a tough choice between models recently? Which one did you end up going with, and what was the deciding factor? Let me know in the comments below.


