Most Cursor users run the default setup: OpenAI or Anthropic, billed through Cursor's pricing. It works. But if you're paying for API access (Cursor Pro's usage-based billing, or the pay-as-you-go plan), you're probably overpaying.
Here's the thing most people miss: Cursor supports "OpenAI Compatible" providers. That means any API endpoint that speaks the OpenAI chat completions format can be plugged in. OpenRouter, Groq, Together AI, Fireworks, self-hosted models — all fair game.
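What "speaks the OpenAI chat completions format" means in practice: the request body is identical everywhere, and only the base URL, API key, and model identifier change. A minimal sketch (the helper name is mine, not part of any SDK):

```python
import json

# Every "OpenAI compatible" provider accepts this same request shape;
# only the base URL, API key, and model identifier differ.
def build_chat_request(model: str, prompt: str) -> str:
    payload = {
        "model": model,  # provider-specific name, e.g. "anthropic/claude-sonnet-4.5"
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload)

# The same body could be POSTed to OpenRouter, Groq, Together, or a local Ollama server.
body = build_chat_request("google/gemini-2.5-flash", "Reverse a linked list in Python.")
```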
This guide walks through how to set it up, which providers are worth looking at, and how to actually reduce your monthly AI spend without sacrificing code quality.
Why This Matters: The 100x Price Gap
The pricing spread between AI models in early 2026 is absurd. Here's what you're paying per million input tokens for models that can all handle coding tasks:
| Model | Input Cost (per 1M tokens) | Relative Cost |
|---|---|---|
| Claude Opus 4.6 | $15.00 | 100x |
| Claude Sonnet 4.5 | $3.00 | 20x |
| GPT-4.1 | $2.00 | 13x |
| Gemini 2.5 Flash | $0.15 | 1x |
| DeepSeek V3 | $0.14 | ~1x |
That's not a typo. The cheapest models cost less than 1% of the most expensive ones. And for a lot of everyday coding tasks — tab completions, boilerplate generation, simple refactors — the cheaper models perform just fine.
If you're sending 100+ requests per day through Cursor (common for heavy users), the provider you route through determines whether you're spending $3/day or $30/day.
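To put numbers on that, here is the per-day arithmetic as a small helper, using the input prices from the table above (input tokens only; output tokens cost extra and vary by model):

```python
# Rough daily input-token spend: requests/day x tokens/request x price per million.
# Prices are the article's table figures (USD per 1M input tokens).
PRICE_PER_M = {
    "claude-opus-4.6": 15.00,
    "claude-sonnet-4.5": 3.00,
    "gpt-4.1": 2.00,
    "gemini-2.5-flash": 0.15,
    "deepseek-v3": 0.14,
}

def daily_input_cost(model: str, requests: int, tokens_per_request: int) -> float:
    return requests * tokens_per_request * PRICE_PER_M[model] / 1_000_000

# 100 requests/day at ~4K input tokens each:
opus = daily_input_cost("claude-opus-4.6", 100, 4_000)    # $6.00/day
flash = daily_input_cost("gemini-2.5-flash", 100, 4_000)  # $0.06/day
```

Same workload, same token counts, a 100x difference in the bill.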
Step-by-Step: Adding a Custom Provider to Cursor
Setting this up takes about two minutes.
1. Get an API key from your chosen provider
Sign up at whichever provider you want to use (I'll list options in the next section). Generate an API key. Copy it somewhere safe.
2. Open Cursor Settings
Open the command palette (Cmd+Shift+P on Mac, Ctrl+Shift+P on Windows/Linux) and search for "Cursor Settings", or click the gear icon in the bottom-left corner.
Navigate to Models in the sidebar.
3. Add an OpenAI Compatible provider
Scroll down to the "OpenAI Compatible" section. You'll see fields for:
- **API Base URL** — The provider's endpoint (e.g., `https://openrouter.ai/api/v1` for OpenRouter)
- **API Key** — Your key from step 1
- **Model Name** — The exact model identifier your provider expects (e.g., `anthropic/claude-sonnet-4.5`, `google/gemini-2.5-flash`)
Fill these in and hit save.
4. Select your new model
Back in the editor, click the model selector dropdown in the chat panel. Your custom model should now appear in the list. Select it.
5. Test it
Send a simple prompt. Something like "Write a function that reverses a linked list in Python." If you get a response, you're connected.
If it fails, double-check:
- The base URL ends with `/v1` (most providers expect this)
- Your API key is valid and has credits
- The model name matches your provider's naming convention exactly
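The most common misconfiguration is a base URL missing its `/v1` suffix. A quick sanity check can be sketched as (illustrative helper, not part of Cursor):

```python
from urllib.parse import urlparse

def check_base_url(url: str) -> list[str]:
    """Return a list of likely misconfigurations in an OpenAI-compatible base URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parsed.path.rstrip("/").endswith("/v1"):
        problems.append("most providers expect the path to end with /v1")
    return problems

# A well-formed base URL passes cleanly:
assert check_base_url("https://openrouter.ai/api/v1") == []
```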
That's it. You're now routing Cursor through a different provider.
Provider Options Worth Considering
Here are the main OpenAI-compatible providers you can plug into Cursor, each with different tradeoffs.
OpenRouter
The largest model marketplace. 500+ models from every major provider, accessible through a single API endpoint.
- **Base URL:** `https://openrouter.ai/api/v1`
- **Pricing:** Provider cost + 5% markup (or less with credits)
- **Strengths:** Massive model selection, provider fallback, good documentation
- **Ideal for:** Access to everything in one place
Groq
Optimized for inference speed. If latency matters more than model diversity, Groq is hard to beat.
- **Base URL:** `https://api.groq.com/openai/v1`
- **Pricing:** Competitive, especially for Llama and Mixtral models
- **Strengths:** Extremely fast responses (100+ tokens/sec)
- **Ideal for:** Tab completions, quick edits where speed matters
Together AI
Focused on open-source models. Good pricing on Llama, Mistral, and other community models.
- **Base URL:** `https://api.together.xyz/v1`
- **Pricing:** Low, especially for open-weight models
- **Strengths:** Fine-tuning support, serverless + dedicated options
- **Ideal for:** Teams committed to open-source models
Fireworks AI
Fast inference with competitive pricing. Strong on both open-source and proprietary models.
- **Base URL:** `https://api.fireworks.ai/inference/v1`
- **Pricing:** Aggressive on popular models
- **Strengths:** Speed, function calling support, good for structured output
- **Ideal for:** Production-grade inference at scale
Komilion
Routes requests across 400+ models automatically based on task complexity. You send a request, it picks the right model.
- **Base URL:** `https://www.komilion.com/api/v1`
- **Pricing:** Pay-per-use with intelligent routing (cheaper models for simple tasks, premium for complex ones)
- **Strengths:** Automatic model selection, OpenAI SDK drop-in
- **Ideal for:** Developers who don't want to manually pick models
Self-Hosted (Ollama, vLLM, etc.)
If you're running models locally, you can point Cursor at your own endpoint. Ollama exposes an OpenAI-compatible API by default.
- **Base URL:** `http://localhost:11434/v1` (Ollama default)
- **Pricing:** Free (you're paying for hardware)
- **Strengths:** No API costs, full privacy, works offline
- **Ideal for:** Sensitive codebases, airgapped environments, hobbyists with GPUs
Pro Tips for Reducing Costs
Setting up a custom provider is step one. Here's how to actually get the most out of it.
Match model to task complexity
This is the single highest-impact change you can make. Not every request needs a frontier model.
- Tab completions and autocomplete: Use the cheapest, fastest model available. Gemini Flash, DeepSeek V3, or a local Llama model. These are pattern-matching tasks — speed and cost matter more than reasoning depth.
- Simple edits and boilerplate: GPT-4.1-mini, Haiku, or similar mid-tier models. Writing a unit test template or adding error handling doesn't need Opus.
- Complex refactors and architecture: This is where you bring in Sonnet 4.5, GPT-4.1, or Opus. Multi-file changes, design decisions, debugging subtle race conditions.
If 70% of your requests are simple (they usually are), this alone can cut costs by 50-70%.
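Done by hand, this routing is just a lookup. A sketch (the model names are examples from this article; which tier a task lands in is the judgment call):

```python
# Illustrative manual routing: pick a model tier by task type.
MODEL_BY_TIER = {
    "simple": "google/gemini-2.5-flash",       # tab completions, boilerplate
    "medium": "gpt-4.1-mini",                  # unit tests, error handling
    "complex": "anthropic/claude-sonnet-4.5",  # multi-file refactors, debugging
}

def pick_model(task_tier: str) -> str:
    # Fall back to the mid-tier model for anything unclassified.
    return MODEL_BY_TIER.get(task_tier, MODEL_BY_TIER["medium"])
```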
Set up .cursorignore
Every file Cursor includes in context costs tokens. Create a .cursorignore file in your project root:
```gitignore
# Heavy directories that rarely help
node_modules/
.next/
dist/
build/
coverage/

# Large generated files
*.lock
*.min.js
*.map

# Data files
*.csv
# Be selective — some config JSONs are useful
*.json
*.sql
```
This reduces the context window Cursor sends with each request, which directly reduces token cost.
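If you want to preview which files a pattern list would exclude, Python's `fnmatch` gives a rough approximation (it is not exact gitignore semantics; directory entries like `node_modules/` are written here as `node_modules/*`):

```python
import fnmatch

# Approximation of the ignore file above; "dir/" entries become "dir/*".
PATTERNS = ["node_modules/*", "dist/*", "build/*", "*.lock", "*.min.js", "*.csv"]

def is_ignored(path: str) -> bool:
    # fnmatch's "*" also matches "/", so "node_modules/*" covers nested files.
    return any(fnmatch.fnmatch(path, p) for p in PATTERNS)
```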
Start fresh conversations
Long conversations accumulate context. Every previous message gets re-sent with each new request. After 10-15 exchanges, you might be sending 50K+ tokens per request just in conversation history.
When you switch tasks, start a new conversation. Your wallet will thank you.
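The growth is easy to underestimate because every request re-sends all prior turns. A back-of-envelope sketch (assumes a flat tokens-per-message figure, which real conversations with pasted code easily exceed):

```python
def request_input_tokens(exchange: int, tokens_per_message: int = 2_000) -> int:
    """Input tokens for the nth request when the full history is re-sent.
    Each prior exchange contributed one user and one assistant message."""
    history = 2 * (exchange - 1) * tokens_per_message  # all prior turns
    return history + tokens_per_message                # plus the new message

# By the 15th exchange you're re-sending ~58K tokens per request:
# request_input_tokens(15) -> 58000
```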
Monitor your usage
Most providers offer usage dashboards. Check yours weekly. Look for:
- Requests with unusually high token counts (context bloat)
- Patterns in which tasks consume the most tokens
- Whether your model choice actually matches your usage pattern
Cost Comparison: A Typical Day of Coding
Here's a rough estimate for a developer making ~100 AI requests per day through Cursor, split across task types:
| | Claude Opus 4.6 | GPT-4.1 | Gemini Flash | DeepSeek V3 | Mixed (smart routing) |
|---|---|---|---|---|---|
| 70 simple requests | $10.50 | $1.40 | $0.11 | $0.10 | $0.10 |
| 20 medium requests | $6.00 | $0.80 | $0.06 | $0.06 | $0.80 |
| 10 complex requests | $6.00 | $0.80 | $0.06 | $0.06 | $3.00 |
| Daily total | $22.50 | $3.00 | $0.23 | $0.22 | $3.90 |
| Monthly (22 days) | $495 | $66 | $5 | $5 | $86 |
Assumes ~2K input tokens per simple request, ~4K for medium, ~8K for complex. Output tokens vary.
The "mixed" column uses cheap models for simple tasks and premium models for complex ones — which is what smart routing does, whether you configure it manually or use a tool that does it automatically.
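For a sense of scale, the monthly figures follow directly from the daily totals in the table:

```python
# Daily totals straight from the table above (USD).
DAILY = {"opus": 22.50, "gpt-4.1": 3.00, "flash": 0.23, "deepseek": 0.22, "mixed": 3.90}

def monthly(model: str, working_days: int = 22) -> float:
    return DAILY[model] * working_days

# Opus-everywhere costs $495/month; smart routing lands around $86,
# a saving of roughly $409/month for the same 100 requests a day.
```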
The takeaway: even switching from a single expensive model to a single cheap model saves hundreds per month. Matching model to task complexity saves even more while keeping quality high where it counts.
Wrapping Up
Cursor's OpenAI Compatible feature is underused. Most developers either don't know it exists or assume it's complicated to set up. It's not — a few minutes of configuration can change your cost structure completely.
The AI model market in 2026 is competitive. New models drop weekly, prices keep falling, and the gap between "good enough" and "best available" is narrowing. Taking advantage of that competition is just good engineering practice.
Pick a provider, plug it in, and start paying for what you actually need.
Robin Banner builds tools for developers working with AI APIs. Find him on Dev.to and Twitter.