I wanted to use Google Gemini inside JetBrains AI Assistant through the "OpenAI-compatible" BYOK provider — pointing straight at Gemini's OpenAI-compatibility endpoint instead of burning JetBrains' limited AI quota.
It didn't work. The IDE kept throwing "Something went wrong / Try again", especially when generating Git commit messages. Model selection broke too.
The non-obvious root cause: Gemini 2.5 models keep "thinking" enabled by default, which adds ~10–15s before the first token — long enough to trip JetBrains' OWN internal request timeout. A proxy can't extend the IDE's timeout, so the real fix is making Gemini answer fast.
So I built a tiny fully-async proxy (FastAPI + httpx, Dockerized) that sits between the IDE and Gemini and:
- strips the `OpenAIAPI/models/` prefix IntelliJ prepends to model ids
- returns a clean static `/models` list (the IDE's parser is strict)
- streams SSE chunk-by-chunk with zero buffering and no read timeout
- injects `reasoning_effort: "none"` to disable Gemini 2.5 thinking → time-to-first-byte drops from ~14s to ~1–2s, so the timeout never fires
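To make the request-rewriting steps concrete, here is a minimal stdlib-only sketch. It is not the code from the repo. The helper names (`strip_ide_prefix`, `rewrite_chat_request`, `models_response`), the example model ids, and the exact shape of the `/models` payload are assumptions for illustration. Only the prefix string and the `reasoning_effort: "none"` field come from the description above.

```python
import json

# Prefix the IDE prepends to model ids (taken from the post).
IDE_MODEL_PREFIX = "OpenAIAPI/models/"

# Assumption: illustrative model ids; the real proxy may expose a different list.
STATIC_MODELS = ["gemini-2.5-flash", "gemini-2.5-flash-lite"]


def strip_ide_prefix(model_id: str) -> str:
    """Return the model id without the IDE prefix, if it is present."""
    if model_id.startswith(IDE_MODEL_PREFIX):
        return model_id[len(IDE_MODEL_PREFIX):]
    return model_id


def rewrite_chat_request(raw_body: bytes) -> bytes:
    """Rewrite a chat-completions body before forwarding it upstream.

    Assumes the body is a JSON object, as in the OpenAI chat format.
    """
    payload = json.loads(raw_body)
    if isinstance(payload.get("model"), str):
        payload["model"] = strip_ide_prefix(payload["model"])
    # Hypothetical policy: only set the field when the client did not send one.
    payload.setdefault("reasoning_effort", "none")
    return json.dumps(payload).encode("utf-8")


def models_response() -> dict:
    """Build a small static model list in the OpenAI list format."""
    return {
        "object": "list",
        "data": [
            {"id": model_id, "object": "model", "owned_by": "google"}
            for model_id in STATIC_MODELS
        ],
    }
```

In a FastAPI handler, something like `rewrite_chat_request(await request.body())` would run before the body is forwarded with httpx. Using `setdefault` rather than overwriting is a design choice made for this sketch: it leaves a client-supplied `reasoning_effort` untouched.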
Code generation, chat, and commit messages all work reliably now. It's a small, focused utility — not a framework. MIT licensed, feedback and PRs welcome.
🔗 https://github.com/agitrubard/jetbrains-ai-gemini-local-proxy