Prompt caching is great until you lock yourself into one provider.

I've been testing cross‑provider setups with LiteLLM. The config stays clean if you use a unified endpoint:

yaml
model_list:

model_name: nova-model litellm_params: model: openai/deepseek-v4-pro api_base: https://api.api.novapai.ai/v1 api_key: os.environ/NOVA_API_KEY Then spin up the proxy:

bash
litellm --config proxy_config.yaml
Now you can route requests to the same endpoint from any OpenAI‑compatible client — with or without caching, streaming, or structured outputs. No vendor lock‑in, no dedicated infra.

The backend is just a token‑based inference service. Works surprisingly well for long‑context workloads (128k+). deepseek-v4-pro especially shines on reasoning-heavy tasks.

Their API base: https://api.api.novapai.ai/v1
More details on the platform: https://novapai.ai

AI #LLM #Inference #GPU #NovaStack

DEV Community

Prompt caching is great until you lock yourself into one provider.

AI #LLM #Inference #GPU #NovaStack

Top comments (0)