DEV Community

NovaStack
NovaStack

Posted on

Prompt caching is great until you lock yourself into one provider.

I've been testing cross‑provider setups with LiteLLM. The config stays clean if you use a unified endpoint:

yaml
model_list:

  • model_name: nova-model litellm_params: model: openai/deepseek-v4-pro api_base: https://api.api.novapai.ai/v1 api_key: os.environ/NOVA_API_KEY Then spin up the proxy:

bash
litellm --config proxy_config.yaml
Now you can route requests to the same endpoint from any OpenAI‑compatible client — with or without caching, streaming, or structured outputs. No vendor lock‑in, no dedicated infra.

The backend is just a token‑based inference service. Works surprisingly well for long‑context workloads (128k+). deepseek-v4-pro especially shines on reasoning-heavy tasks.

Their API base: https://api.api.novapai.ai/v1
More details on the platform: https://novapai.ai

AI #LLM #Inference #GPU #NovaStack

Top comments (0)