AI tooling has exploded. There are chat interfaces, API playgrounds, model routers, and local runners, most of which do one thing well and nothing else.
If you want to compare models or providers, test the same prompt across five providers, switch between image generation and text in the same session, or give a model access to real-time web search, you end up juggling three or four different apps to do it.
I created Apiction to fix that. It is a browser-based, model- and provider-agnostic AI workspace.
You bring your own API keys, and it handles the rest. Conversations, tools, images, summaries, debugging for developers, and more. Everything is stored in your browser and you can run it locally.
This article walks through what you can do with it, both as a general user and as a developer.
For Everyone
1. General Text Conversations
The most obvious use is chatting with AI models. You type a message, get a response, continue the conversation. Nothing groundbreaking there.
What is a bit different is that you can run conversations in separate tabs, each with its own settings and memory handling. More on that in a bit.
Apiction renders Markdown responses properly: math equations via KaTeX, code blocks with syntax highlighting, so you get nicely formatted responses without needing to open them in another app. Just set the system prompt to ask the model to respond in Markdown, and it does the rest.
Streaming is supported, so responses appear word by word as they are generated. Rendering during the stream is kept lightweight, and the full Markdown render happens only once the complete response is received.
2. Image Generation
Apiction supports image generation from any OpenAI-compatible platform. Some other platforms, like Pollinations, are supported via custom adapters (though Pollinations' recent updates now expose image generation through OpenAI-compatible endpoints as well).
You switch the model type in the tab settings to text-to-image, configure your endpoint and API key, and start generating.
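To make the flow concrete, here is a minimal sketch of what an OpenAI-compatible text-to-image request body looks like. The endpoint path, model name, and parameters are illustrative assumptions, not Apiction's actual internals; substitute whatever your provider documents.

```python
# Sketch of an OpenAI-compatible text-to-image request payload.
# Model name and parameter set are illustrative; a real request
# would be POSTed to the provider's /v1/images/generations path.
import json

def build_image_request(prompt: str, model: str = "example-image-model",
                        size: str = "1024x1024") -> dict:
    """Assemble the JSON body for an image generation request."""
    return {
        "model": model,
        "prompt": prompt,
        "n": 1,           # number of images to generate
        "size": size,     # width x height in pixels
    }

payload = build_image_request("a lighthouse at dusk, watercolor")
print(json.dumps(payload, indent=2))
```

A custom adapter for a non-compatible provider would translate this same shape into whatever format that provider expects.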
3. Web Search and Tool Calling via MCP Servers
Apiction integrates with the Model Context Protocol (MCP), a standard for connecting AI models to external tools and data sources via JSON-RPC.
In practice, this means you can point a tab at a locally running MCP server (a web search tool, a database query tool, a file reader, whatever you have set up) and the AI model will use those tools autonomously when it decides they are necessary.
It discovers the available tools at the start of a conversation, injects them into the request, and then executes the tool calls the model makes, up to five rounds of back-and-forth tool execution per message, before returning the final response to you.
As mentioned earlier, each tab has its own tool configuration, so one tab can have web search enabled while another has none. The model decides when to invoke a tool and when to just respond directly.
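The tool-execution loop described above can be sketched as follows. This is a hedged illustration with a stubbed model and a stubbed tool registry standing in for real MCP and provider calls; the names `call_model`, `TOOLS`, and `run_turn` are my own, not Apiction's.

```python
# Sketch of a bounded tool-calling loop: ask the model, execute any
# tool calls it requests, feed results back, and stop after five rounds.
MAX_TOOL_ROUNDS = 5

# Stub tool registry; a real one would come from MCP tool discovery.
TOOLS = {"web_search": lambda query: f"results for {query!r}"}

def call_model(messages):
    # Stub model: requests one tool call, then answers using its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "web_search",
                                "arguments": {"query": "apiction"}}]}
    return {"content": "Here is what I found."}

def run_turn(messages):
    for _ in range(MAX_TOOL_ROUNDS):
        reply = call_model(messages)
        calls = reply.get("tool_calls")
        if not calls:                      # model answered directly
            return reply["content"]
        for call in calls:                 # execute each requested tool
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})
    return "Tool round limit reached."

answer = run_turn([{"role": "user", "content": "search for apiction"}])
print(answer)
```

The round limit is what keeps a confused model from looping on tool calls forever.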
4. Converse With Any AI Model From Any Provider
This is arguably the core premise of Apiction. It natively supports OpenAI, OpenRouter, HuggingFace, Together.ai, Groq, Pollinations, Fireworks.ai, and anything that speaks the OpenAI-compatible API format, which covers a large chunk of the ecosystem.
For providers that do not follow that format, custom adapters fill the gap, and future releases will add more adapters to cover other edge platforms.
Apiction's internal architecture is built around this separation. The core of the application knows nothing about OpenAI or Anthropic or HuggingFace; all provider-specific logic lives in isolated adapters.
When you configure an endpoint in a tab, the adapter is selected automatically based on the URL. This means adding a new provider is a matter of writing one adapter file without touching anything else in the codebase.
For end users, this simply means: if a new model drops somewhere, you can use it immediately, as long as you have the API key and endpoint.
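URL-based adapter selection can be sketched like this. The adapter names, match rules, and fallback are illustrative assumptions about how such a registry might look, not Apiction's actual code.

```python
# Sketch of picking a provider adapter from the configured endpoint URL,
# falling back to a generic OpenAI-compatible adapter.
ADAPTERS = [
    ("openrouter.ai", "openrouter"),
    ("pollinations.ai", "pollinations"),
]
DEFAULT_ADAPTER = "openai-compatible"

def select_adapter(endpoint_url: str) -> str:
    """Match the endpoint host against known adapters; default to the
    generic OpenAI-compatible one."""
    for host_fragment, adapter in ADAPTERS:
        if host_fragment in endpoint_url:
            return adapter
    return DEFAULT_ADAPTER

print(select_adapter("https://openrouter.ai/api/v1/chat/completions"))
print(select_adapter("https://api.example.com/v1/chat/completions"))
```

Under this design, supporting a new non-compatible provider means adding one entry and one adapter file; the generic fallback covers everything else.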
5. Rolling Summary for Long Conversations
Token limits are a real constraint. A long conversation will eventually exceed the context window of whatever model you are using.
Apiction handles this with a rolling summarization system. After every set number of messages in a tab, the older portion of the conversation is automatically summarized using the summarization model configured for that tab.
When building context for the next request, that summary is injected as a system message, so the model retains the context of what happened earlier without needing the raw message history.
The summaries are cumulative: each new batch is merged into the existing summary rather than replacing it, so only one combined summary exists and context is preserved across multiple summarization passes. The summarization model is configurable per tab, so you can choose a smaller, cheaper model for summarization and a larger one for conversation if you want.
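The mechanics of cumulative rolling summarization can be sketched as follows. Here `summarize()` is a stand-in that just joins text; in practice it would be a call to the tab's summarization model. The threshold value and function names are illustrative.

```python
# Sketch of rolling summarization: when history grows past a threshold,
# fold the older half into one cumulative summary and keep recent turns.
SUMMARIZE_EVERY = 4  # illustrative threshold, not Apiction's real value

def summarize(existing_summary: str, old_messages: list[str]) -> str:
    """Merge older messages into the running summary (stubbed)."""
    merged = " ".join(old_messages)
    return f"{existing_summary} {merged}".strip()

def compact(history: list[str], summary: str) -> tuple[list[str], str]:
    """Trim old messages into the single cumulative summary."""
    if len(history) <= SUMMARIZE_EVERY:
        return history, summary
    cutoff = len(history) - SUMMARIZE_EVERY
    summary = summarize(summary, history[:cutoff])
    return history[cutoff:], summary

history, summary = ["m1", "m2", "m3", "m4", "m5", "m6"], ""
history, summary = compact(history, summary)
print(history)   # recent messages kept verbatim
print(summary)   # older messages merged into one summary
```

When the next request is built, the single summary would be injected as a system message ahead of the retained recent messages.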
For Developers
6. Raw Request and Response Debug Console
Every tab has a debug mode that, when enabled, shows you the raw request payload sent to the provider and the raw response received back. This is valuable when you are:
- Debugging a custom adapter or a new provider integration
- Verifying that your system prompt and message history are being formatted correctly
- Checking if tool call data is making it into the request payload
It is per-tab and off by default, so it does not clutter the interface during normal use.
7. Performance Stats: TTFT, Total Time, Token Counts
Below each response, when debug mode is enabled, you get:
- Time to First Token (TTFT) — how long from sending the request until the first token starts streaming back
- Total response time — wall clock time from request to last token
- Input and output token counts — pulled from the usage field of the response
TTFT is a meaningful metric for streaming models. Two providers serving the same model can have wildly different TTFT values depending on their infrastructure, load, and whether they are doing any preprocessing on your request. Total time is useful, but TTFT tells you more about latency.
8. Cross-Provider Model Comparison
This is the most useful developer feature. If you are building an application on top of a specific model and relying on a certain level of reasoning quality, the provider you choose matters.
In Apiction, because each tab is independently configured, you can open multiple tabs, point them at different providers, set the same model on each, and send them the same prompt. You can then compare:
- Response quality
- Response time and TTFT
- How the model behaves differently (or identically) across providers
For example, you can test and "gauge" whether a provider is running a quantized version of a model rather than the one advertised. Quantized models can be significantly cheaper and faster, but they often have reduced reasoning capabilities and make more mistakes, especially on complex tasks.
You can surface this with some targeted prompts. Ask the model to perform a precise multi-step arithmetic calculation or a logically dense reasoning problem. A full-precision model and a heavily quantized version of the same model will often diverge on these tasks, whether in the final answer or in the confidence and detail of their reasoning.
If two providers claim to be serving the same model but the responses differ significantly on a factual or mathematical problem, that can be a signal.
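One way to run such a probe is with a fixed arithmetic question whose ground truth you compute locally, then grade each provider's reply exactly. This is my own illustrative sketch of the idea, not a feature of Apiction; the grading here is deliberately strict.

```python
# Sketch of a deterministic arithmetic probe for comparing providers:
# ask each the same exact-arithmetic question and grade the replies
# against a locally computed ground truth.
def probe_question() -> tuple[str, int]:
    a, b, c = 7919, 104729, 1299709          # fixed values, no randomness
    expected = a * b + c
    prompt = (f"Compute {a} * {b} + {c}. "
              "Reply with only the final integer.")
    return prompt, expected

def grade(provider_answer: str, expected: int) -> bool:
    """True only if the provider returned exactly the right integer."""
    try:
        return int(provider_answer.strip()) == expected
    except ValueError:
        return False

prompt, expected = probe_question()
print(prompt)
print(grade(str(expected), expected))     # a correct answer passes
print(grade("12345", expected))           # a wrong answer fails
```

Send the same prompt to each provider tab and compare the grades; a provider that consistently fails exact arithmetic the others pass is worth a closer look.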
A Note on Privacy and Setup
All of your conversations, settings, and summaries are stored in your browser's IndexedDB.
The PHP backend it ships with exists only because many AI provider APIs block direct browser requests due to CORS. The backend is a thin proxy: it forwards your request, gets the response, and returns it. If a provider supports CORS directly, like OpenRouter, you can enable direct requests in settings and skip the proxy entirely.
You can also run it locally with just `php -S localhost:8000 -t public_html` if you have PHP available, or use it via apiction.site if you want to skip the setup.
Conclusion
Apiction is for people who use AI models via different API providers, test new models, build on top of APIs, or just want more visibility into what is actually happening under the hood when they send a message. My intention was to build something like Postman, but for AI APIs, with general-purpose usage features as well.
If you're spending a meaningful amount of time working with AI APIs, it is worth having a workspace that keeps up.