TheAIRabbit
A Smarter Way to Find and Test AI Models for Your App using GPT + Super AI (MCP)

Modern development tools have made building applications easier than ever. You can now launch a new app with a database, authentication, and other core features in minutes. The final piece of the puzzle, adding genuine intelligence with AI, however, introduces a new set of challenges.

Developers often face several key questions when integrating AI:

  • Which AI model provider should you choose?
  • How do you price your product to account for AI usage costs?
  • If you're using your own API key, how do you protect it from misuse and prevent unexpected expenses?

These questions become even more critical if you plan to offer a free trial or a free tier for your application. Without a proper strategy, you risk having your budget drained by overuse and users who don't intend to subscribe. While many solutions exist, one straightforward approach is to ship your product with a local AI that performs its specific task efficiently.
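Before going local, it is worth noting the usual stopgap for the key-protection problem: proxy AI calls through your own backend and enforce a per-user quota. A minimal in-memory sketch follows; the limit, user IDs, and function names are illustrative, not from any particular library:

```javascript
// Minimal per-user daily quota for proxied AI calls (illustrative limits).
const DAILY_LIMIT = 50;      // max AI requests per user per day
const usage = new Map();     // userId -> { day, count }

function allowRequest(userId, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // e.g. "2024-01-31"
  const entry = usage.get(userId);
  if (!entry || entry.day !== day) {
    // First request today: start a fresh counter.
    usage.set(userId, { day, count: 1 });
    return true;
  }
  if (entry.count >= DAILY_LIMIT) return false; // over quota: reject before calling the AI API
  entry.count += 1;
  return true;
}
```

In production you would back this with a real store (Redis, a database) so limits survive restarts, but the shape of the check stays the same.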

The Power of Local, Specialized AI

Modern browser technologies allow you to bundle a lightweight AI model directly with your application. This can be as simple as the snippet in the appendix of this post, which creates a basic chatbot within a single HTML file.

Before you adopt this approach, there are two fundamental questions you need to answer:

  1. What specific use case should your model excel at? Most developers know that smaller models are not generalists like the mega-models behind services like ChatGPT. Instead, they are fast, cheap, and lightweight specialists. Your use case might be document classification, language translation, text summarization, or another focused task.
  2. Which model is the right one for that use case? After defining the task, you need to find a model that can perform it effectively.

The first requirement is a core part of any successful business plan. The second, however, can be a significant challenge when you have to choose from hundreds of available models. There are many benchmarking platforms like Hugging Face's LLM Leaderboard, LMSys's Chatbot Arena, and Artificial Analysis, plus countless online playgrounds to test individual models. But sifting through them all takes time.

Automating Model Discovery with AI

If you have a handful of use cases and need to iterate quickly, you can use an AI assistant together with the Super AI MCP to automate the discovery and testing process. Here’s how it works:

  1. Configure an AI to access benchmark data. This gives your AI assistant the information it needs to compare models.
  2. Configure the AI to access prediction platforms. This connects your AI to services that host a wide variety of models.
  3. Provide your use cases in natural language. Let the AI find the most suitable models and run tests for you.

To make this work, you only need a few components:

  • Any chatbot that supports the Model Context Protocol (MCP), such as ChatGPT, Claude, and others.
  • A free account at Apify.com to access benchmark data through the MCP (requires an API key).
  • (Optional) A Replicate account if you want to run predictions (requires an API key).

You can then use a prompt like this:

Find the best 3 small models that can do this task and try them out on Replicate: 

--- my task 1 here 
--- my task 2 here 
etc.

Let’s walk through an example.

Prerequisites:

  • ChatGPT (or another chatbot with MCP support)
  • An Apify API key (a free account is sufficient)
  • A Replicate API key (this is a pay-per-use service)

Step-by-Step Guide to Automated Model Testing

Step 1: Configure the MCP Server

First, you need to connect your chatbot to the benchmark and prediction tools using an MCP server.

Start by adding a new MCP in your chatbot's settings.

You will need to provide the server URL.

Use the following URL, adding your Replicate API key at the end where indicated.

https://flamboyant-leaf--super-ai-bench-mcp.apify.actor/mcp?replicateApiKey=

Leave the OAuth section empty, as you will authenticate with Apify later. Click confirm to save.

That's it for the configuration.
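ChatGPT's settings UI handles the connection for you. For MCP clients that are configured through a JSON file instead (Claude Desktop, for example), the equivalent setup might look roughly like the sketch below; this assumes the `mcp-remote` bridge package for remote servers and uses a placeholder for the key:

```json
{
  "mcpServers": {
    "super-ai-bench": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://flamboyant-leaf--super-ai-bench-mcp.apify.actor/mcp?replicateApiKey=YOUR_REPLICATE_KEY"
      ]
    }
  }
}
```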

Step 2: Find and Analyze Suitable Models

Now, let's try a simple example to find some high-value small models. Later, you can replace this with your own specific use cases.

In your chatbot, enter the following prompt:

Find the best small model

ChatGPT will now ask the benchmark tool for suitable models and sort them based on the request.

Here, it has found several models, including different versions of Llama, Qwen, and Phi, along with necessary data like size and cost.

The AI then provides a quick recommendation of which models to use.
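If you want to reproduce the kind of ranking the AI performs, the core of it can be sketched in a few lines. The model names and numbers below are purely illustrative, not real benchmark scores:

```javascript
// Hypothetical benchmark rows (names and numbers are illustrative).
const models = [
  { name: "Llama-3.2-3B", score: 63, costPerMTok: 0.06 },
  { name: "Qwen2.5-1.5B", score: 55, costPerMTok: 0.02 },
  { name: "Phi-3.5-mini", score: 61, costPerMTok: 0.10 },
];

// Rank by a naive quality-per-dollar score; a real comparison would also
// weigh latency, context length, and license terms.
function rankByValue(rows) {
  return [...rows]
    .map((m) => ({ ...m, value: m.score / m.costPerMTok }))
    .sort((a, b) => b.value - a.value);
}
```

The point of the MCP workflow is that the assistant does this weighing for you, across live benchmark data, from a single natural-language request.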

Step 3: Test the Models on Replicate

This is useful, but the real power comes from seeing the models execute your use case. Here, we'll let the AI create and run a simple coding task.

Use the following prompt:

try them on replicate

The AI will first search for suitable models available on the Replicate platform. Note that not all models listed in benchmarks are on Replicate, but in this case, they are.

Now, we can run the test on all of them simultaneously.

You can see the jobs running in your Replicate dashboard, with details including creation date, duration, and more. Your AI also has access to this data.

https://replicate.com/predictions

After approximately one to two minutes, our use case has been tested across five different models, and we receive a detailed analysis directly from the AI.
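Under the hood, running a model on Replicate is a single HTTP call, which is what the MCP issues on your behalf. The sketch below builds such a request against Replicate's public predictions endpoint; the model name and input fields are illustrative, and each model defines its own input schema:

```javascript
// Build a request for Replicate's "create a prediction" endpoint.
// The model name and input fields are illustrative; check the model's schema.
function buildPredictionRequest(model, prompt, apiToken) {
  return {
    url: `https://api.replicate.com/v1/models/${model}/predictions`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
        Prefer: "wait", // ask the API to block until the prediction finishes
      },
      body: JSON.stringify({ input: { prompt } }),
    },
  };
}

// Usage (requires a real token and network access):
// const { url, options } = buildPredictionRequest(
//   "meta/meta-llama-3-8b-instruct",
//   "Write a haiku about benchmarks",
//   process.env.REPLICATE_API_TOKEN
// );
// const prediction = await fetch(url, options).then((r) => r.json());
```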

Real-World Applications and Benefits

This was a very simple example. In a real-world scenario, you can:

  • Provide your own complex, specific use cases for testing.
  • Save the results for future comparison.
  • Evaluate new models as they are released without switching between different platforms.
  • Distribute complex tasks across multiple models to leverage their unique strengths.
  • And much more.

While all of these capabilities are valuable, the greatest benefit is the ability to quickly compare results from different models without subscribing to multiple services. As mentioned at the beginning of this post, this process makes it significantly easier to find small, efficient models that you can confidently ship with your products.

Appendix

Pure HTML/JS Chatbot (Snippet)

The snippet below runs entirely in your browser using Transformers.js, so it needs no server and no API key. (Chrome's built-in on-device model, which you can enable at chrome://flags/#optimization-guide-on-device-model, is a separate option; the snippet does not depend on it.)

Save the following as an HTML file and open it in your browser. The rest is self-explanatory.

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <title>Local LLM Chat (Browser)</title>
  <style>
    :root { color-scheme: dark; }
    body { margin: 0; font: 14px/1.4 system-ui, -apple-system, Segoe UI, Roboto, Arial; background:#0b0f14; color:#e6edf3; }
    .wrap { max-width: 980px; margin: 0 auto; padding: 16px; display:flex; flex-direction:column; gap:12px; height: 100vh; box-sizing:border-box; }
    .top { display:flex; gap:10px; align-items:center; flex-wrap:wrap; }
    .chip { padding:6px 10px; border:1px solid #223; border-radius:999px; background:#0f1621; }
    .status { opacity:.9; }
    .chat { flex:1; overflow:auto; border:1px solid #223; border-radius:12px; padding:12px; background:#0f1621; }
    .msg { margin: 0 0 10px 0; white-space:pre-wrap; }
    .msg .role { font-weight:700; }
    .msg.user .role { color:#7ee787; }
    .msg.ai .role { color:#79c0ff; }
    .row { display:flex; gap:10px; }
    input, select {
      padding:10px; border-radius:10px; border:1px solid #223;
      background:#0b0f14; color:#e6edf3;
    }
    #inp { flex:1; }
    button { padding:10px 12px; border-radius:10px; border:1px solid #223; background:#1f6feb; color:#fff; cursor:pointer; }
    button.secondary { background:#0f1621; }
    button:disabled { opacity:.5; cursor:not-allowed; }
    .small { font-size: 12px; opacity:.8; }
    .hide { display:none; }
  </style>
</head>
<body>
  <div class="wrap">
    <div class="top">
      <span class="chip">Transformers.js (browser local)</span>

      <label>
        Model:
        <select id="modelSelect">
          <option value="HuggingFaceTB/SmolLM2-135M-Instruct">SmolLM2-135M-Instruct (recommended)</option>
          <option value="HuggingFaceTB/SmolLM2-360M-Instruct">SmolLM2-360M-Instruct (bigger)</option>
          <option value="HuggingFaceTB/SmolLM2-1.7B-Instruct">SmolLM2-1.7B-Instruct (heavy)</option>
          <option value="__custom__">Custom model id…</option>
        </select>
      </label>

      <input id="customModel" class="hide" placeholder="e.g. Org/RepoName" size="28" />

      <button id="loadBtn" type="button">Load</button>
      <button id="clearBtn" type="button" class="secondary" disabled>Clear</button>

      <span class="status" id="status">Not loaded.</span>
    </div>

    <div class="chat" id="chat"></div>

    <div class="row">
      <input id="inp" placeholder="Type a message and press Enter…" disabled />
      <button id="sendBtn" type="button" disabled>Send</button>
    </div>

    <div class="small">
      If opening as <code>file://</code> blocks module imports on your machine, run a local server:
      <code>python -m http.server 8000</code> then open <code>http://localhost:8000</code>.
      First load downloads the model (can be large).
    </div>
  </div>

  <script type="module">
    const $ = (id) => document.getElementById(id);
    const chatEl = $("chat");
    const statusEl = $("status");
    const inp = $("inp");
    const sendBtn = $("sendBtn");
    const clearBtn = $("clearBtn");
    const loadBtn = $("loadBtn");
    const modelSelect = $("modelSelect");
    const customModel = $("customModel");

    function escapeHtml(s) {
      return String(s).replace(/[&<>"']/g, (c) => ({
        "&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#39;"
      }[c]));
    }

    function addMsg(role, text) {
      const div = document.createElement("div");
      div.className = `msg ${role}`;
      div.innerHTML = `<span class="role">${role === "user" ? "You" : "AI"}:</span> ${escapeHtml(text)}`;
      chatEl.appendChild(div);
      chatEl.scrollTop = chatEl.scrollHeight;
    }

    function setUiLoaded(loaded) {
      inp.disabled = !loaded;
      sendBtn.disabled = !loaded;
      clearBtn.disabled = !loaded;
    }

    modelSelect.addEventListener("change", () => {
      const isCustom = modelSelect.value === "__custom__";
      customModel.classList.toggle("hide", !isCustom);
    });

    // Chat state
    let generator = null;
    let deviceUsed = "";
    const system = "System: You are a helpful assistant. Be concise.\n";
    let transcript = "";

    function resetChat() {
      transcript = "";
      chatEl.innerHTML = "";
      addMsg("ai", "Ready. Ask me a question.");
      inp.focus();
    }

    async function loadModel() {
      try {
        setUiLoaded(false);
        loadBtn.disabled = true;
        statusEl.textContent = "Loading library…";

        const { pipeline, env } = await import(
          "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.2/+esm"
        );
        env.useBrowserCache = true;

        let modelId = modelSelect.value;
        if (modelId === "__custom__") modelId = customModel.value.trim();
        if (!modelId) throw new Error("No model id provided.");

        const make = async (device) => pipeline("text-generation", modelId, {
          dtype: "q4",
          device,
          progress_callback: (p) => {
            if (p && p.status === "progress") {
              const pct = (typeof p.progress === "number") ? ` ${p.progress.toFixed(1)}%` : "";
              statusEl.textContent = `Downloading ${p.file || ""}${pct}`.trim();
            }
          },
        });

        try {
          statusEl.textContent = "Initializing WebGPU…";
          generator = await make("webgpu");
          deviceUsed = "webgpu";
        } catch (e) {
          statusEl.textContent = "WebGPU failed, using WASM…";
          generator = await make("wasm");
          deviceUsed = "wasm";
        }

        statusEl.textContent = `Loaded ${modelId} (${deviceUsed}).`;
        setUiLoaded(true);
        resetChat();
      } catch (e) {
        console.error(e);
        statusEl.textContent = `Load failed: ${e.message || e}`;
        addMsg("ai", "Load failed. Check console. If using file:// and imports are blocked, run via a local server.");
        generator = null;
        deviceUsed = "";
        setUiLoaded(false);
      } finally {
        loadBtn.disabled = false;
      }
    }

    async function send() {
      if (!generator) return;

      const user = inp.value.trim();
      if (!user) return;

      inp.value = "";
      addMsg("user", user);

      transcript += `User: ${user}\nAssistant:`;
      statusEl.textContent = "Thinking…";
      sendBtn.disabled = true;
      inp.disabled = true;

      try {
        const out = await generator(system + transcript, {
          max_new_tokens: 160,
          temperature: 0.7,
          return_full_text: false
        });

        const r = Array.isArray(out) ? out[0] : out;
        const aiText = (r && r.generated_text != null) ? String(r.generated_text).trim() : "";
        transcript += ` ${aiText}\n`;

        addMsg("ai", aiText || "(no output)");
        statusEl.textContent = `Loaded (${deviceUsed}).`;
      } catch (e) {
        console.error(e);
        statusEl.textContent = "Generation error (see console).";
        addMsg("ai", "Error generating response. See console.");
      } finally {
        sendBtn.disabled = false;
        inp.disabled = false;
        inp.focus();
      }
    }

    sendBtn.addEventListener("click", send);
    inp.addEventListener("keydown", (e) => { if (e.key === "Enter") send(); });
    clearBtn.addEventListener("click", resetChat);
    loadBtn.addEventListener("click", loadModel);

    // Optional: auto-load on open
    // loadModel();
  </script>
</body>
</html>
