
Open Source Models for your next AI use-case

Open-source models offer greater control, tools for quality improvement, and cost savings, making them an increasingly popular choice for developers. This blog looks at popular models and the most straightforward way to use them in your apps: consuming inference endpoints.

Inference APIs let you generate predictions, decisions, or outputs from a trained AI model. Providers like Anyscale, Perplexity, and TogetherAI expose open-source models through inference endpoints, so your apps can make API calls and get responses from LLMs. We will use the Portkey SDK to call these endpoints: it follows the OpenAI signature, so switching between LLMs is as simple as finding and replacing the model name and endpoint URL in your app.

Plan a birthday party.

Suppose you are building an app that suggests steps for planning a birthday party. It should give users a checklist of items to take care of to organize a successful party.

Here's how your implementation might look:

  1. Construct the prompt with suitable system and user messages.
  2. Make an API call to the LLM.
  3. Transform the response to be suitable for our app.

Portkey will serve as the control panel. To avoid managing multiple API keys (Anyscale, Perplexity, and TogetherAI), securely save them to the vault as virtual keys.

Import portkey-ai and compose your prompt

import Portkey from "portkey-ai";

const portkey = new Portkey({
    apiKey: process.env.PORTKEYAI_API_KEY,
    virtualKey: process.env.TOGETHERAI_VIRTUAL_KEY, // or Anyscale or Perplexity
});

const messages = [{
        role: "system",
        content: "You are very good program manager and have organised many events before. You can break every task in simple and means for others to pick it up. You each step as short as possible. Keep the response under 1000 words.",
    },
    {
        role: "user",
        content: "Help me plan a birthday party?",
    },
];

The chat completions call to llama-2-70b-chat

const response = await portkey.chat.completions.create({
    messages,
    model: "togethercomputer/llama-2-70b-chat",
    max_tokens: 1000,
});
console.info(response.choices[0].message.content);
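
The third step, transforming the response for the app, depends on your UI. Below is a minimal sketch, assuming the model returns its checklist as a numbered or bulleted list; toChecklist is a hypothetical helper, not part of the Portkey SDK.

// Hypothetical helper: split the LLM's markdown-style response into
// checklist items the app can render (step 3 of the implementation).
function toChecklist(markdown) {
    return markdown
        .split("\n")
        .map((line) => line.trim())
        // Keep only numbered ("1.") or bulleted ("-", "*") lines.
        .filter((line) => /^(\d+\.|[-*])\s+/.test(line))
        // Strip the list marker, leaving just the task text.
        .map((line) => line.replace(/^(\d+\.|[-*])\s+/, ""));
}

const checklist = toChecklist(response.choices[0].message.content);
console.info(checklist);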

You will see a log for every request on Portkey's control panel, with useful data such as the timestamp, request type, LLM used, tokens generated, and cost.

When instantiating Portkey, pass the virtual key of a different LLM provider, such as Perplexity or Anyscale, to route your calls through it. Select any model from these providers and pass its name in the chat completion call, as sketched below. See the complete list of models supported through Portkey.
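
For instance, here is a sketch of the same call routed through Perplexity instead of TogetherAI. The model name below is an assumption for illustration; check Portkey's model list for the exact identifier.

// Same OpenAI-style call, different provider: only the virtual key
// and model name change, since Portkey follows the OpenAI signature.
const portkeyPplx = new Portkey({
    apiKey: process.env.PORTKEYAI_API_KEY,
    virtualKey: process.env.PERPLEXITY_VIRTUAL_KEY,
});

const pplxResponse = await portkeyPplx.chat.completions.create({
    messages, // same prompt as before
    model: "pplx-70b-chat", // assumed Perplexity model identifier
    max_tokens: 1000,
});
console.info(pplxResponse.choices[0].message.content);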

Inference Engines

Different LLM providers can serve the same model to our applications. For example, Llama 2 is available on both Anyscale and TogetherAI. Although the models are identical, the inference engines behind them differ. Inference engines handle all of our app's API calls (which is why we consume inference endpoints) and are optimized differently for performance and quality. It's important to weigh these differences as you settle on the model best suited to your app.
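
One rough way to compare engines is to send the same prompt to both providers and measure wall-clock latency, as in the sketch below. The Anyscale model identifier is an assumption; verify it before use. Latency is only one axis, and response quality needs its own evaluation.

// Rough comparison of two inference engines serving the same model:
// time an identical prompt on each provider via Portkey.
async function timeCompletion(virtualKey, model) {
    const client = new Portkey({
        apiKey: process.env.PORTKEYAI_API_KEY,
        virtualKey,
    });
    const start = Date.now();
    const res = await client.chat.completions.create({ messages, model, max_tokens: 1000 });
    return { model, ms: Date.now() - start, text: res.choices[0].message.content };
}

const results = await Promise.all([
    timeCompletion(process.env.TOGETHERAI_VIRTUAL_KEY, "togethercomputer/llama-2-70b-chat"),
    // Assumed Anyscale identifier for the same Llama 2 model:
    timeCompletion(process.env.ANYSCALE_VIRTUAL_KEY, "meta-llama/Llama-2-70b-chat-hf"),
]);
results.forEach((r) => console.info(`${r.model}: ${r.ms} ms`));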

Summary

Need suggestions to plan a birthday party? You can choose among different LLMs served by the inference engines of various LLM providers. We explored how the query, prompt, LLM, LLM provider, and Portkey come together in a chat completion call. Have fun experimenting with the many prompts, language models, and features available!
