Chandler

Posted on
⚡️ FREE Llama 2 API: OpenAI-compatible API for Workers AI 🧠

I am a huge fan of OpenAI, to the surprise of no one. I use their API as well as ChatGPT all the time for work, personal projects, and daily tasks alike. However, I think that Open Source is the future and should be the goal if we are to make Generative AI accessible to the masses.

Recently, Cloudflare unveiled Workers AI, a way to run GPU-heavy AI tasks on Cloudflare's global network. For its initial beta offering, Cloudflare is letting users try out a small subset of available models, with one for each general category of AI technology.

AI offerings as of publishing

Seeing what was offered got me thinking: what if I could host my own OpenAI-compatible API on Cloudflare Workers? That way I could quickly try out new models as they become available while still using the existing SDKs and tooling!

Introducing OpenAI for Workers AI! This lets me and others use our existing tooling with Workers AI on Cloudflare's global network!

Deploying

First, clone the repository.

git clone https://github.com/chand1012/openai-cf-workers-ai
cd openai-cf-workers-ai

Then, install the dependencies and deploy to your account. If you are not logged in to wrangler, you will be prompted to log in.

yarn
yarn deploy

As of 07/10/2023, testing locally does not work. Deployment only takes a few seconds, so it is recommended to deploy and test against the deployed worker.

Usage

See the OpenAI API docs for more information on the API. Here's an example from the OpenAI docs:

curl https://openai-cf.yourusername.workers.dev/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/meta/llama-2-7b-chat-int8",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
# {"id":"ccfbc7fc-d871-4139-90dc-e6c33fc7f275","model":"@cf/meta/llama-2-7b-chat-int8","created":1696701894,"object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"Hello there! *adjusts glasses* It's a pleasure to meet you. Is there something I can help you with or would you like to chat? I'm here to assist you in any way I can. 😊"},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

If you want to use this with the OpenAI Python or JavaScript SDK, replace the base URL with your own worker's URL. For example:

import openai
openai.api_base = 'https://openai-cf.yourusername.workers.dev/'
# the worker doesn't check API keys, but the SDK may require one to be set

# rest of code
import OpenAI from 'openai';

const openai = new OpenAI({
    baseURL: 'https://openai-cf.yourusername.workers.dev/',
    ...
});

// rest of code

Compromises

There were a few compromises I had to make in order to build the API.

The first is that the API does not count tokens and always returns zeros for the usage attribute in the response object. The field is still returned for compatibility, but until tokenization is added for each respective model, we cannot count tokens. Each model tokenizes differently, so we can't use tiktoken. It may be possible to tokenize using Hugging Face transformers, but that may be too slow and might prevent free-plan users from deploying the API. More investigation is needed.
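For illustration, here's roughly what that compatibility shim looks like — the `usage` object mirrors OpenAI's schema, but every count is hard-coded to zero. The helper name below is my own invention, not code from the repo:

```javascript
// Hypothetical helper: emit an OpenAI-shaped usage object with every
// count zeroed, since we cannot tokenize per-model yet.
function zeroUsage() {
  return {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  };
}

// Clients that read `usage` for billing or metrics keep working;
// they just see zeros in every field.
const response = {
  object: 'chat.completion',
  usage: zeroUsage(),
};
console.log(response.usage.total_tokens); // 0
```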

The second is model selection. If you look in the code, you'll notice it's commented out. In the future, Cloudflare will add the ability to use different models with the API, but for now, to keep things simple, the worker always uses the only available model. Once more models are added, the API will be updated to allow model selection.
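In other words, the selection logic currently collapses to a stub like the sketch below (names are my own; the real repo simply has the mapping commented out):

```javascript
// Only one chat model is available during the beta, so whatever the
// client asks for resolves to it. Once more models ship, this becomes
// a lookup against Cloudflare's catalog instead of a constant.
const DEFAULT_MODEL = '@cf/meta/llama-2-7b-chat-int8';

function resolveModel(requested) {
  // Future: validate `requested` against the list of available models
  // and return it if supported.
  return DEFAULT_MODEL;
}

console.log(resolveModel('gpt-3.5-turbo')); // "@cf/meta/llama-2-7b-chat-int8"
```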

Stop tokens are also non-functional: there is no way to specify a stop sequence with the current Workers AI API, so the parameter is accepted but ignored.

Finally, for simplicity's sake, there is no API key functionality. Because the current rate limits for Workers AI (as of 07/10/2023) are rather strict anyway, I decided not to count or limit requests. In the future, when we can count tokens, this may change, or we may limit per request instead of per token.
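If key checking is added later, a minimal version could be a bearer-token gate at the top of the worker's request handling. This is a sketch with invented names — in a real deployment, `ALLOWED_KEYS` would come from a Worker secret or KV namespace, not a hard-coded set:

```javascript
// Hypothetical bearer-token check; ALLOWED_KEYS would be loaded from a
// Worker secret or KV namespace rather than hard-coded.
const ALLOWED_KEYS = new Set(['sk-example-key']);

function isAuthorized(authHeader) {
  // Expect the OpenAI-style header: "Authorization: Bearer <key>"
  if (!authHeader || !authHeader.startsWith('Bearer ')) return false;
  const token = authHeader.slice('Bearer '.length).trim();
  return ALLOWED_KEYS.has(token);
}

console.log(isAuthorized('Bearer sk-example-key')); // true
console.log(isAuthorized('Bearer wrong-key'));      // false
```

Because the check mimics OpenAI's `Authorization` header format, existing SDKs would work unchanged — you'd just set your worker's key as the SDK's API key.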

Conclusion

In summary, Cloudflare's recent unveiling of Workers AI, a platform for running GPU-intensive AI tasks on a global network, has opened up exciting avenues. Inspired by the potential, I built OpenAI for Workers AI: a bridge that lets users talk to Workers AI through an OpenAI-compatible API. While this provides a solid way to leverage Workers AI with familiar OpenAI tooling, it's essential to note certain limitations: the API does not count tokens, model selection is limited, stop tokens are non-functional, and there's no API key functionality yet. These are not roadblocks but stepping stones, and as more models become available, the integration will only become more versatile and useful. If you're keen on harnessing the best of both worlds, give it a whirl. Stay tuned for updates!
