<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: alisdairbr</title>
    <description>The latest articles on DEV Community by alisdairbr (@alisdairbr).</description>
    <link>https://dev.to/alisdairbr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F496719%2F7e25728e-bc1f-4297-8633-04a1db6af8f7.png</url>
      <title>DEV Community: alisdairbr</title>
      <link>https://dev.to/alisdairbr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alisdairbr"/>
    <language>en</language>
    <item>
      <title>Best Open Source LLMs in 2025</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Fri, 14 Feb 2025 13:23:35 +0000</pubDate>
      <link>https://dev.to/koyeb/best-open-source-llms-in-2025-ldj</link>
      <guid>https://dev.to/koyeb/best-open-source-llms-in-2025-ldj</guid>
      <description>&lt;p&gt;Open source LLMs continue to compete with proprietary models on performance benchmarks for natural language tasks like text generation, code completion, and reasoning. &lt;br&gt;
Despite being developed with fewer resources than their closed counterparts, these open &lt;a href="https://www.koyeb.com/blog/what-are-large-language-models" rel="noopener noreferrer"&gt;LLMs&lt;/a&gt; offer cutting-edge AI without the high costs and restrictions of proprietary models. &lt;/p&gt;

&lt;p&gt;However, running these open-source models in production and at scale remains a challenge. Enter &lt;a href="https://www.koyeb.com/blog/serverless-gpus-public-preview-run-ai-workloads-on-h100-a100-l40s-and-more" rel="noopener noreferrer"&gt;Serverless GPUs&lt;/a&gt;: a cost-effective, scalable way to deploy and fine-tune LLMs without managing complex infrastructure. &lt;/p&gt;

&lt;p&gt;In this blog post, we’ll explore the best open LLMs available at the start of 2025, including: &lt;a href="https://www.koyeb.com/deploy/deepseek-r1-qwen-32b" rel="noopener noreferrer"&gt;DeepSeek-R1&lt;/a&gt;, &lt;a href="https://www.koyeb.com/deploy/mistral-small-3-instruct" rel="noopener noreferrer"&gt;Mistral Small 3&lt;/a&gt;, and &lt;a href="https://www.koyeb.com/deploy/qwen-2-5-coder-7b-instruct" rel="noopener noreferrer"&gt;Qwen 2.5 Coder&lt;/a&gt;. After comparing their capabilities and ideal use cases for real-world AI applications, we’ll also share how to fine-tune and deploy them using serverless GPUs for optimized inference and training. &lt;/p&gt;

&lt;h2&gt;
  
  
  DeepSeek-R1 Qwen 32B
&lt;/h2&gt;

&lt;p&gt;DeepSeek released two first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. &lt;br&gt;
DeepSeek-R1-Zero was trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT), allowing it to explore chain-of-thought (CoT) reasoning for complex problem-solving. &lt;/p&gt;

&lt;p&gt;Although this approach led to impressive advancements, DeepSeek-R1-Zero faced challenges such as repetition, poor readability, and language mixing. To improve performance, DeepSeek developed DeepSeek-R1, which incorporates cold-start data before RL. &lt;/p&gt;

&lt;p&gt;In addition to these two models, DeepSeek released six models of varying sizes based on Llama and Qwen, including &lt;a href="https://www.koyeb.com/deploy/deepseek-r1-qwen-32b" rel="noopener noreferrer"&gt;DeepSeek-R1-Distill-Qwen-32B&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Distilled models are smaller models that have been trained with the reasoning patterns of larger, more complex models.&lt;/p&gt;
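
&lt;p&gt;The idea behind distillation can be sketched with a toy example (illustrative only, not DeepSeek's actual training recipe): the student model is trained to match the teacher's full, temperature-softened output distribution rather than a single hard label.&lt;/p&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits to a probability distribution at a given temperature
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions: the
    # student learns the teacher's full output distribution, including the
    # relative likelihoods of "wrong" answers, not just its top prediction.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The loss shrinks as the student's logits approach the teacher's
teacher = [2.0, 1.0, 0.1]
far_student = [0.0, 0.0, 2.0]
near_student = [1.9, 1.1, 0.2]
assert distillation_loss(teacher, far_student) > distillation_loss(teacher, near_student)
```

&lt;p&gt;In real distillation these logits come from large neural networks, and the loss is minimized with gradient descent over large corpora.&lt;/p&gt;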

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Provider&lt;/strong&gt;: DeepSeek&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Size&lt;/strong&gt;: 32B&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Length&lt;/strong&gt;: 131K tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comparison to Proprietary Models&lt;/strong&gt;: DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks. &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B#4-evaluation-results" rel="noopener noreferrer"&gt;Explore available benchmarks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt;: Strong in general and mathematical reasoning, as well as natural language tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages Supported&lt;/strong&gt;: Primarily trained in English and Chinese&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.koyeb.com/deploy/deepseek-r1-qwen-32b" rel="noopener noreferrer"&gt;&lt;strong&gt;Deploy DeepSeek R1 on Koyeb →&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistral Small 3
&lt;/h2&gt;

&lt;p&gt;Mistral AI is a leading provider of AI models, including multimodal models like &lt;a href="https://www.koyeb.com/deploy/pixtral-12b" rel="noopener noreferrer"&gt;Pixtral 12B&lt;/a&gt; and Pixtral Large, edge models such as Ministral 3B and 8B, LLMs like &lt;a href="https://www.koyeb.com/deploy/mistral-nemo-instruct" rel="noopener noreferrer"&gt;Nemo Instruct&lt;/a&gt;, Codestral for code generation, Mathstral for mathematics, and more.&lt;/p&gt;

&lt;p&gt;Released in January 2025, &lt;a href="https://www.koyeb.com/deploy/mistral-small-3-instruct" rel="noopener noreferrer"&gt;Mistral Small 3 Instruct&lt;/a&gt; is a 24-billion-parameter model that achieves state-of-the-art capabilities comparable to larger models. It is ideal for various text generation tasks, including fast-response conversational agents, low-latency function calling, and any other applications requiring robust language understanding and instruction-following performance.&lt;/p&gt;

&lt;p&gt;This model is an instruction-fine-tuned version of the base model: &lt;a href="https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501" rel="noopener noreferrer"&gt;&lt;strong&gt;Mistral-Small-24B-Base-2501&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Provider&lt;/strong&gt;: Mistral AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Size&lt;/strong&gt;: 24B parameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Window&lt;/strong&gt;: 32K tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comparison to Proprietary Models&lt;/strong&gt;: Competitive with larger models like Llama 3.3 70B and Qwen 32B. &lt;a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501#benchmark-results" rel="noopener noreferrer"&gt;Explore available benchmarks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt;: Strong at summarization, conversational AI, multilingual tasks, and creating highly accurate subject matter experts for specific domains&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages Supported&lt;/strong&gt;: English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.koyeb.com/deploy/mistral-small-3-instruct" rel="noopener noreferrer"&gt;&lt;strong&gt;Deploy Mistral Small 3 on Koyeb →&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Qwen 2.5 Coder 7B Instruct
&lt;/h2&gt;

&lt;p&gt;Qwen2.5 is a new family of models from Qwen that includes general-purpose Qwen2.5 LLMs as well as the specialized models Qwen2.5-Math for mathematics and Qwen2.5-Coder for coding.&lt;br&gt;
The open-source Qwen2.5 models available with an Apache 2.0 license include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Qwen2.5: 0.5B, 1.5B, 7B, 14B, and 32B&lt;/li&gt;
&lt;li&gt;Qwen2.5-Coder: 1.5B, 7B, and 32B&lt;/li&gt;
&lt;li&gt;Qwen2.5-Math: 1.5B and 7B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are also 3B and 72B variants, which are not released under an open-source license.&lt;/p&gt;

&lt;p&gt;Code generation has been one of the most significant advancements in AI. Qwen 2.5 Coder 7B Instruct stands out for its high performance on code tasks, including generation, reasoning, and code fixing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Provider&lt;/strong&gt;: Alibaba Cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Size&lt;/strong&gt;: 7.61B&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Length&lt;/strong&gt;: 131,072 tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comparison to Proprietary Models&lt;/strong&gt;: Outperforms other open-source code generation models and is competitive with GPT-4o. &lt;a href="https://qwenlm.github.io/blog/qwen2.5-coder-family/" rel="noopener noreferrer"&gt;Explore available benchmarks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt;: Code generation, code reasoning and code fixing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages Supported&lt;/strong&gt;: Over 10, including Chinese, English, and Spanish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: Apache 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.koyeb.com/deploy/qwen-2-5-coder-7b-instruct" rel="noopener noreferrer"&gt;Deploy Qwen 2.5 Coder 7B Instruct on Koyeb →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Open Source Models for Reasoning, Code Generation, and More
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Best for reasoning → DeepSeek-R1-Distill-Qwen-32B&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Best for conversational AI &amp;amp; summarization → Mistral Small 3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Best for coding → Qwen 2.5 Coder 7B Instruct&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Fine-Tuning and Deploying Open LLMs with Serverless GPUs
&lt;/h2&gt;

&lt;p&gt;Open-source AI models like &lt;a href="https://www.koyeb.com/deploy/deepseek-r1-qwen-32b" rel="noopener noreferrer"&gt;DeepSeek-R1&lt;/a&gt;, &lt;a href="https://www.koyeb.com/deploy/mistral-small-3-instruct" rel="noopener noreferrer"&gt;Mistral Small 3&lt;/a&gt;, and &lt;a href="https://www.koyeb.com/deploy/qwen-2-5-coder-7b-instruct" rel="noopener noreferrer"&gt;Qwen 2.5 Coder&lt;/a&gt; provide powerful alternatives to proprietary options, offering flexibility and cost-effectiveness. &lt;/p&gt;

&lt;p&gt;With Koyeb’s serverless GPUs, you can fine-tune and deploy these models with a single click. Get a dedicated inference endpoint running on high-performance GPUs without managing any infrastructure.&lt;/p&gt;
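
&lt;p&gt;Once a model is deployed behind an OpenAI-compatible server such as vLLM, querying it is a standard HTTP request. The sketch below builds such a request with the Python standard library; the endpoint URL and model name are placeholders for your own deployment.&lt;/p&gt;

```python
import json

# Placeholder: replace with your own deployment's public URL
ENDPOINT = "https://your-app.koyeb.app/v1/chat/completions"

def build_chat_request(model, prompt, max_tokens=256):
    # Request body shape used by OpenAI-compatible servers such as vLLM
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "deepseek-r1-distill-qwen-32b",
    "Explain chain-of-thought reasoning in one sentence.",
)
body = json.dumps(payload).encode("utf-8")

# To send it (requires a running endpoint):
# import urllib.request
# req = urllib.request.Request(
#     ENDPOINT, data=body, headers={"Content-Type": "application/json"}
# )
# print(urllib.request.urlopen(req).read())
```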

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.koyeb.com/deploy" rel="noopener noreferrer"&gt;Explore the one-click deploy catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Deploy &lt;a href="https://www.koyeb.com/deploy/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;, &lt;a href="https://www.koyeb.com/deploy/ollama" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, and other &lt;a href="https://www.koyeb.com/deploy/category/model" rel="noopener noreferrer"&gt;open-source models&lt;/a&gt; like &lt;a href="https://www.koyeb.com/deploy/flux-dev" rel="noopener noreferrer"&gt;Flux.1 [dev]&lt;/a&gt; and &lt;a href="https://www.koyeb.com/deploy/microsoft-phi-4-14b" rel="noopener noreferrer"&gt;Phi-4&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Read our &lt;a href="https://www.koyeb.com/docs" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://app.koyeb.com/auth/signup" rel="noopener noreferrer"&gt;Sign up for Koyeb&lt;/a&gt; to get started deploying serverless inference endpoints today&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>webdev</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Use FLUX, PyTorch, and Streamlit to Build an AI Image Generation App</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Tue, 03 Dec 2024 10:56:18 +0000</pubDate>
      <link>https://dev.to/koyeb/use-flux-pytorch-and-streamlit-to-build-an-ai-image-generation-app-4911</link>
      <guid>https://dev.to/koyeb/use-flux-pytorch-and-streamlit-to-build-an-ai-image-generation-app-4911</guid>
      <description>&lt;p&gt;The need for AI-generated images has been growing rapidly in recent years. These images are not only used for artistic purposes, but also for practical applications in various industries. For example, in the fashion industry, AI-generated images can be used to create virtual models for showcasing clothing. In the automotive industry, AI-generated images can be used for designing and testing new car models. And the best part? You can now run your own AI image generation machine on Koyeb GPUs.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" rel="noopener noreferrer"&gt;FLUX.1 [dev] model&lt;/a&gt; (by BlackForestLabs) is an advanced AI image generation model that produces outstanding output quality. &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev" rel="noopener noreferrer"&gt;It is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions&lt;/a&gt;. It features competitive prompt following, and is trained using guidance distillation. Additionally, the generated outputs can be utilized for personal, scientific, and non-commercial purposes as outlined in the &lt;a href="https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md" rel="noopener noreferrer"&gt;FLUX.1 [dev] Non-Commercial License&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will learn how to set up a Streamlit application, integrate the FLUX model for real-time image generation, and deploy the application using Docker and Koyeb, ensuring a scalable image generation service.&lt;/p&gt;

&lt;p&gt;You can deploy the FLUX application as built in this tutorial using the &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-to-koyeb-button" rel="noopener noreferrer"&gt;Deploy to Koyeb&lt;/a&gt; button below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.koyeb.com/deploy?name=flux-pytorch&amp;amp;type=git&amp;amp;repository=github.com/koyeb/example-flux-1-pytorch&amp;amp;branch=main&amp;amp;builder=dockerfile&amp;amp;instance_type=gpu-nvidia-rtx-4000-sff-ada" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.koyeb.com%2Fstatic%2Fimages%2Fdeploy%2Fbutton.svg" alt="Deploy to Koyeb" width="178" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;To successfully follow this tutorial, you will need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://git-scm.com/" rel="noopener noreferrer"&gt;Git&lt;/a&gt; installed&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.python.org/downloads/release/python-360/" rel="noopener noreferrer"&gt;Python 3.6+&lt;/a&gt; or later&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; account&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb&lt;/a&gt; account&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Understanding the components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Text-To-Image Generation and FLUX Model
&lt;/h3&gt;

&lt;p&gt;The process of text-to-image generation involves the model interpreting input text and translating it into visual representations. This process consists of several steps: first, the model encodes the textual input into a latent space, capturing the semantic meaning of the words. Next, it employs a generative process to sample from this latent space, producing images that align with the described concepts. The FLUX model, developed by BlackForestLabs, is a state-of-the-art 12 billion parameter rectified flow transformer trained on extensive datasets containing pairs of images and their corresponding textual descriptions. This training enables the model to learn the intricate relationships between language and visual content. Consequently, users can input detailed prompts, and the FLUX model generates images that reflect those prompts with accuracy (and creativity).&lt;/p&gt;
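
&lt;p&gt;The encode-sample-decode flow described above can be sketched as a toy pipeline. Every stage here is a deliberately simplistic stand-in for a learned neural component; the point is only to make the data flow visible end to end.&lt;/p&gt;

```python
import hashlib
import random

def encode_text(prompt, dim=8):
    # "Encode" the prompt into a latent vector via a hash
    # (a stand-in for a learned text encoder)
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def sample_latent(conditioning, steps=4, seed=0):
    # "Sample" a latent by iteratively nudging random noise toward the
    # conditioning vector (a stand-in for the generative sampling process)
    rng = random.Random(seed)
    latent = [rng.random() for _ in conditioning]
    for _ in range(steps):
        latent = [0.5 * v + 0.5 * c for v, c in zip(latent, conditioning)]
    return latent

def decode_to_image(latent, width=4, height=2):
    # "Decode" the latent into a width x height grid of pixel intensities
    pixels = [latent[(y * width + x) % len(latent)]
              for y in range(height) for x in range(width)]
    return [pixels[y * width:(y + 1) * width] for y in range(height)]

image = decode_to_image(sample_latent(encode_text("astronaut in a jungle")))
```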

&lt;h3&gt;
  
  
  Streamlit
&lt;/h3&gt;

&lt;p&gt;Streamlit is an open-source Python library designed to create interactive data applications, often referred to as dashboards. It empowers developers to build and share data apps simply and intuitively, eliminating the need for extensive web development expertise.&lt;/p&gt;

&lt;p&gt;Streamlit apps are created as Python scripts, which are then executed within the Streamlit environment. The library offers a set of functions for adding interactive elements to the app, such as a file upload button.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
Set up the environment: Start by setting up your project directory, installing necessary dependencies, and configuring environment variables.&lt;/li&gt;
&lt;li&gt;
Set up Streamlit: Next, install Streamlit and create the initial user interface for your application.&lt;/li&gt;
&lt;li&gt;
Generate AI Images with FLUX Model: Use the FLUX model to generate AI images based on the user prompt and configuration.&lt;/li&gt;
&lt;li&gt;
Dockerize the Streamlit application: Create a Dockerfile to containerize your application for consistent deployment.&lt;/li&gt;
&lt;li&gt;
Deploy to Koyeb: Finally, deploy your application on the Koyeb platform.&lt;/li&gt;
&lt;/ol&gt;
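
&lt;p&gt;As a preview of step 4, a Dockerfile for a Streamlit app generally follows this shape (a minimal sketch; the base image and port are assumptions, and the tutorial's actual Dockerfile may differ):&lt;/p&gt;

```dockerfile
# Minimal sketch of containerizing the Streamlit app
FROM python:3.11-slim

WORKDIR /app

# Install Python dependencies first to take advantage of layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit listens on this port; pass HF_TOKEN as an env var at deploy time
EXPOSE 8000
CMD ["streamlit", "run", "app.py", "--server.port", "8000", "--server.address", "0.0.0.0"]
```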

&lt;h2&gt;
  
  
  Set up the environment
&lt;/h2&gt;

&lt;p&gt;Let's start by creating a new Streamlit project. To keep your Python dependencies organized, you should create a virtual environment.&lt;/p&gt;

&lt;p&gt;First, create and navigate into a local directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create and move to the new directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;example-koyeb-flux-images
&lt;span class="nb"&gt;cd &lt;/span&gt;example-koyeb-flux-images
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards, create and activate a new virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a virtual environment&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv

&lt;span class="c"&gt;# Active the virtual environment (Windows)&lt;/span&gt;
.&lt;span class="se"&gt;\v&lt;/span&gt;&lt;span class="nb"&gt;env&lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate.bat

&lt;span class="c"&gt;# Active the virtual environment (Linux)&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; ./venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, create a &lt;code&gt;requirements.txt&lt;/code&gt; file with the following dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit
watchdog
diffusers
torch
torchvision
einops
huggingface_hub[hf_transfer]
safetensors
sentencepiece
transformers
tokenizers
protobuf
requests
invisible-watermark
accelerate
peft
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In terms of dependencies, we have included Streamlit for building a web app in Python, the Hugging Face libraries for downloading and running the FLUX model locally, and watchdog to monitor file system events. The &lt;code&gt;accelerate&lt;/code&gt; package streamlines the training and inference of machine learning models, making it easier to manage distributed training and optimize performance. The &lt;code&gt;invisible-watermark&lt;/code&gt; package embeds invisible watermarks in generated content, making it possible to identify outputs produced by the model.&lt;/p&gt;

&lt;p&gt;Now, you can install the dependencies with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's move on to creating a new Streamlit project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up Streamlit
&lt;/h2&gt;

&lt;p&gt;In this step, you will set up the Streamlit UI that will define the visual layout of the page, and the ability for users to load the FLUX model to start generating images with AI. All the logic for the project will reside in this file, so you can start by creating an &lt;code&gt;app.py&lt;/code&gt; file with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# File: app.py
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;huggingface_hub&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;login&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snapshot_download&lt;/span&gt;

&lt;span class="c1"&gt;# Log in to Hugging Face using the provided token from environment variables
&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Set the title of the Streamlit application
&lt;/span&gt;&lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Image Generation with FLUX.1-dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create a text input field for the user to enter a prompt for image generation
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter your prompt:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Astronaut in a jungle, cold color palette, muted colors, detailed, 8k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create four columns for user inputs
&lt;/span&gt;&lt;span class="n"&gt;col1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Column 1: Input for image width
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;col1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;num_width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Width:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Column 2: Input for image height
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;col2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;num_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Height:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Column 3: Input for the number of images to generate
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;col3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;num_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Images:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Column 4: Input for the number of inference steps
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;col4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;num_inference_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Steps:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Button to load the model from Hugging Face
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Load Model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;snapshot_download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./FLUX_1_dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code above does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Imports the required modules: &lt;code&gt;os&lt;/code&gt;, &lt;code&gt;torch&lt;/code&gt;, Streamlit, the &lt;code&gt;FluxPipeline&lt;/code&gt; from Diffusers, and the Hugging Face Hub helpers.&lt;/li&gt;
&lt;li&gt;Logs into Hugging Face using a token stored in environment variables, allowing access to the model repository.&lt;/li&gt;
&lt;li&gt;Sets the page title and creates a text input field pre-filled with a default prompt.&lt;/li&gt;
&lt;li&gt;Creates four columns to organize user inputs for image dimensions, the number of images, and inference steps. Each column contains a number input field with specified minimum values and default values.&lt;/li&gt;
&lt;li&gt;Creates a &lt;code&gt;Load Model&lt;/code&gt; button to load the FLUX model from Hugging Face. When clicked, it downloads the model snapshot to a specified cache directory using the &lt;code&gt;snapshot_download&lt;/code&gt; function.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this, you have set up a UI that can load the FLUX model on request. Now, let's move on to generating AI images based on the user prompt and settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generate AI Images with FLUX Model
&lt;/h2&gt;

&lt;p&gt;In this step, you will invoke the FLUX model to generate AI images based on the user prompt. With the default input values, the model will generate three images per prompt. Make the following additions in the &lt;code&gt;app.py&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# File: app.py
&lt;/span&gt;
&lt;span class="c1"&gt;# Existing code
&lt;/span&gt;
&lt;span class="c1"&gt;# Button to generate images using the FLUX model
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate Image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Load the FLUX model with specified parameters
&lt;/span&gt;    &lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./FLUX_1_dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate images based on the user input parameters
&lt;/span&gt;    &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_inference_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_inference_steps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_images_per_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_images&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;

    &lt;span class="c1"&gt;# Create three columns to display the generated images
&lt;/span&gt;    &lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Loop through the generated images and display them in the columns
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code addition above does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a button titled &lt;code&gt;Generate Image&lt;/code&gt;. When clicked, it will instantiate a Flux pipeline for text-to-image generation.&lt;/li&gt;
&lt;li&gt;Invokes the pipeline with the user-configured parameters: the desired height and width, the number of inference steps to take, and the number of images to generate.&lt;/li&gt;
&lt;li&gt;Creates three columns to organize the generated images.&lt;/li&gt;
&lt;/ul&gt;
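&lt;p&gt;The round-robin placement in the loop above can be sketched in plain Python; &lt;code&gt;distribute&lt;/code&gt; is a hypothetical helper that mirrors how &lt;code&gt;i % 3&lt;/code&gt; spreads the generated images across the three Streamlit columns:&lt;/p&gt;

```python
def distribute(items, n_cols=3):
    # Mirror of the Streamlit loop: item i goes to column i % n_cols
    cols = [[] for _ in range(n_cols)]
    for i, item in enumerate(items):
        cols[i % n_cols].append(item)
    return cols

# Five generated images spread across three columns
print(distribute(["img0", "img1", "img2", "img3", "img4"]))
# → [['img0', 'img3'], ['img1', 'img4'], ['img2']]
```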

&lt;p&gt;Now, you can run the Streamlit application with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run ./app.py &lt;span class="nt"&gt;--server&lt;/span&gt;.port 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application will now be ready at &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;. Test it by generating AI images with your own custom prompt and by altering the default input values.&lt;/p&gt;

&lt;p&gt;Now, let's dockerize the application to ensure consistency between multiple deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dockerize the Streamlit application
&lt;/h2&gt;

&lt;p&gt;Dockerizing deployments helps by creating a consistent and reproducible environment, ensuring that the application runs the same way on any system. It simplifies dependency management and enhances scalability, making deployments more efficient and reliable. To dockerize, create a &lt;code&gt;Dockerfile&lt;/code&gt; at the root of your project with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;python:3.12&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runner&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app.py requirements.txt .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="nt"&gt;--root-user-action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ignore
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; torch &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/nightly/cu121 &lt;span class="nt"&gt;--root-user-action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ignore &lt;span class="c"&gt;# [!code ++]&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update 
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; libsm6 libxext6 git git-lfs &lt;span class="c"&gt;# [!code ++]&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8000 # [!code ++]&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; HF_HUB_ENABLE_HF_TRANSFER=1&lt;/span&gt;

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; HF_TOKEN # [!code ++]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["streamlit", "run", "./app.py", "--server.port", "8000"] # [!code ++]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apart from the usual Dockerfile used to deploy Python applications, the following tweaks and additions have been made in this code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121 --root-user-action=ignore&lt;/code&gt; to install PyTorch with CUDA support for GPU acceleration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUN apt-get update &amp;amp;&amp;amp; apt-get install -y libsm6 libxext6 git git-lfs &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*&lt;/code&gt; is used to install &lt;code&gt;git-lfs&lt;/code&gt; and &lt;code&gt;git&lt;/code&gt;, and then clean up package lists to reduce image size.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EXPOSE 8000&lt;/code&gt; is used to specify the port on which the Streamlit application will run.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CMD ["streamlit", "run", "./app.py", "--server.port", "8000"]&lt;/code&gt; is used to define the command to start the Streamlit app on port 8000.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With all configured, let's move on to deploy the application to Koyeb.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy to Koyeb
&lt;/h2&gt;

&lt;p&gt;Now that you have the application running locally, you can also deploy it on Koyeb and make it available on the internet.&lt;/p&gt;

&lt;p&gt;Create a &lt;a href="https://github.com/new" rel="noopener noreferrer"&gt;new repository on your GitHub account&lt;/a&gt; so that you can push your code.&lt;/p&gt;

&lt;p&gt;You can download a &lt;a href="https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore" rel="noopener noreferrer"&gt;standard &lt;code&gt;.gitignore&lt;/code&gt; file&lt;/a&gt; for Python from GitHub to exclude certain directories and files from being pushed to the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-L&lt;/span&gt; https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore &lt;span class="nt"&gt;-o&lt;/span&gt; .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the following commands in your terminal to commit and push your code to the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git init
git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"first commit"&lt;/span&gt;
git branch &lt;span class="nt"&gt;-M&lt;/span&gt; main
git remote add origin &lt;span class="o"&gt;[&lt;/span&gt;Your GitHub repository URL]
git push &lt;span class="nt"&gt;-u&lt;/span&gt; origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should now have all your local code in your remote repository. Now it is time to deploy the application.&lt;/p&gt;

&lt;p&gt;Within the &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb control panel&lt;/a&gt;, while on the Overview tab, initiate the app creation and deployment process by clicking Create Web Service.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select &lt;strong&gt;GitHub&lt;/strong&gt; as the deployment method.&lt;/li&gt;
&lt;li&gt;Select your repository from the menu. Alternatively, deploy from the &lt;a href="https://github.com/koyeb/example-flux-1-pytorch" rel="noopener noreferrer"&gt;example repository associated with this tutorial&lt;/a&gt; by entering &lt;code&gt;https://github.com/koyeb/example-flux-1-pytorch&lt;/code&gt; in the public repository field.&lt;/li&gt;
&lt;li&gt;In the Instance selection, select a GPU Instance.&lt;/li&gt;
&lt;li&gt;Set your HuggingFace access token in the &lt;code&gt;HF_TOKEN&lt;/code&gt; environment variable.&lt;/li&gt;
&lt;li&gt;Finally, click the &lt;strong&gt;Deploy&lt;/strong&gt; button.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the application is deployed, you can visit the Koyeb service URL (ending in &lt;code&gt;.koyeb.app&lt;/code&gt;) to access the Streamlit application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you built an AI image generation application using the FLUX dev model and the Streamlit framework. Along the way, you learned how to invoke the Flux pipeline in Python to generate AI images on the fly, customized with user-provided prompts and settings, and how to use Streamlit to quickly prototype the user interface.&lt;/p&gt;

&lt;p&gt;Given that the application was deployed using the Git deployment option, any subsequent code push to the deployed branch will automatically trigger a new build for your application. Changes to your application go live once the deployment succeeds. In the event of a failed deployment, Koyeb retains the last operational production deployment, ensuring the uninterrupted operation of your application.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Use Stable Diffusion and PyTorch to Build an Image Inpainting Service</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Sun, 13 Oct 2024 22:00:00 +0000</pubDate>
      <link>https://dev.to/koyeb/use-stable-diffusion-and-pytorch-to-build-an-image-inpainting-service-2ahm</link>
      <guid>https://dev.to/koyeb/use-stable-diffusion-and-pytorch-to-build-an-image-inpainting-service-2ahm</guid>
      <description>&lt;p&gt;One standout feature of Firefly models from &lt;a href="https://www.adobe.com/products/firefly.html" rel="noopener noreferrer"&gt;Adobe&lt;/a&gt; is &lt;strong&gt;generative fill: simply outline the area you want to modify, provide a prompt, and the model generates the content for you&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Generative fill is possible due to &lt;strong&gt;image inpainting&lt;/strong&gt;, which restores missing or damaged parts of an image by using information from surrounding pixels.&lt;br&gt;
Common use cases range from personal photo editing to professional image restoration in various industries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stable Diffusion is an open source generative AI model&lt;/strong&gt; that creates unique photorealistic images from text and image prompts.&lt;br&gt;
In this tutorial, we will create an image inpaint service using Stable Diffusion. We will deploy the endpoint to Koyeb as a web service, enabling users to upload images and receive modified versions based on their prompts.&lt;/p&gt;

&lt;p&gt;You can consult the &lt;a href="https://github.com/koyeb/example-stable-diffusion-inpainting" rel="noopener noreferrer"&gt;repository&lt;/a&gt; for this guide to follow along on your own. You can deploy Stable Diffusion by clicking the &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-to-koyeb-button" rel="noopener noreferrer"&gt;Deploy to Koyeb&lt;/a&gt; button below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.koyeb.com/deploy?name=stable-diffusion&amp;amp;type=git&amp;amp;repository=koyeb%2Fexample-stable-diffusion-inpainting&amp;amp;branch=main&amp;amp;builder=dockerfile&amp;amp;instance_type=gpu-nvidia-rtx-4000-sff-ada&amp;amp;ports=8000%3Bhttp%3B%2F" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.koyeb.com%2Fstatic%2Fimages%2Fdeploy%2Fbutton.svg" alt="Deploy to Koyeb" width="178" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Make sure to set the grace period to 300 seconds.&lt;/p&gt;

&lt;p&gt;By the end, you will have built and deployed your own image inpainting service that will look like this:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ACpNCwpYMp4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;Here are the requirements for the project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10 or later&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pytorch.org/get-started/locally/" rel="noopener noreferrer"&gt;Pytorch 2.4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb account&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;p&gt;Here are the steps we will follow to build the image inpainting service:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Set up the environment&lt;/strong&gt;: Begin by setting up your project directory and installing the necessary dependencies, including Python and Hugging Face's Diffusers library. Configure the environment variables required for the application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Inpaint&lt;/strong&gt;: Use a pre-trained model, like Stable Diffusion, to handle the inpainting process. This function will take the user’s image and mask to generate the output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up the Web Service:&lt;/strong&gt; Use Gradio to create the user interface for the inpainting service. The web interface will allow users to upload images and masks for the inpainting process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dockerize the Application:&lt;/strong&gt; Create a Dockerfile to containerize the application. This step ensures the service is portable and can be deployed consistently across environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy to Koyeb GPU:&lt;/strong&gt; Deploy your Dockerized inpainting service on Koyeb. This final step brings your service online and makes it accessible to users.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Overview of implementation of image inpaint service
&lt;/h2&gt;

&lt;p&gt;In this section, we’ll guide you through implementing the image inpainting service and deploying it to Koyeb. Here is an overview of the approach to this project and a diagram of the application’s architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ynsztog1s2s9ln068au.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ynsztog1s2s9ln068au.png" alt="Image Inpainting Service Architecture" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up the project
&lt;/h2&gt;

&lt;p&gt;First, let's create a project directory named &lt;code&gt;inpaint-koyeb&lt;/code&gt;. This will be the base folder where all the code, models, and configurations for the image inpainting service will reside.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir inpaint-koyeb
cd inpaint-koyeb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create and activate a virtual environment
&lt;/h3&gt;

&lt;p&gt;It's best to use a virtual environment to isolate the project dependencies and avoid conflicts with other projects. In this case, we’ll use Python 3.10.&lt;/p&gt;

&lt;p&gt;Ensure Python &amp;gt;=3.10 is installed. You can check your Python version by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python --version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, create and activate the virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m venv env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, activate the virtual environment. On &lt;strong&gt;macOS/Linux&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On &lt;strong&gt;Windows&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.\venv\Scripts\activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once activated, your terminal will be prefixed with &lt;code&gt;(venv)&lt;/code&gt;, indicating the active environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initialize Git repository
&lt;/h3&gt;

&lt;p&gt;To track changes and manage your code efficiently, it’s a good idea to initialize your project directory as a Git repository. Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a hidden &lt;code&gt;.git&lt;/code&gt; folder in your project directory, allowing you to commit changes and work with version control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install requirements
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;requirements.txt&lt;/code&gt;. The &lt;code&gt;requirements.txt&lt;/code&gt; file lists the dependencies your project will need. Create the file and add the following dependencies to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;torch==2.4
diffusers==0.30.2
gradio==4.43.0
transformers==4.44.2
gradio_client==1.3.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we will build a Docker container later, we pin the versions to ensure consistency across environments and prevent breaking changes.&lt;/p&gt;

&lt;p&gt;Finally, install the dependencies listed in &lt;code&gt;requirements.txt&lt;/code&gt; using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will ensure all necessary libraries for the inpainting service are installed within your virtual environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a &lt;code&gt;.gitignore&lt;/code&gt; file
&lt;/h3&gt;

&lt;p&gt;You should exclude specific files and directories, such as your virtual environment, from version control. Create a &lt;code&gt;.gitignore&lt;/code&gt; file and add the venv folder to it to ensure Git does not track it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .gitignore
venv/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will prevent your virtual environment from being uploaded to the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implement inpaint service
&lt;/h2&gt;

&lt;p&gt;Hugging Face has made it much easier for developers to work with the Stable Diffusion model by creating the &lt;code&gt;diffusers&lt;/code&gt; library. It allows you to quickly implement models for tasks like image generation, inpainting, and more, leveraging state-of-the-art architectures like Stable Diffusion.&lt;/p&gt;

&lt;p&gt;In this example, we use the &lt;code&gt;AutoPipelineForInpainting&lt;/code&gt; class from the &lt;code&gt;diffusers&lt;/code&gt; library to create an inpainting model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Import the necessary libraries
&lt;/h3&gt;

&lt;p&gt;In an &lt;code&gt;app.py&lt;/code&gt; file, import the necessary libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch
from diffusers import AutoPipelineForInpainting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;torch&lt;/code&gt;: This is the PyTorch library for tensors and GPU computations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AutoPipelineForInpainting&lt;/code&gt;: A pipeline specifically designed for the task of image inpainting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Load the pre-trained inpainting model
&lt;/h3&gt;

&lt;p&gt;Next, load the pre-trained inpainting model in the &lt;code&gt;app.py&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipeline.enable_model_cpu_offload()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; The model being used is &lt;code&gt;stable-diffusion-xl-base-1.0&lt;/code&gt;, which supports inpainting. Because it is pre-trained, you don’t need to train it from scratch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference Speed:&lt;/strong&gt; To optimize performance, we implemented the following:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;torch_dtype=torch.float16&lt;/code&gt; ensures the model uses half-precision (FP16) for computations, which speeds up inference on modern GPUs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;to("cuda")&lt;/code&gt; moves the model to the GPU for faster processing. This is important for large models like &lt;code&gt;stable-diffusion-xl&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: To save memory, &lt;code&gt;enable_model_cpu_offload()&lt;/code&gt; offloads parts of the model’s computation to the CPU when GPU memory runs low, allowing you to use larger models even on limited GPU resources.&lt;/li&gt;
&lt;/ul&gt;
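&lt;p&gt;To see why half precision matters, compare the bytes per parameter; the parameter count below is only an illustrative round number for an SDXL-scale model, not an exact figure:&lt;/p&gt;

```python
import struct

# Bytes per value: "f" is float32, "e" is float16 (half precision)
fp32_bytes = struct.calcsize("f")
fp16_bytes = struct.calcsize("e")

params = 3_500_000_000  # illustrative parameter count, not an exact SDXL figure
print(f"FP32: {params * fp32_bytes / 1e9:.1f} GB, FP16: {params * fp16_bytes / 1e9:.1f} GB")
# → FP32: 14.0 GB, FP16: 7.0 GB
```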

&lt;h3&gt;
  
  
  Defining the inpainting function
&lt;/h3&gt;

&lt;p&gt;Now that the model is ready, we define the function in &lt;code&gt;app.py&lt;/code&gt; that will handle the inpainting process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def image_inpaint(image_input, mask_input, prompt_input):
    image = pipeline(prompt=prompt_input, image=image_input, mask_image=mask_input).images[0]
    return image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function &lt;code&gt;image_inpaint&lt;/code&gt; is designed to take three inputs: the original image, a mask indicating the area to be filled in, and a text prompt that guides the model on how to fill in that masked region.&lt;/p&gt;

&lt;p&gt;Inside the function, the &lt;code&gt;pipeline&lt;/code&gt; object is used to perform the inpainting. It processes the image and mask according to the provided prompt, and then generates an output image where the masked part is filled in based on the prompt's instructions. Once the new image is created, the function returns it.&lt;/p&gt;

&lt;p&gt;With this setup, we now have the core logic for modifying images by filling in missing parts. The next steps will cover how to create a user interface to make this functionality accessible to end users.&lt;/p&gt;
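&lt;p&gt;A note on the mask convention: white pixels mark the region the pipeline should regenerate, while black pixels are kept from the original image. In practice you would draw the mask in an image editor or with Pillow; the pure-Python sketch below only illustrates the convention on a hypothetical 8×8 mask:&lt;/p&gt;

```python
# 8x8 grayscale mask: 255 (white) = inpaint this pixel, 0 (black) = keep it
W = H = 8
mask = [[0] * W for _ in range(H)]

# Hypothetical square region to regenerate
for y in range(2, 6):
    for x in range(2, 6):
        mask[y][x] = 255

n_inpaint = sum(v == 255 for row in mask for v in row)
print(f"{n_inpaint} of {W * H} pixels will be regenerated")
# → 16 of 64 pixels will be regenerated
```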

&lt;h3&gt;
  
  
  Set up the web service
&lt;/h3&gt;

&lt;p&gt;To set up the web service, we will use Gradio. Gradio is an easy-to-use library that allows developers to create web interfaces for machine learning models and other Python functions. We will create a function that includes two image input sections, one text input section, and an output image section.&lt;/p&gt;

&lt;p&gt;In your &lt;code&gt;app.py&lt;/code&gt; file, add the following code to set up the Gradio interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import gradio as gr

def image_inpaint(image_input, mask_input, prompt_input):
    image = pipeline(prompt=prompt_input, image=image_input, mask_image=mask_input).images[0]
    return image

def gradio_interface():
    with gr.Blocks() as demo:
        gr.Markdown("## Image Inpainting Service")

        with gr.Row():
            with gr.Column():
                image_input = gr.Image(type="pil", label="Upload Image")
                mask_input = gr.Image(type="pil", label="Upload Mask")
                prompt_input = gr.Textbox(label="Enter Prompt", placeholder="Describe what you want to inpaint")
                submit_btn = gr.Button("Inpaint")

            with gr.Column():
                result_output = gr.Image(type="pil", label="Inpainted Image")

        # Connect the button click event with the image inpainting
        submit_btn.click(image_inpaint, inputs=[image_input, mask_input, prompt_input], outputs=result_output)

    demo.launch()

# Run the Gradio interface
if __name__ == "__main__":
    gradio_interface()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here’s an explanation of the code above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import Gradio&lt;/strong&gt;: We begin by importing the &lt;code&gt;gradio&lt;/code&gt; library to set up the interface for the image inpainting service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creating the Interface Function&lt;/strong&gt;: The function &lt;code&gt;gradio_interface()&lt;/code&gt; is where we define the layout and logic of the web service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gradio Blocks&lt;/strong&gt;: &lt;code&gt;gr.Blocks()&lt;/code&gt; defines the overall structure of the interface. Think of it as the container for the entire UI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Markdown Header&lt;/strong&gt;: &lt;code&gt;gr.Markdown()&lt;/code&gt; adds a header describing the service: "Image Inpainting Service".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Creating Input Fields&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image Input&lt;/strong&gt;: &lt;code&gt;gr.Image(type="pil", label="Upload Image")&lt;/code&gt; allows users to upload a base image for inpainting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mask Input&lt;/strong&gt;: &lt;code&gt;gr.Image(type="pil", label="Upload Mask")&lt;/code&gt; lets users upload a mask, which defines the areas to be inpainted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Input&lt;/strong&gt;: &lt;code&gt;gr.Textbox()&lt;/code&gt; enables users to describe the changes or details to be inpainted into the image.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submit Button&lt;/strong&gt;: &lt;code&gt;gr.Button("Inpaint")&lt;/code&gt; adds a button to trigger the inpainting process when clicked.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Output Image&lt;/strong&gt;: &lt;code&gt;gr.Image(type="pil", label="Inpainted Image")&lt;/code&gt; is where the result of the inpainting operation will be displayed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Button Click Action&lt;/strong&gt;: &lt;code&gt;submit_btn.click()&lt;/code&gt; connects the button to the &lt;code&gt;image_inpaint&lt;/code&gt; function, taking the inputs (image, mask, and prompt) and displaying the resulting inpainted image in the output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Launch the Interface&lt;/strong&gt;: &lt;code&gt;demo.launch()&lt;/code&gt; starts the Gradio interface, making the web service accessible.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To run the application locally, simply execute the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Dockerize the application
&lt;/h2&gt;

&lt;p&gt;Dockerization is an essential step for packaging and deploying modern applications. Creating a Docker container ensures your application runs consistently in any environment, and it streamlines deploying the application to Koyeb. Now, let’s go through the Dockerfile, which will allow you to containerize your image inpainting service.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;Dockerfile&lt;/code&gt; at the root of the project with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM python:3.10

WORKDIR /app

# Copy the application code and requirements file
COPY requirements.txt app.py ./

# Install the dependencies
RUN pip install --upgrade pip &amp;amp;&amp;amp; pip install -r requirements.txt

ENV GRADIO_SERVER_NAME="0.0.0.0"
ENV GRADIO_SERVER_PORT=${GRADIO_SERVER_PORT:-8000}

CMD ["python", "app.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Apart from the usual Dockerfile commands, these are the notable additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ENV GRADIO_SERVER_NAME="0.0.0.0"&lt;/code&gt;: This makes Gradio listen on all network interfaces so the app is reachable from outside the container.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ENV GRADIO_SERVER_PORT=${GRADIO_SERVER_PORT:-8000}&lt;/code&gt;: This sets the port Gradio serves on, defaulting to 8000 when the variable is not provided.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CMD ["python", "app.py"]&lt;/code&gt;: This tells Docker to execute the &lt;code&gt;app.py&lt;/code&gt; file (which starts the Gradio interface) when the container is run. The &lt;code&gt;CMD&lt;/code&gt; instruction sets the container's default command.&lt;/li&gt;
&lt;/ul&gt;
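&lt;p&gt;The &lt;code&gt;${GRADIO_SERVER_PORT:-8000}&lt;/code&gt; form falls back to 8000 only when the variable is unset; in Python terms it behaves like:&lt;/p&gt;

```python
import os

# Equivalent of the Dockerfile default ${GRADIO_SERVER_PORT:-8000}
os.environ.pop("GRADIO_SERVER_PORT", None)  # simulate the variable being unset
port = os.environ.get("GRADIO_SERVER_PORT", "8000")
print(port)  # → 8000

os.environ["GRADIO_SERVER_PORT"] = "9000"   # simulate an override
port = os.environ.get("GRADIO_SERVER_PORT", "8000")
print(port)  # → 9000
```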
&lt;h2&gt;
  
  
  Deploy to Koyeb GPU
&lt;/h2&gt;

&lt;p&gt;With the application now running locally, you can quickly deploy it to Koyeb and make it accessible online.&lt;/p&gt;
&lt;h3&gt;
  
  
  Push the code to GitHub
&lt;/h3&gt;

&lt;p&gt;Since we have already set up a GitHub repository, simply commit your changes and push the code to GitHub.&lt;/p&gt;

&lt;p&gt;To do this, run the following commands in your terminal:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add app.py Dockerfile requirements.txt .gitignore
git commit -m "first commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Deploy the application on Koyeb
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb control panel&lt;/a&gt;, go to the &lt;strong&gt;Overview&lt;/strong&gt; tab. Click &lt;strong&gt;Create Service&lt;/strong&gt; and select &lt;strong&gt;Create Web Service&lt;/strong&gt; to begin the app creation and deployment&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;GitHub&lt;/strong&gt; as the deployment source&lt;/li&gt;
&lt;li&gt;From the menu, select your repository&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;Instance Selection&lt;/strong&gt;, choose a &lt;strong&gt;GPU Instance&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Builder&lt;/strong&gt; section, select &lt;strong&gt;Dockerfile&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Adjust the &lt;strong&gt;grace period&lt;/strong&gt; for TCP health checks to &lt;strong&gt;300 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Deploy&lt;/strong&gt; button to finalize the process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After deploying the application, you can track the deployment progress through the provided build and runtime logs.&lt;br&gt;
Once the deployment is complete and health checks pass, your Gradio application will be up and running.&lt;/p&gt;

&lt;p&gt;Click the provided public URL to access your live application.&lt;/p&gt;

&lt;p&gt;Here is an example of the deployed application:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/ACpNCwpYMp4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, you built an image inpainting service using Stable Diffusion and Gradio and deployed it on Koyeb. You learned how to set up a Python environment, implement the inpainting function with Stable Diffusion, and create a user-friendly web interface with Gradio. After containerizing the application with Docker, &lt;strong&gt;Koyeb&lt;/strong&gt; enabled scalable and efficient deployment.&lt;/p&gt;

&lt;p&gt;Image inpainting models are computationally expensive, which is why a platform like Koyeb is a good fit for handling deployment and scaling.&lt;/p&gt;

&lt;p&gt;With automatic builds and deployments managed through Git, Koyeb ensures your service remains operational, even in the event of a failed deployment, by reverting to the last stable version.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Fine-Tune MistralAI on Koyeb GPUs</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Fri, 11 Oct 2024 09:40:33 +0000</pubDate>
      <link>https://dev.to/koyeb/fine-tune-mistralai-on-koyeb-gpus-1dk7</link>
      <guid>https://dev.to/koyeb/fine-tune-mistralai-on-koyeb-gpus-1dk7</guid>
      <description>&lt;p&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;MistralAI&lt;/a&gt; is an advanced language model designed for tasks like text generation, sentiment analysis, translation, summarization, and more.&lt;/p&gt;

&lt;p&gt;By default, MistralAI is trained on general language data. It &lt;strong&gt;performs even better when fine-tuned to specific domains&lt;/strong&gt; like finance, law, or medicine.&lt;/p&gt;

&lt;p&gt;Fine-tuning retrains the model on domain-specific data, enabling it to understand the specific terms, patterns, and concepts used in that field. For instance, in finance, supplementary information for retraining a model includes financial reports, stock market data, or legal documents.&lt;/p&gt;

&lt;p&gt;With fine-tuning, MistralAI becomes more accurate at understanding complex financial terms, market trends, and regulatory requirements. This enhancement makes the model more adept at predicting financial outcomes, generating insightful analysis, and supporting decision-making in areas like trading or risk management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;To successfully complete this tutorial, you will need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Account&lt;/strong&gt;: Needed for managing the fine-tuning code. Sign up at &lt;a href="https://github.com/signup" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; if you don’t have one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Koyeb Account&lt;/strong&gt;: Required for accessing Koyeb’s cloud infrastructure, including GPU resources. Create an account at &lt;a href="https://app.koyeb.com/signup" rel="noopener noreferrer"&gt;Koyeb&lt;/a&gt; if you don’t have one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Koyeb GPU Access&lt;/strong&gt;: Make sure your Koyeb account has access to GPU Instances, which are required both for fine-tuning and for deploying GPU-enabled services through the Koyeb dashboard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basic Knowledge&lt;/strong&gt;: Familiarity with &lt;strong&gt;Python&lt;/strong&gt; (running scripts, setting up virtual environments). Basic understanding of &lt;strong&gt;Docker&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="http://NewsAPI.org" rel="noopener noreferrer"&gt;NewsAPI.org&lt;/a&gt; API Key&lt;/strong&gt;: Needed to retrieve content for the financial dataset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MistralAI API Key&lt;/strong&gt;: Needed to prepare the financial dataset with AI.&lt;/li&gt;
&lt;/ul&gt;
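&lt;p&gt;The two API keys can be stored in a &lt;code&gt;.env&lt;/code&gt; file at the root of your project so that the scripts later in this tutorial can read them with &lt;code&gt;python-decouple&lt;/code&gt;. A minimal sketch (the values are placeholders for your own keys):&lt;/p&gt;

```shell
# .env -- placeholder values, replace with your own keys
NEWS_API_KEY=your-newsapi-key
MISTRAL_API_KEY=your-mistral-api-key
```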

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;p&gt;This tutorial is divided into the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cloning and Exploring the GitHub Repository&lt;/li&gt;
&lt;li&gt;Understanding the Fine-Tuning Workflow&lt;/li&gt;
&lt;li&gt;Preparing the Financial Dataset&lt;/li&gt;
&lt;li&gt;Preparing Training and Evaluation Datasets&lt;/li&gt;
&lt;li&gt;Configuring the Training Script&lt;/li&gt;
&lt;li&gt;Deploying to Koyeb GPU&lt;/li&gt;
&lt;li&gt;Running the Fine-Tuning Process&lt;/li&gt;
&lt;li&gt;Evaluating the Fine-Tuned Model&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Cloning and Exploring the GitHub Repository
&lt;/h2&gt;

&lt;p&gt;To start fine-tuning MistralAI, the first step is to clone the official GitHub repository, which has all the necessary scripts and settings for training.&lt;/p&gt;

&lt;p&gt;Clone the repository with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/mistralai/mistral-finetune.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Exploring the Key Files and Folders&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After cloning the repository, take a moment to look around its structure. Understanding these files will help you customize and run the fine-tuning process effectively.&lt;/p&gt;

&lt;p&gt;The file &lt;code&gt;example/7B.yaml&lt;/code&gt; is particularly important:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This configuration file defines the model architecture and training settings such as batch size and the number of training steps.&lt;/li&gt;
&lt;li&gt;It is crucial for setting up the training environment and should be reviewed and adjusted based on your specific needs, especially if you are fine-tuning for a specialized area like finance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other important files are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;validate_data.py&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;This script is used to validate the dataset before training. It ensures that the data is complete, correctly formatted, and free of errors that could impact the training process.&lt;/li&gt;
&lt;li&gt;Running this script helps identify and resolve any issues with the dataset, ensuring smooth training.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;reformat_data.py&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;This script is used to reformat the dataset if necessary. It ensures that the data is in the correct format required by the model for training.&lt;/li&gt;
&lt;li&gt;This step is important to maintain consistency and accuracy in the dataset, which is essential for effective fine-tuning.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Understanding and properly configuring the &lt;code&gt;7B.yaml&lt;/code&gt; file is essential for effective fine-tuning of the MistralAI model. We will cover the specific settings needed to fine-tune the Mistral 7B model later in this guide.&lt;/p&gt;
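&lt;p&gt;As an illustration, the key fields in &lt;code&gt;example/7B.yaml&lt;/code&gt; look roughly like the sketch below. The field names follow the repository's example file, but the paths and values here are placeholders to adjust for your own run:&lt;/p&gt;

```yaml
# Sketch of the key settings in example/7B.yaml (values are placeholders)
data:
  instruct_data: "/path/to/train.jsonl"      # training dataset
  eval_instruct_data: "/path/to/eval.jsonl"  # evaluation dataset

model_id_or_path: "/path/to/mistral-7b"      # base model weights

seq_len: 32768        # maximum sequence length
batch_size: 1         # samples per weight update
max_steps: 300        # total number of training steps

optim:
  lr: 6.e-5           # learning rate
```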

&lt;h2&gt;
  
  
  Understanding the Fine-Tuning Workflow
&lt;/h2&gt;

&lt;p&gt;Fine-tuning a language model like MistralAI involves a systematic workflow to ensure the model is properly adapted to a specific domain or task.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prepare the Dataset
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gather the Data (Content)&lt;/strong&gt;: Collect domain-specific data relevant to the task you are fine-tuning the model for. This could include financial reports, market data, customer reviews, or any other textual data that reflects the language and concepts you want the model to learn. Ensure the data is comprehensive and diverse enough to cover different scenarios and contexts within the domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proper Formatting&lt;/strong&gt;: Format the collected data into a structure that the model can process. This typically involves organizing the text into a sequence of interactions (e.g., question-answer pairs) or continuous text segments. Ensure consistency in formatting across all data samples to avoid confusion during training. This might include tokenization, lowercasing, and handling special characters or symbols.&lt;/li&gt;
&lt;/ul&gt;
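&lt;p&gt;As a minimal sketch of the formatting step, the snippet below turns hypothetical question-answer pairs into the chat-style &lt;code&gt;messages&lt;/code&gt; structure used for fine-tuning (the example data is made up):&lt;/p&gt;

```python
import json

# Hypothetical domain-specific question/answer pairs
qa_pairs = [
    ("What is a bond yield?", "A bond yield is the return an investor earns on a bond."),
    ("What does P/E ratio mean?", "Price-to-earnings ratio compares share price to earnings per share."),
]

def to_chat_document(pairs):
    """Format Q/A pairs as one chat document with alternating roles."""
    messages = []
    for question, answer in pairs:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    return {"messages": messages}

doc = to_chat_document(qa_pairs)
print(json.dumps(doc, indent=2))
```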

&lt;h3&gt;
  
  
  2. Prepare Training and Evaluation Datasets
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Splitting the Original Dataset&lt;/strong&gt;: Divide the prepared dataset into two parts: the training set and the evaluation (or validation) set. The training set is used to teach the model, while the evaluation set is used to monitor performance and avoid overfitting. A common split ratio is 80/20 or 90/10, but this can be adjusted based on the size of your dataset and the specific requirements of your task. Ensure that both sets are representative of the full dataset, covering all relevant aspects of the domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balancing the Datasets&lt;/strong&gt;: Check for balance in the training and evaluation datasets. For example, if the data includes different categories (e.g., different financial instruments or market conditions), ensure that each category is well-represented in both the training and evaluation sets. This step is crucial to avoid bias in the model's predictions.&lt;/li&gt;
&lt;/ul&gt;
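&lt;p&gt;The split itself can be as simple as shuffling the documents and slicing by ratio. A minimal sketch with a 90/10 split, assuming the input list holds chat documents in the format used for fine-tuning:&lt;/p&gt;

```python
import random

def split_dataset(documents, train_ratio=0.9, seed=42):
    """Shuffle documents and split them into training and evaluation sets."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # fixed seed for reproducibility
    cutoff = int(len(docs) * train_ratio)
    return docs[:cutoff], docs[cutoff:]

documents = [{"messages": [{"role": "user", "content": f"question {i}"}]} for i in range(100)]
train_set, eval_set = split_dataset(documents)
print(len(train_set), len(eval_set))  # 90 10
```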

&lt;h3&gt;
  
  
  3. Configure the Training Script
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set Batch Size&lt;/strong&gt;: Batch size determines how many samples are processed before the model's weights are updated. Larger batch sizes can make training faster but require more memory, while smaller batches can lead to better generalization but might make training slower. Experiment with different batch sizes to find the optimal setting for your hardware and data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define Training Steps&lt;/strong&gt;: Specify the number of training steps or epochs. This determines how many times the model will iterate over the entire training dataset. Monitor the model's performance during training to decide whether more or fewer steps are needed.&lt;/li&gt;
&lt;/ul&gt;
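&lt;p&gt;As a rough rule of thumb, training steps, batch size, and epochs are related: one epoch takes &lt;code&gt;dataset_size / batch_size&lt;/code&gt; steps. A quick sanity check (the numbers are illustrative):&lt;/p&gt;

```python
import math

dataset_size = 10_000   # illustrative number of training samples
batch_size = 8          # samples per weight update
epochs = 3              # full passes over the dataset

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1250 3750
```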

&lt;h3&gt;
  
  
  4. Verify the Dataset (Training + Evaluation)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Check Data Integrity&lt;/strong&gt;: Verify that the data in both training and evaluation sets is complete and correctly formatted. Look for missing values, corrupted files, or inconsistencies that could impact training. Run preliminary checks to ensure that the data loads correctly into the training pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate Data Distribution&lt;/strong&gt;: Confirm that the distribution of data in the training and evaluation sets aligns with the expected real-world distribution. This is especially important in domains like finance, where different market conditions need to be represented.&lt;/li&gt;
&lt;/ul&gt;
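&lt;p&gt;The repository's &lt;code&gt;validate_data.py&lt;/code&gt; handles these checks for the official format. As an illustration of the idea, the simplified sketch below verifies that each JSONL line parses and that every message carries the expected fields:&lt;/p&gt;

```python
import json

def check_jsonl_lines(lines):
    """Return a list of (line_number, error) tuples for malformed entries."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc}"))
            continue
        for message in doc.get("messages", []):
            if message.get("role") not in ("user", "assistant", "system"):
                errors.append((number, f"unexpected role: {message.get('role')}"))
            if "content" not in message:
                errors.append((number, "message missing 'content'"))
    return errors

sample = [
    '{"messages": [{"role": "user", "content": "hi"}]}',
    '{"messages": [{"role": "robot", "content": "oops"}]}',
    'not json at all',
]
print(check_jsonl_lines(sample))  # reports errors for lines 2 and 3
```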

&lt;h3&gt;
  
  
  5. Train the Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initiate Training&lt;/strong&gt;: Begin the fine-tuning process by running the configured training script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Performance&lt;/strong&gt;: Regularly evaluate the model's performance on the validation set during training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Save Checkpoints&lt;/strong&gt;: Save model checkpoints at regular intervals to preserve the model’s state at different points in training.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Preparing the Financial Dataset
&lt;/h2&gt;

&lt;p&gt;To prepare the dataset for training, it needs to be structured in a specific format that the MistralAI fine-tuning process can understand. The format typically follows this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User interaction n°1 contained in document n°1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bot interaction n°1 contained in document n°1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User interaction n°2 contained in document n°1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bot interaction n°2 contained in document n°1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User interaction n°1 contained in document n°2"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bot interaction n°1 contained in document n°2"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User interaction n°2 contained in document n°2"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bot interaction n°2 contained in document n°2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;don't&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;train&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;n°&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User interaction n°3 contained in document n°2"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bot interaction n°3 contained in document n°2"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each JSON object contains a list of messages, with each message having a &lt;code&gt;role&lt;/code&gt; field to indicate the speaker (either "user" or "assistant") and a &lt;code&gt;content&lt;/code&gt; field to store the text of the interaction.&lt;/p&gt;

&lt;p&gt;This file format is called JSONL (JSON Lines) because it contains multiple JSON objects, one per line, separated by newlines.&lt;/p&gt;
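&lt;p&gt;Writing and reading a JSONL file is straightforward: serialize one object per line on the way out, and parse line by line on the way in. A minimal sketch:&lt;/p&gt;

```python
import json

documents = [
    {"messages": [{"role": "user", "content": "first document"}]},
    {"messages": [{"role": "user", "content": "second document"}]},
]

# Write: one JSON object per line
with open("dataset.jsonl", "w") as handle:
    for doc in documents:
        handle.write(json.dumps(doc) + "\n")

# Read: parse each line independently
with open("dataset.jsonl") as handle:
    loaded = [json.loads(line) for line in handle]

print(loaded == documents)  # True
```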

&lt;h2&gt;
  
  
  Preparing Training and Evaluation Datasets
&lt;/h2&gt;

&lt;p&gt;To gather the data, we will use the &lt;a href="http://newsapi.org/" rel="noopener noreferrer"&gt;NewsAPI.org&lt;/a&gt; API to retrieve financial news content. Register for a free API key at &lt;a href="http://newsapi.org/" rel="noopener noreferrer"&gt;NewsAPI.org&lt;/a&gt; if you don’t have one.&lt;/p&gt;

&lt;p&gt;You will also need access to an API key from MistralAI. You can register for one &lt;a href="https://auth.mistral.ai/ui/registration" rel="noopener noreferrer"&gt;here&lt;/a&gt; if you don’t have one.&lt;/p&gt;

&lt;p&gt;Then you can write the &lt;code&gt;dataset_processing.py&lt;/code&gt; script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;decouple&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mistralai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Mistral&lt;/span&gt;

&lt;span class="n"&gt;NEWS_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEWS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MISTRAL_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MISTRAL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Function to fetch financial news data related to a specific topic
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_financial_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial market&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://newsapi.org/v2/everything?q=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;pageSize=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;apiKey=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;news_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;news_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;articles&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;

&lt;span class="c1"&gt;# Function to process and save the data to a CSV file
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_news_to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Create a DataFrame from the articles
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;publishedAt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;publishedAt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Save the DataFrame to a CSV file
&lt;/span&gt;    &lt;span class="n"&gt;output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_news.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data saved to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_csv_to_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Read the CSV file
&lt;/span&gt;        &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Process each row and append each content to a list
&lt;/span&gt;        &lt;span class="n"&gt;json_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mistral&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MISTRAL_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistral-small-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You will receive news article. Analyze it and generated user and assistant
                                interactions in a chat like format. The output should be a JSON with the
                                following format:
                               &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{
                                  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [
                                    {
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User interaction n°1 contained in document n°1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
                                    },
                                    {
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bot interaction n°1 contained in document n°1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
                                    },
                                    {
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User interaction n°2 contained in document n°1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
                                    },
                                    {
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
                                      &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bot interaction n°2 contained in document n°1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
                                    }
                                  ]
                                }&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
                               Return only the JSON and nothing else, with no Markdown code fences or tags.
                               &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
                    &lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Here is the news content: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;result_txt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="n"&gt;result_txt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result_txt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```

json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;json_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_txt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;json_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Save the list to a JSONL file
&lt;/span&gt;        &lt;span class="n"&gt;output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;json_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;separate_jsonl_train_eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Read the JSONL file and separate into training and evaluation sets using pandas
&lt;/span&gt;        &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;df_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frac&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;df_eval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Save the training and evaluation sets to JSONL files keeping the same file name
&lt;/span&gt;        &lt;span class="n"&gt;train_output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_train.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;eval_output_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_eval.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;df_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orient&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;df_eval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orient&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

&lt;span class="c1"&gt;# Main function to run the data processing pipeline
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Specify the topic you want to search for
&lt;/span&gt;    &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial market&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Fetch data
&lt;/span&gt;    &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_financial_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NEWS_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Save the data to a CSV file
&lt;/span&gt;    &lt;span class="nf"&gt;save_news_to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process the CSV file to generate a JSONL file
&lt;/span&gt;    &lt;span class="nf"&gt;process_csv_to_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_market_news.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Separate the JSONL file into training and evaluation sets
&lt;/span&gt;    &lt;span class="nf"&gt;separate_jsonl_train_eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_market_news.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Entry point of the script
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="sb"&gt;``&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;

&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;Python&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="n"&gt;automates&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;collecting&lt;/span&gt; &lt;span class="n"&gt;financial&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;formatting&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tuning&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;MistralAI&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;preparing&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="n"&gt;into&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`fetch_financial_news`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;fetches&lt;/span&gt; &lt;span class="n"&gt;financial&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;NewsAPI&lt;/span&gt; &lt;span class="n"&gt;based&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;specified&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`save_news_to_csv`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;saves&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;fetched&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;into&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`process_csv_to_jsonl`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;converts&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;into&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;JSONL&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generating&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;based&lt;/span&gt; &lt;span class="n"&gt;interactions&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;MistralAI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`separate_jsonl_train_eval`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;splits&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;JSONL&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;into&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="n"&gt;orchestrates&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;entire&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Fetching&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Saving&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Converting&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;JSONL&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Splitting&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;JSONL&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;into&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;following&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;which&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nf"&gt;model &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="n"&gt;later&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;machine&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`financial_market_news_train.jsonl`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Contains&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="n"&gt;related&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`financial_market_news_eval.jsonl`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Contains&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;answers&lt;/span&gt; &lt;span class="n"&gt;related&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;news&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

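Since `json.loads` only guarantees syntactic validity, it can also be worth checking that each generated record matches the chat schema the system prompt asks for (alternating user/assistant turns) before training on it. A minimal sketch, with a hypothetical `validate_chat_record` helper:

```python
def validate_chat_record(record):
    """Return True if the record holds a non-empty 'messages' list of
    alternating user/assistant turns with non-empty string content.

    Hypothetical check, mirroring the schema requested in the system prompt.
    """
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for i, msg in enumerate(messages):
        # Even-indexed turns should be the user, odd-indexed the assistant
        expected_role = "user" if i % 2 == 0 else "assistant"
        if msg.get("role") != expected_role:
            return False
        if not isinstance(msg.get("content"), str) or not msg["content"]:
            return False
    return True
```

Records failing this check could simply be skipped instead of appended to `json_list`.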
&lt;span class="c1"&gt;## Configure the Training Script
&lt;/span&gt;
&lt;span class="n"&gt;Before&lt;/span&gt; &lt;span class="n"&gt;we&lt;/span&gt; &lt;span class="n"&gt;deploy&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;Koyeb&lt;/span&gt; &lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;validate&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="n"&gt;preparing&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;configuration&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;which&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;YAML&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="nb"&gt;all&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;necessary&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mentioned&lt;/span&gt; &lt;span class="n"&gt;earlier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;So&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;go&lt;/span&gt; &lt;span class="n"&gt;ahead&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`7B.yaml`&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="sb"&gt;``&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="c1"&gt;# data
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;instruct_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mistral-finetune/financial_market_news_train.jsonl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# Fill
&lt;/span&gt;  &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt; &lt;span class="c1"&gt;# Optionally fill with pretraining data
&lt;/span&gt;  &lt;span class="n"&gt;eval_instruct_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mistral-finetune/financial_market_news_eval.jsonl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# Optionally fill
&lt;/span&gt;
&lt;span class="c1"&gt;# model
&lt;/span&gt;&lt;span class="n"&gt;model_id_or_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mistral-finetune/mistral_models/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# Change to downloaded path
&lt;/span&gt;&lt;span class="n"&gt;lora&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;

&lt;span class="c1"&gt;# optim
&lt;/span&gt;&lt;span class="n"&gt;seq_len&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;32768&lt;/span&gt;
&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;optim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;lr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;6.e-5&lt;/span&gt;
  &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
  &lt;span class="n"&gt;pct_start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

&lt;span class="c1"&gt;# other
&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;log_freq&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;eval_freq&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;no_eval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="n"&gt;ckpt_freq&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

&lt;span class="n"&gt;save_adapters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="c1"&gt;# save only trained LoRA adapters. Set to `False` to merge LoRA adapter into the base model and save full fine-tuned model
&lt;/span&gt;
&lt;span class="n"&gt;run_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/mistral-finetune/chat_test&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# Fill
&lt;/span&gt;&lt;span class="sb"&gt;``&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;

&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;important&lt;/span&gt; &lt;span class="n"&gt;information&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`instruct_data`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`dataset_processing.py`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`eval_instruct_data`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;evaluation&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;also&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`dataset_processing.py`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`model_id_or_path`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;later&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`batch_size`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;adjust&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;needed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;work&lt;/span&gt; &lt;span class="n"&gt;well&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`max_steps`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="n"&gt;provides&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;good&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt; &lt;span class="n"&gt;speed&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="nb"&gt;reduce&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;faster&lt;/span&gt; &lt;span class="n"&gt;processing&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;less&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="sb"&gt;`run_dir`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;directory&lt;/span&gt; &lt;span class="n"&gt;where&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;trained&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;After&lt;/span&gt; &lt;span class="n"&gt;deployment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;execute&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;then&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;These&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;prepared&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;commands&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;execute&lt;/span&gt; &lt;span class="n"&gt;later&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="c1"&gt;## Deploying to Koyeb GPU
&lt;/span&gt;
&lt;span class="n"&gt;To&lt;/span&gt; &lt;span class="n"&gt;deploy&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tuning&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;Koyeb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;Dockerfile&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;sets&lt;/span&gt; &lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="n"&gt;deploy&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;Koyeb&lt;/span&gt; &lt;span class="n"&gt;via&lt;/span&gt; &lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;built&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;Dockerfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="c1"&gt;### Create a Dockerfile
&lt;/span&gt;
&lt;span class="n"&gt;We&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll start by preparing a Dockerfile to ensure we have all the necessary dependencies installed, especially for GPU support. Create a `Dockerfile` with the following contents:

```docker
# Use the official Python base image
FROM python:3.11

# Clone the repository
RUN git clone https://github.com/mistralai/mistral-finetune.git

# Set the working directory
WORKDIR /mistral-finetune

# Upgrade pip and install the project dependencies (including torch)
RUN pip install --upgrade pip &amp;amp;&amp;amp; pip install -r requirements.txt

# Copy the 7B.yaml file
COPY 7B.yaml /mistral-finetune/example/7B.yaml

# Script to prepare the training data
COPY dataset_processing.py /mistral-finetune/dataset_processing.py
```

This Dockerfile is designed to set up an environment for fine-tuning the MistralAI language model. It automates the process of cloning the necessary repository, installing dependencies, and copying both the training configuration and the training dataset script.

### Create the repository

The next step is to create a new repository on GitHub to store the project files.

Once you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt; &lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;following&lt;/span&gt; &lt;span class="n"&gt;commands&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;terminal&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;commit&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;push&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="sb"&gt;``&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="n"&gt;docker&lt;/span&gt;
&lt;span class="n"&gt;echo&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# MistralFineTuning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;README&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;commit&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;First Commit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;branch&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Your&lt;/span&gt; &lt;span class="n"&gt;GitHub&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;push&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;
&lt;span class="sb"&gt;``&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;

&lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;should&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="n"&gt;have&lt;/span&gt; &lt;span class="nb"&gt;all&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Now&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;deploy&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;Dockerfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="c1"&gt;### Deploy to Koyeb
&lt;/span&gt;
&lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Koyeb&lt;/span&gt; &lt;span class="n"&gt;control&lt;/span&gt; &lt;span class="n"&gt;panel&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;koyeb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Overview&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;tab&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initiate&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="n"&gt;creation&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;deployment&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;clicking&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;select&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;Worker&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;On&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt; &lt;span class="n"&gt;deployment&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;Select&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;GitHub&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;deployment&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;Choose&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="n"&gt;where&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="n"&gt;resides&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;For&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`MistralFineTuning`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="mf"&gt;3.&lt;/span&gt; &lt;span class="n"&gt;Select&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;GPU&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;wish&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`A100`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;might&lt;/span&gt; &lt;span class="n"&gt;work&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;GPUs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;performance&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;training&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;recommended&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="mf"&gt;4.&lt;/span&gt; &lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;choose&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Dockerfile&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="mf"&gt;5.&lt;/span&gt; &lt;span class="n"&gt;In&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;choose&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="n"&gt;appropriate&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="mf"&gt;6.&lt;/span&gt; &lt;span class="n"&gt;Finally&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Deploy&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="c1"&gt;## Running the Fine-Tuning process
&lt;/span&gt;
&lt;span class="n"&gt;Once&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;deployment&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="n"&gt;preparing&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;running&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tuning&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;Dockerfile&lt;/span&gt; &lt;span class="n"&gt;deployment&lt;/span&gt; &lt;span class="n"&gt;has&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="n"&gt;needed&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="n"&gt;didn&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;so&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt; &lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Since&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nb"&gt;next&lt;/span&gt; &lt;span class="n"&gt;commands&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="n"&gt;machine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll use the Koyeb CLI to access the remote machine through the terminal.

First, make sure you have the Koyeb CLI installed. You can find the installation instructions [**here**](https://www.koyeb.com/docs/build-and-deploy/cli/installation). Then, generate an API Token, which you can do [**here**](https://app.koyeb.com/user/settings/api/).

Now you are ready to log in with the Koyeb CLI:

```bash
koyeb login
```

When prompted, enter your API token.

To see a list of running instances, use the following command:

```bash
koyeb instances list
```

Note the instance ID you want to connect to. Then, create a remote terminal session to the remote machine:

```bash
koyeb instances exec &amp;lt;instance_id&amp;gt; /bin/bash
```

You now have an active remote session to the remote machine. All commands executed from now on will be on the remote machine.

As mentioned, the first step is to download the model to train, in this case the Mistral 7B Instruct:

```bash
mkdir mistral_models
wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar &amp;amp;&amp;amp; tar -xf mistral-7B-Instruct-v0.3.tar -C mistral_models
```

It might take a couple of minutes for the model to be downloaded and extracted.
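
Once the archive is extracted, you can confirm the files landed where the configuration expects them. A short sketch; the expected file names are an assumption about how Mistral packages its archives, so adjust them to whatever `tar -tf` reports:

```python
from pathlib import Path

# Expected archive contents (an assumption; confirm with `tar -tf mistral-7B-Instruct-v0.3.tar`).
EXPECTED = ["params.json", "consolidated.safetensors", "tokenizer.model.v3"]

def missing_model_files(model_dir: str) -> list:
    """Return expected files that are not present in the extracted model directory."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

# Prints any missing files; an empty list means the extraction looks complete.
print(missing_model_files("mistral_models"))
```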

Next, to ensure compatibility, make sure the installed NumPy version is 1.26.4:

```bash
pip install numpy==1.26.4
```

Now you can install the necessary libraries for executing the dataset script:

```bash
pip install requests pandas mistralai python-decouple
```

You can then copy the necessary information for the **`.env`** file:

```bash
echo &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEWS_API_KEY=&amp;lt;YOUR_NEWS_API_KEY&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &amp;gt; .env
echo &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MISTRAL_API_KEY=&amp;lt;YOUR_MISTRAL_API_KEY&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &amp;gt;&amp;gt; .env

```

Make sure to replace the values with your own API keys.
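
For reference, `python-decouple` will read these keys at runtime. Its behavior on a simple `.env` file can be sketched with a stdlib-only parser (illustrative only; the actual script uses the `decouple` package):

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, the format written to .env above."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env

sample = "NEWS_API_KEY=abc123\nMISTRAL_API_KEY=def456\n"
keys = parse_env(sample)
print(keys["NEWS_API_KEY"])  # abc123
```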

And then you can execute the **`dataset_processing.py`** script:

```bash
python3 dataset_processing.py
```

It might take a couple of minutes to prepare the dataset. After it finishes, you should have two JSONL files corresponding to the training and evaluation datasets.
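
If you want to eyeball the generated files before running the official validator, a small check helps. This sketch assumes the instruct format used by `mistral-finetune`, where each line is a JSON object with a `messages` list:

```python
import json

def count_valid_records(jsonl_text: str) -> int:
    """Count lines that parse as JSON objects containing a 'messages' list."""
    valid = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if isinstance(record.get("messages"), list):
            valid += 1
    return valid

sample = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}\n'
print(count_valid_records(sample))  # 1
```

The count should match the number of lines in each JSONL file; any shortfall points at malformed records.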

You can now validate those datasets with:

```bash
python3 -m utils.validate_data --train_yaml example/7B.yaml
```

You should see an ETA estimate for the training, and there should not be any validation errors. If there are errors, you can fix them with:

```bash
python3 -m utils.reformat_data financial_market_news_train.jsonl
python3 -m utils.reformat_data financial_market_news_eval.jsonl
```

Validate the dataset again (if needed) and now there should be no errors:

```bash
python3 -m utils.validate_data --train_yaml example/7B.yaml
```

Everything is now ready to train the model, which you can do with:

```bash
CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.run --nproc-per-node 1 --master_port 65123 -m train example/7B.yaml
```

Setting **`CUDA_VISIBLE_DEVICES=0`** ensures that the training script targets the GPU on the remote machine.
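As a quick standard-library illustration, the variable simply restricts which GPU indices CUDA applications are allowed to see; frameworks then re-enumerate the visible devices starting from 0:

```python
# CUDA_VISIBLE_DEVICES restricts which GPUs CUDA applications can see.
# "0" exposes only the first GPU; frameworks then enumerate it as device 0.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)  # ['0']
```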

This process will take several minutes, possibly even hours. It will show an estimate of the remaining processing time.

Once training finishes, you will be able to evaluate the fine-tuned model against the default model, which we will do in the next section.

## Evaluating the Fine-Tuned Model

To evaluate the fine-tuned model, we first need to establish a baseline with the default model.

First, you need to install the necessary package on the remote machine:

```bash
pip install mistral_inference
```

Now you can test the default model by running:

```bash
mistral-chat /mistral-finetune/mistral_models/ --max_tokens 1024 --temperature 0.7 --instruct
```

It will ask you for a prompt. Let’s try this one:

```bash
Prompt: how is x adverstising doing?

To provide an accurate answer, I would need specific details about X advertising. However, I can share some general insights about advertising in today's digital landscape:

1. Digital Advertising: With the rise of the internet and mobile devices, digital advertising has become a significant part of the marketing mix. Platforms like Google, Facebook, Instagram, and YouTube offer targeted advertising options based on user demographics, interests, and online behavior.

2. Social Media Advertising: Social media platforms are popular advertising channels due to their vast user base and the ability to target ads based on user profiles. Brands use social media ads to increase brand awareness, drive traffic to their websites, and generate leads.

3. Influencer Marketing: Influencer marketing has grown in popularity, as brands partner with social media influencers to promote their products or services. This strategy allows brands to tap into the influencer's audience and leverage their credibility.

4. Traditional Advertising: While digital advertising is on the rise, traditional advertising methods like television, radio, print, and outdoor advertising still have a place in the marketing mix, especially for reaching broader audiences or those who are less tech-savvy.

5. Measuring Success: Advertising success is often measured by metrics such as reach, impressions, click-through rates, conversion rates, and return on investment (ROI). Brands use these metrics to evaluate the effectiveness of their advertising campaigns and make data-driven decisions.

Without more specific information about X advertising, it's difficult to say how they are doing. It's essential for brands to regularly analyze their advertising performance and adjust their strategies as needed to maximize their return on investment.

```

As you can see, the default model gave a very generic answer.

Now let’s run the fine-tuned model with:

```bash
mistral-chat /mistral-finetune/mistral_models/ --max_tokens 1024 --temperature 0.7 --instruct --lora_path /mistral-finetune/chat_test/checkpoints/checkpoint_000300/consolidated/lora.safetensors
```
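The `--lora_path` flag loads the trained LoRA adapter on top of the frozen base weights. Conceptually, a LoRA adapter stores two small matrices whose product is a low-rank update added to the original weight matrix; here is a toy, pure-Python sketch of that idea (shapes and numbers are illustrative only):

```python
# Conceptual sketch of a LoRA update: the adapter stores two small matrices
# A and B, and the effective weight becomes W + (B @ A). Toy example only.

def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [
        [sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

# Frozen base weight W (2x2) and a rank-1 adapter: B (2x1), A (1x2)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.25]]
A = [[2.0, 4.0]]

delta = matmul(B, A)  # rank-1 update, shape 2x2
W_adapted = [
    [W[i][j] + delta[i][j] for j in range(2)]
    for i in range(2)
]
print(W_adapted)  # [[2.0, 2.0], [0.5, 2.0]]
```

Because only B and A are trained, the adapter file is tiny compared to the full model, which is why a single `lora.safetensors` checkpoint is all that needs to be loaded here.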

And we use the same prompt:

```bash
Prompt: how is x adverstising doing?

X advertising is currently seeing a slight increase in performance.
```

As you can see, the fine-tuned model gave a much more accurate and precise answer.

### Impact on Domain Knowledge

Fine-tuning MistralAI on financial data significantly improves the model's ability to understand and operate within the financial domain, transforming it into a specialized tool with a deep understanding of financial concepts.

Here we have only exposed the model to a subset of recent news, but by exposing it to more domain-specific data, such as financial reports, market analysis, and regulatory documents, it learns the precise meanings and nuances of financial terminology.

Fine-tuning also helps the model stay current with ongoing trends in the financial industry. This includes understanding the implications of market movements, economic indicators, and geopolitical events on financial markets.

This enhanced understanding and specialization enable the model to perform a wide range of finance-related tasks with greater accuracy, relevance, and compliance. This makes it an invaluable asset for financial professionals and organizations, helping them make more informed decisions and improve their overall performance in the financial domain.

&lt;span class="c1"&gt;## Conclusion
&lt;/span&gt;
&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve just completed this tutorial on fine-tuning MistralAI on Koyeb Serverless GPUs.

You can check out the example repository for this tutorial on Koyeb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;GitHub&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tuning&lt;/span&gt; &lt;span class="n"&gt;MistralAI&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;koyeb&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;finetune&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mistralai&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;serverless&lt;/span&gt; &lt;span class="n"&gt;GPUs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;While&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;guide&lt;/span&gt; &lt;span class="n"&gt;focused&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tuning&lt;/span&gt; &lt;span class="n"&gt;MistralAI&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;finance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;approach&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;techniques&lt;/span&gt; &lt;span class="n"&gt;covered&lt;/span&gt; &lt;span class="n"&gt;here&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;same&lt;/span&gt; &lt;span class="n"&gt;across&lt;/span&gt; &lt;span class="n"&gt;various&lt;/span&gt; &lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Whether&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re working with healthcare data, legal documents, technical manuals, or customer service interactions, fine-tuning can significantly improve the relevance and accuracy of AI models.

Have fun experimenting with your own datasets and seeing how fine-tuning can add value and improve performance in your specific area of interest!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Fine-Tune Llama 3.1 8B using QLORA</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Wed, 02 Oct 2024 08:33:53 +0000</pubDate>
      <link>https://dev.to/koyeb/fine-tune-llama-31-8b-using-qlora-e59</link>
      <guid>https://dev.to/koyeb/fine-tune-llama-31-8b-using-qlora-e59</guid>
      <description>&lt;p&gt;Large Language Models (LLMs) are fantastic tools for getting quick answers on programming questions. However, their knowledge is not always up to date and they may not know about your favourite framework or library. Maybe it's software that only your company uses, a new framework that's just come out or a new version of a popular library.&lt;/p&gt;

&lt;p&gt;In this guide, we'll walk you through how to fine-tune an LLM on your favourite project's documentation. This will enable the model to answer questions with (hopefully) correct, and up-to-date information. We'll be using Llama 3.1 8B, Meta's latest open-source model and teach it about Apple's new deep learning framework: MLX.&lt;/p&gt;

&lt;p&gt;We will first generate a custom LLM training dataset from Apple's documentation and publish it on the HuggingFace Hub. Then, we'll fine-tune Llama 3.1 8B using QLORA, a training method which significantly reduces GPU memory usage and training time. Finally, we'll deploy the model on Koyeb's serverless GPUs, enabling you to get answers to your questions in real-time.&lt;/p&gt;

&lt;p&gt;Quick disclaimer: This guide is intended as an introductory overview. Fine-tuning a language model involves careful consideration of data distribution, hyperparameters, and continual pre-training. For production-level models, a more rigorous approach is required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;To successfully follow this tutorial, you will need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.python.org/" rel="noopener noreferrer"&gt;Python 3.&lt;/a&gt; or later.&lt;/li&gt;
&lt;li&gt;An &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;HuggingFace access token&lt;/a&gt; with write permissions and access to &lt;a href="https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct" rel="noopener noreferrer"&gt;Llama 3.1 8B Instruct&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://wandb.ai/" rel="noopener noreferrer"&gt;Weights &amp;amp; Biases access token&lt;/a&gt; (Optional).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Configure the local environment.&lt;/li&gt;
&lt;li&gt;Build the Apple MLX documentation from source (Optional).&lt;/li&gt;
&lt;li&gt;Generate the training dataset with Python and the OpenAI API.&lt;/li&gt;
&lt;li&gt;Fine-tune the model using Jupyter Notebook on Koyeb.&lt;/li&gt;
&lt;li&gt;Deploy and use the fine-tuned model.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Configure the local environment
&lt;/h2&gt;

&lt;p&gt;First, we'll clone the repository for this project, create a Python virtual environment, and install the required dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/koyeb/finetune-llama-on-koyeb.git
&lt;span class="nb"&gt;cd &lt;/span&gt;finetune-llama-on-koyeb

&lt;span class="c"&gt;# Create a virtual environment&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv

&lt;span class="c"&gt;# Active the virtual environment (Windows)&lt;/span&gt;
.&lt;span class="se"&gt;\v&lt;/span&gt;&lt;span class="nb"&gt;env&lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate.bat
&lt;span class="c"&gt;#  Active the virtual environment (Linux &amp;amp; macOS)&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; ./venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's install the dependencies required for this project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;datasets&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.16.1 &lt;span class="nv"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;1.42.0 tqdm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;datasets&lt;/code&gt; library is used to push our dataset to the HuggingFace Hub, and the &lt;code&gt;openai&lt;/code&gt; library lets us interact with the OpenAI API.&lt;/p&gt;

&lt;p&gt;Next, we'll login to the HuggingFace Hub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;huggingface-cli login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Follow the instructions in the terminal and paste your access token when prompted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build the Apple MLX documentation from source (Optional)
&lt;/h3&gt;

&lt;p&gt;The repository for this tutorial already contains the Apple MLX documentation in text format. However, if you want to build the documentation from source, you can follow the instructions below. Otherwise, you can skip to the next step.&lt;/p&gt;

&lt;p&gt;You'll need to install doxygen to build the Apple MLX documentation from source.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install doxygen (macOS)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;doxygen
&lt;span class="c"&gt;# Install doxygen (Linux)&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;doxygen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, you can clone the MLX repository and build the documentation using Doxygen.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the MLX repository&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; mlx &lt;span class="p"&gt;;&lt;/span&gt; git clone git@github.com:ml-explore/mlx.git

&lt;span class="c"&gt;# Install the required dependencies and build the documentation in text format&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;mlx/docs
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; mlx/requirements.txt
doxygen &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make text

&lt;span class="c"&gt;# Move back to the project directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything went well, the &lt;code&gt;mlx/docs/build/text&lt;/code&gt; directory should now contain the documentation in text format. If you encounter any issues, you can fall back to using the pre-built documentation from the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generate the training dataset
&lt;/h2&gt;

&lt;p&gt;To generate the training dataset, we'll use the OpenAI API. The script &lt;code&gt;generate_dataset.py&lt;/code&gt; in the repository does this for us. There's a lot going on in this script, so let's break it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At the top of the file, we define the prompts used to generate questions and answers.&lt;/li&gt;
&lt;li&gt;After parsing the command-line arguments, we read all the documentation files in an array.&lt;/li&gt;
&lt;li&gt;For each chunk of documentation, we generate N questions using the chat endpoint of the OpenAI API.

&lt;ul&gt;
&lt;li&gt;We use OpenAI's structured output feature to ensure the model generates a list of questions. This is done by specifying a JSON schema in the &lt;code&gt;response_format&lt;/code&gt; parameter.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;For each question, we generate an answer using the same chat endpoint.&lt;/li&gt;

&lt;li&gt;Finally, we write the question-answer pairs to a JSONL file and push it to the HuggingFace Hub.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We can now run the script, specifying the input directory, output file location, the OpenAI model to use, and the HuggingFace repository to push the dataset to. This should be the name of your organization (or HuggingFace account) and the name of the dataset (for example &lt;code&gt;koyeb/Apple-MLX-QA&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your-openai-api-key'&lt;/span&gt;
python generate_dataset.py &lt;span class="nt"&gt;--input&lt;/span&gt; mlx/docs/build/text &lt;span class="nt"&gt;--output&lt;/span&gt; apple-mlx-qa.jsonl &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;--repo&lt;/span&gt; koyeb/Apple-MLX-QA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should cost less than $10 in OpenAI credits and take an hour or so. If you don't have an OpenAI API key or don't want to use it, you can skip to the next step and &lt;a href="https://huggingface.co/datasets/koyeb/Apple-MLX-QA" rel="noopener noreferrer"&gt;use the dataset we published on HuggingFace&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fine-tune the model using Jupyter Notebook on Koyeb
&lt;/h2&gt;

&lt;p&gt;Now that we have our fine-tuning dataset, we can proceed with fine-tuning. The next step involves deploying a Jupyter Notebook server on a Koyeb GPU instance. To do this, you can visit the &lt;a href="https://www.koyeb.com/deploy/pytorch-jupyter" rel="noopener noreferrer"&gt;One-Click App page for the Jupyter Notebook on Koyeb&lt;/a&gt; and follow the instructions on the page.&lt;/p&gt;

&lt;p&gt;Once your service is started, visit the URL and connect to the Jupyter server using the password you set during the deployment process. Once you're logged into Jupyter, import the notebook.ipynb file by clicking on the "Upload" button in the Jupyter interface as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvobxzau1qm3nf25wk0gt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvobxzau1qm3nf25wk0gt.png" alt="Upload the notebook"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rest of the instructions for this step are in the notebook. Once you're done, you can come back here to deploy and use the fine-tuned model on Koyeb.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy and use the fine-tuned model on Koyeb
&lt;/h2&gt;

&lt;p&gt;This section teaches you how to use the model in Python code and how to deploy it for production use on Koyeb's serverless GPUs.&lt;/p&gt;

&lt;p&gt;You can use your LORA adapter in Python code using &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;transformers&lt;/code&gt;, and &lt;code&gt;peft&lt;/code&gt;. Here's an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install torch transformers peft
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PeftModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="c1"&gt;# Load the base model and tokenizer
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.1-8B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load the fine-tuned model using LORA
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PeftModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX-Adapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define input using a chat template with a system prompt and user query
&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI coding assistant with expert knowledge of Apple&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do you transpose a matrix in MLX?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate and print the response
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;
            &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, you can deploy your fine-tuned model on Koyeb's serverless GPUs using vLLM with One-Click Apps.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Visit the One-Click App page for vLLM and click the "Deploy" button.&lt;/li&gt;
&lt;li&gt;Override the command args and specify the Hugging Face repository for your merged model: &lt;code&gt;["--model", "YOUR-ORG/Meta-Llama-3.1-8B-Instruct-Apple-MLX"]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Set your Hugging Face access token in the &lt;code&gt;HF_TOKEN&lt;/code&gt; environment variable. Optionally, set &lt;code&gt;VLLM_DO_NOT_TRACK&lt;/code&gt; to &lt;code&gt;1&lt;/code&gt; to disable telemetry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once deployed, you can interact with the model using the OpenAI API format. Here's an example using curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl https://YOUR-SERVICE-URL.koyeb.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "YOUR-ORG/Meta-Llama-3.1-8B-Instruct-Apple-MLX",
     "messages": [
        {"role": "system", "content": "You are a helpful AI coding assistant with expert knowledge of Apple's latest machine learning framework: MLX. You can help answer questions about MLX, provide code snippets, and help debug code."},
        {"role": "user", "content": "How do you transpose a matrix in MLX?"}
     ],
     "temperature": 0.3
   }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
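&lt;p&gt;The same request can be sent from Python. Below is a minimal sketch: the service URL is a placeholder exactly as in the curl example, and the &lt;code&gt;build_payload&lt;/code&gt; helper is purely illustrative:&lt;/p&gt;

```python
import json

# Placeholder endpoint -- substitute your deployed service's URL
URL = "https://YOUR-SERVICE-URL.koyeb.app/v1/chat/completions"

def build_payload(system_prompt: str, user_query: str, temperature: float = 0.3) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
        "temperature": temperature,
    }

payload = build_payload(
    "You are a helpful AI coding assistant with expert knowledge of Apple's "
    "latest machine learning framework: MLX.",
    "How do you transpose a matrix in MLX?",
)
print(json.dumps(payload, indent=2))

# To call the live service, POST the payload and read the first choice:
#   import requests
#   resp = requests.post(URL, json=payload, timeout=60)
#   answer = resp.json()["choices"][0]["message"]["content"]
```

&lt;p&gt;The commented lines show how to extract the assistant's reply from the standard OpenAI-format response body.&lt;/p&gt;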



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Congratulations, you've successfully fine-tuned Llama 3.1 8B using QLoRA!&lt;/p&gt;

&lt;p&gt;Remember, fine-tuning is an iterative process. Feel free to experiment with different hyperparameters and training methods to get the best results. You can also work on increasing the size or improving the quality of your training dataset using additional data sources or data augmentation techniques.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Volumes: High IOPS and Low Latency NVMe SSDs Public Preview</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Mon, 23 Sep 2024 11:00:00 +0000</pubDate>
      <link>https://dev.to/koyeb/volumes-high-iops-and-low-latency-nvme-ssds-public-preview-3209</link>
      <guid>https://dev.to/koyeb/volumes-high-iops-and-low-latency-nvme-ssds-public-preview-3209</guid>
      <description>&lt;p&gt;It’s the final day of &lt;a href="https://www.koyeb.com/blog/koyeb-launch-week-round-2" rel="noopener noreferrer"&gt;Koyeb's Launch Week Round 2&lt;/a&gt;, and today we’re announcing the public preview of Volumes!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.koyeb.com/docs/reference/volumes" rel="noopener noreferrer"&gt;Volumes&lt;/a&gt; on Koyeb are blazing-fast NVMe SSD you can use to persist data across deployments.&lt;br&gt;
After announcing &lt;a href="https://www.koyeb.com/blog/volumes-technical-preview-blazing-fast-nvme-ssd-for-your-data" rel="noopener noreferrer"&gt;Volumes in technical preview&lt;/a&gt; a few months ago, we are now opening the preview to all users on the Starter plan!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgg609elcmadj8q3j14e.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgg609elcmadj8q3j14e.gif" alt="Volumes Public Preview" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Offering high throughput and low latency, Volumes open the door to a wide range of new workloads and use cases to handle the state of your applications.&lt;br&gt;
Whether you’re looking to run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clustered databases&lt;/li&gt;
&lt;li&gt;Object storage&lt;/li&gt;
&lt;li&gt;Queue systems&lt;/li&gt;
&lt;li&gt;Storage for model checkpoints and weights for AI workloads&lt;/li&gt;
&lt;li&gt;Or anything else that requires data persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Get seamless data persistence by deploying any container image from &lt;a href="https://hub.docker.com/" rel="noopener noreferrer"&gt;DockerHub&lt;/a&gt; - such as Minio, Neo4j, ClickHouse, or Meilisearch - on Koyeb. With just a few clicks or simple CLI commands, you can leverage volumes and persist your data across deployments.&lt;/p&gt;

&lt;p&gt;Right now, Volumes are available in Washington DC and Frankfurt for all Standard CPU instance types. Over the next few months, we will progressively add Volumes to more regions. If you are interested in a specific region, just &lt;a href="https://feedback.koyeb.com/" rel="noopener noreferrer"&gt;let us know&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;During the public preview, Volumes are completely free, so dive in and start adding persistence to your services!&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Started with Volumes
&lt;/h2&gt;

&lt;p&gt;To help you get started, here are a few examples to deploy some popular stateful applications including Minio, ClickHouse, MongoDB, and Neo4j using Volumes to handle data persistence using &lt;a href="https://www.koyeb.com/docs/build-and-deploy/cli/installation" rel="noopener noreferrer"&gt;Koyeb CLI&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Minio
&lt;/h3&gt;

&lt;p&gt;Minio is a high-performance distributed object storage server with an S3 compatible API.&lt;br&gt;
In the following example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a 10GB Volume in the Washington DC region&lt;/li&gt;
&lt;li&gt;Deploy Minio using the official Minio Docker image &lt;code&gt;quay.io/minio/minio&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Attach the &lt;code&gt;minio-data&lt;/code&gt; Volume to the service and mount our volume into the &lt;code&gt;/data&lt;/code&gt; directory of the container&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Create a Volume&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb volume create minio-data &lt;span class="nt"&gt;--size&lt;/span&gt; 10 &lt;span class="nt"&gt;--region&lt;/span&gt; was
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deploy Minio&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb app init minio &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker&lt;/span&gt; quay.io/minio/minio &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-command&lt;/span&gt; server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-args&lt;/span&gt; /data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-args&lt;/span&gt; &lt;span class="nt"&gt;--console-address&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-args&lt;/span&gt; :9001 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; was &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--volumes&lt;/span&gt; minio-data:/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;MINIO_BROWSER_REDIRECT_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://&lt;span class="o"&gt;{{&lt;/span&gt; KOYEB_PUBLIC_DOMAIN &lt;span class="o"&gt;}}&lt;/span&gt;/console/ &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 9000:http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 9001:http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--route&lt;/span&gt; /:9000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--route&lt;/span&gt; /console:9001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ClickHouse
&lt;/h3&gt;

&lt;p&gt;ClickHouse is a fast, resource-efficient open-source column-oriented database management system.&lt;br&gt;
In the following example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a 10GB Volume in the Washington DC region&lt;/li&gt;
&lt;li&gt;Deploy ClickHouse using the official ClickHouse Docker image &lt;code&gt;clickhouse/clickhouse-server&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Attach the &lt;code&gt;clickhouse-data&lt;/code&gt; Volume to the service and mount our volume into the &lt;code&gt;/var/lib/clickhouse&lt;/code&gt; directory of the container&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Create a Volume&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb volume create clickhouse-data &lt;span class="nt"&gt;--size&lt;/span&gt; 10 &lt;span class="nt"&gt;--region&lt;/span&gt; was
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deploy ClickHouse&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb app init clickhouse-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker&lt;/span&gt; clickhouse/clickhouse-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; was &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--volumes&lt;/span&gt; clickhouse-data:/var/lib/clickhouse &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;CLICKHOUSE_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;CLICKHOUSE_PASSOWRd&lt;/span&gt;&lt;span class="o"&gt;={{&lt;/span&gt; secrets.clickhouse_password &lt;span class="o"&gt;}}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8123:http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 9000:tcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--route&lt;/span&gt; /:8123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  MongoDB
&lt;/h3&gt;

&lt;p&gt;MongoDB is a general-purpose, document-based, distributed database built for modern applications.&lt;br&gt;
In the following example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a 10GB Volume in the Frankfurt region&lt;/li&gt;
&lt;li&gt;Deploy MongoDB using the official MongoDB Docker image &lt;code&gt;mongo&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Attach the &lt;code&gt;mongo-data&lt;/code&gt; Volume to the service and mount our volume into the &lt;code&gt;/data/db&lt;/code&gt; directory of the container&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Create a Volume&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb volume create mongo-data &lt;span class="nt"&gt;--size&lt;/span&gt; 10 &lt;span class="nt"&gt;--region&lt;/span&gt; fra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command above creates a 10GB Volume in the Frankfurt region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy MongoDB&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb app init mongo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker&lt;/span&gt; mongo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; was &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--volumes&lt;/span&gt; mongo-data:/data/db &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;MONGO_INITDB_ROOT_USERNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;MONGO_INITDB_ROOT_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;={{&lt;/span&gt; secrets.mongodb_password &lt;span class="o"&gt;}}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 27017:tcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Neo4j
&lt;/h3&gt;

&lt;p&gt;Neo4j is a highly scalable, robust native graph database.&lt;br&gt;
In the following example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a 10GB Volume in the Frankfurt region&lt;/li&gt;
&lt;li&gt;Deploy Neo4j using the official Neo4j Docker image &lt;code&gt;neo4j&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Attach the &lt;code&gt;neo4j-data&lt;/code&gt; Volume to the service and mount our volume into the &lt;code&gt;/data&lt;/code&gt; directory of the container&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Create a Volume&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb volume create neo4j-data &lt;span class="nt"&gt;--size&lt;/span&gt; 10 &lt;span class="nt"&gt;--region&lt;/span&gt; fra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command above creates a 10GB Volume in the Frankfurt region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy Neo4j&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;koyeb app init neo4j &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker&lt;/span&gt; neo4j &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; was &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--volumes&lt;/span&gt; neo4j-data:/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;NEO4J_AUTH&lt;/span&gt;&lt;span class="o"&gt;={{&lt;/span&gt; secrets.neo4j_password &lt;span class="o"&gt;}}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 7474:http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 7687:tcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--route&lt;/span&gt; /:7474
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! For each of the examples above, in less than 60 seconds, you will have a Minio, ClickHouse, MongoDB, or Neo4j service running on Koyeb using a Volume to handle persistence.&lt;br&gt;
You can then access it from your other services using our built-in service mesh for secure service-to-service communication.&lt;/p&gt;
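&lt;p&gt;As an illustration, another service could connect to the MongoDB deployment above by building a connection URI from the credentials set at deploy time. This is only a sketch: &lt;code&gt;mongo.internal&lt;/code&gt; is a placeholder hostname, not the actual service mesh address, so substitute the internal host from the Koyeb docs:&lt;/p&gt;

```python
from urllib.parse import quote_plus

def mongo_uri(user: str, password: str, host: str, port: int = 27017) -> str:
    """Build a MongoDB connection URI, URL-escaping the credentials."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"

# "mongo.internal" is a placeholder, not a real Koyeb hostname
uri = mongo_uri("main", "s3cret/pass", "mongo.internal")
print(uri)  # mongodb://main:s3cret%2Fpass@mongo.internal:27017
```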

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Volumes are a foundational primitive for building and running stateful applications on Koyeb. We're currently working on more features around data mobility to let you attach and detach Volumes from services, and on snapshot support to create point-in-time copies of your Volumes.&lt;/p&gt;

&lt;p&gt;We're looking forward to your feedback to help us shape the product, and stay tuned as we roll out even more exciting updates during Launch Week!&lt;/p&gt;

</description>
      <category>database</category>
      <category>webdev</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>AWS Regions Public Preview: Deploy on AWS in Minutes</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Fri, 20 Sep 2024 15:40:55 +0000</pubDate>
      <link>https://dev.to/koyeb/aws-regions-public-preview-deploy-on-aws-in-minutes-29pf</link>
      <guid>https://dev.to/koyeb/aws-regions-public-preview-deploy-on-aws-in-minutes-29pf</guid>
      <description>&lt;p&gt;Today, we are thrilled to announce the &lt;strong&gt;public preview of AWS Regions on Koyeb&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;During last launch week, we introduced the &lt;a href="https://www.koyeb.com/blog/aws-regions-build-run-scale-on-aws-with-koyeb" rel="noopener noreferrer"&gt;private preview of AWS Regions&lt;/a&gt;. We announced our first region, the famous, &lt;strong&gt;us-east-1 (N. Virginia) region&lt;/strong&gt;, and started onboarding our first users.&lt;/p&gt;

&lt;p&gt;What’s new with this public preview? In addition to the &lt;strong&gt;us-east-1 location now being available to everyone&lt;/strong&gt; on the platform, you can now deploy to this region via &lt;strong&gt;the control panel and the CLI&lt;/strong&gt;! Get ready for faster releases and an even better deployment experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb1a6yztts4sfdckj5l3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb1a6yztts4sfdckj5l3.gif" alt="AWS Regions Public Preview" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Want to see another AWS region on the platform? &lt;a href="https://app.reclaim.ai/m/koyeb-intro/short-call" rel="noopener noreferrer"&gt;Let us know&lt;/a&gt; what you need! We will be adding more regions based on demand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AWS regions on Koyeb?
&lt;/h2&gt;

&lt;p&gt;We repeatedly heard from some of our business users that they needed an &lt;strong&gt;easier way to deploy on their AWS infrastructure&lt;/strong&gt;. A solution where they could benefit from the seamless Koyeb deployment experience while running on their existing AWS infrastructure.&lt;/p&gt;

&lt;p&gt;This was the case for &lt;a href="https://www.sush.app/" rel="noopener noreferrer"&gt;Sush&lt;/a&gt;, a mobile game with several million users. Here's what Nathan Appere, CTO at Sush, had to say about the new AWS region on Koyeb:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Scaling Sush to several millions of users while maintaining a lean infrastructure has been a real challenge. We started on Heroku, but we lacked flexibility and the costs were unsustainable at scale. As we’re using AWS RDS with high bandwidth, we had to stay on AWS to control our egress costs. Switching to the Koyeb AWS region is a game changer: we have a seamless deployment experience, improved efficiency, and cut costs by 50%!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Get the power of 8+ AWS products without the hassle of manual configuration
&lt;/h2&gt;

&lt;p&gt;AWS offers a wide ecosystem of products and services. Setting up and managing infrastructure on AWS quickly becomes complex and a full time job, even for experienced teams.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 machines&lt;/li&gt;
&lt;li&gt;Databases&lt;/li&gt;
&lt;li&gt;Gateways&lt;/li&gt;
&lt;li&gt;Load balancers&lt;/li&gt;
&lt;li&gt;VPCs&lt;/li&gt;
&lt;li&gt;Network policies&lt;/li&gt;
&lt;li&gt;Static IPs&lt;/li&gt;
&lt;li&gt;DNS&lt;/li&gt;
&lt;li&gt;Security policies&lt;/li&gt;
&lt;li&gt;Build pipelines&lt;/li&gt;
&lt;li&gt;Continuous deployment pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You name it, AWS has it. And you need to manage it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdo8knxumq72rs37hqae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdo8knxumq72rs37hqae.png" alt="AWS Configuration Meme" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where Koyeb comes in. &lt;strong&gt;Koyeb manages these layers, removes complexity, and provides you with a seamless deployment experience out-of-the-box.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy on AWS Regions with Koyeb in Minutes
&lt;/h2&gt;

&lt;p&gt;As of today, everyone can deploy in our inaugural AWS region: &lt;code&gt;us-east-1&lt;/code&gt;! Deploying in this region is as simple as deploying in our &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;Koyeb regions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s a quick rundown of how to deploy in the new AWS region using the &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;control panel&lt;/a&gt; and the &lt;a href="https://www.koyeb.com/docs/build-and-deploy/cli/installation" rel="noopener noreferrer"&gt;Koyeb CLI&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Via the Control Panel
&lt;/h3&gt;

&lt;p&gt;For new deployments, simply click on the AWS tab above the map of core locations to select the new AWS region in us-east-1. From there, you can select the &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Instance&lt;/a&gt; size with the resources your workload needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb1a6yztts4sfdckj5l3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgb1a6yztts4sfdckj5l3.gif" alt="AWS Regions on Koyeb" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Via the CLI
&lt;/h3&gt;

&lt;p&gt;As a part of this public preview, you can now deploy in &lt;code&gt;aws-us-east-1&lt;/code&gt; using the Koyeb CLI.&lt;/p&gt;

&lt;p&gt;Here’s an example for how to deploy our demo application in AWS us-east-1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;koyeb&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="nx"&gt;init&lt;/span&gt; &lt;span class="nx"&gt;demo&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;docker&lt;/span&gt; &lt;span class="nx"&gt;docker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;io&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;koyeb&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;demo&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;ports&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;routes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;regions&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;us&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;east&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;small&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;AWS regions are more expensive than &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Koyeb optimized regions&lt;/a&gt;. Here are the prices of our Standard Instance types on AWS:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Instance type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;vCPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Disk&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Price&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nano&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;td&gt;2.5GB SSD&lt;/td&gt;
&lt;td&gt;$5.36/month ($0.000002/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;micro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;512MB&lt;/td&gt;
&lt;td&gt;5GB SSD&lt;/td&gt;
&lt;td&gt;$10.71/month ($0.000004/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;small&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1GB&lt;/td&gt;
&lt;td&gt;10GB SSD&lt;/td&gt;
&lt;td&gt;$21.43/month ($0.000008/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2GB&lt;/td&gt;
&lt;td&gt;20GB SSD&lt;/td&gt;
&lt;td&gt;$42.85/month ($0.000016/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;large&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4GB&lt;/td&gt;
&lt;td&gt;40GB SSD&lt;/td&gt;
&lt;td&gt;$85.71/month ($0.000032/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;80GB SSD&lt;/td&gt;
&lt;td&gt;$171.42/month ($0.000064/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;160GB SSD&lt;/td&gt;
&lt;td&gt;$342.84/month ($0.000128/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;3xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;td&gt;240GB SSD&lt;/td&gt;
&lt;td&gt;$685.67/month ($0.000256/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;4xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;64GB&lt;/td&gt;
&lt;td&gt;320GB SSD&lt;/td&gt;
&lt;td&gt;$1371.34/month ($0.000512/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;5xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;128GB&lt;/td&gt;
&lt;td&gt;400GB SSD&lt;/td&gt;
&lt;td&gt;$2742.68/month ($0.001024/second)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
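&lt;p&gt;The per-month and per-second figures above are mutually consistent if you assume a 31-day month. A quick sketch to check the arithmetic (the rates are copied from the table; the 31-day assumption is ours):&lt;/p&gt;

```python
# Per-second rates (USD) from the pricing table above
RATES = {
    "nano": 0.000002,
    "small": 0.000008,
    "5xlarge": 0.001024,
}

# Assumption: the table's monthly prices use a 31-day month
SECONDS_PER_MONTH = 31 * 24 * 3600

def monthly_price(per_second: float) -> float:
    """Convert a per-second rate to a rounded monthly price."""
    return round(per_second * SECONDS_PER_MONTH, 2)

print(monthly_price(RATES["nano"]))     # 5.36
print(monthly_price(RATES["small"]))    # 21.43
print(monthly_price(RATES["5xlarge"]))  # 2742.68
```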

&lt;p&gt;Keep in mind: because bandwidth is more expensive on AWS, we charge you the actual bandwidth cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduce Bandwidth Cost &amp;amp; Maximize Performance
&lt;/h3&gt;

&lt;p&gt;In today's cloud landscape, network bandwidth often costs more than compute.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Switching to the Koyeb AWS region is a game changer: we have a seamless deployment experience, improved efficiency, and cut costs by 50%!"&lt;/em&gt;&lt;br&gt;
Nathan Appere, CTO, Sush&lt;/p&gt;

&lt;p&gt;Running on AWS regions on Koyeb eliminates data transfer and bandwidth costs between your Koyeb services and AWS products, giving you the ability to scale faster and more cost-effectively.&lt;br&gt;
All while maximizing performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prototype with Koyeb, Deploy to Production on AWS
&lt;/h3&gt;

&lt;p&gt;Switching between Koyeb and AWS regions is as simple as one click.&lt;/p&gt;

&lt;p&gt;With Koyeb, you can deploy your applications and services across our &lt;a href="https://www.koyeb.com/docs/reference/edge-network" rel="noopener noreferrer"&gt;255+ edge locations&lt;/a&gt; and &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;6 core locations&lt;/a&gt; worldwide, so your applications run close to your users and provide them blazing-fast experiences.&lt;/p&gt;

&lt;p&gt;If you want to switch your deployment from one of our regions to an AWS region, or vice versa, it's just a matter of changing the region in your deployment configuration. In other words, one click away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Seamless deployment experience on top of high-performance infrastructure
&lt;/h2&gt;

&lt;p&gt;Whether it's on AWS or our own infrastructure, we run the same secured virtualization technology on top of bare metal machines. Having control of the &lt;a href="https://www.koyeb.com/blog/the-koyeb-serverless-engine-from-kubernetes-to-nomad-firecracker-and-kuma" rel="noopener noreferrer"&gt;core technology&lt;/a&gt; allows us to offer you greater productivity and the best performance.&lt;/p&gt;

&lt;p&gt;With AWS Regions on Koyeb, you can seamlessly integrate the Koyeb deployment experience with AWS. Run on top of your existing infrastructure and leverage the entire AWS products portfolio.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next: Join over 100k developers and deploy global applications across 3 continents 🚀
&lt;/h2&gt;

&lt;p&gt;Need a feature on Koyeb? Want to see what we'll be releasing on the platform next? Check out our &lt;a href="https://feedback.koyeb.com/" rel="noopener noreferrer"&gt;public roadmap&lt;/a&gt; and &lt;a href="https://feedback.koyeb.com/feature-requests" rel="noopener noreferrer"&gt;feature request platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you need some ideas about what to deploy first, check out our &lt;a href="https://www.koyeb.com/tutorials" rel="noopener noreferrer"&gt;tutorials&lt;/a&gt; section, our &lt;a href="https://www.koyeb.com/deploy" rel="noopener noreferrer"&gt;one-click apps catalog&lt;/a&gt;, and our collection of &lt;a href="https://www.koyeb.com/docs/deploy" rel="noopener noreferrer"&gt;deploy guides&lt;/a&gt; in our documentation.&lt;/p&gt;

&lt;p&gt;To get you started, we provide a Free Tier that lets you deploy your first service and managed database for free. If you want to know more about why we offer a free tier and how we sustain it, read our dedicated &lt;a href="https://www.koyeb.com/blog/sustaining-free-compute-in-a-hostile-environment" rel="noopener noreferrer"&gt;blog post about sustaining a free tier&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We are looking forward to seeing what you’ll deploy across our global locations, including AWS us-east-1! 🚀 🌐&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>aws</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Paris and Tokyo Regions in GA</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Fri, 20 Sep 2024 15:38:56 +0000</pubDate>
      <link>https://dev.to/koyeb/paris-and-tokyo-regions-in-ga-l99</link>
      <guid>https://dev.to/koyeb/paris-and-tokyo-regions-in-ga-l99</guid>
      <description>&lt;p&gt;Today is Day 3 of Launch Week, and we are excited to announce not just one, but &lt;strong&gt;two new regions are generally available&lt;/strong&gt; to deploy your low-latency AI workloads, full stack applications, APIs, and databases &lt;em&gt;globally&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;Join us in welcoming &lt;strong&gt;Paris and Tokyo&lt;/strong&gt; to the Koyeb platform and to a growing list of &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;global locations&lt;/a&gt; where you can deploy your applications! 🎉&lt;/p&gt;

&lt;p&gt;As of today &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;GA locations&lt;/a&gt; now include: Paris 🇫🇷, Tokyo 🇯🇵, Washington, D.C. 🇺🇸, Singapore 🇸🇬, and Frankfurt. 🇩🇪 🇪🇺&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm72hyl400mxq1wsdj3fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm72hyl400mxq1wsdj3fh.png" alt="Paris and Tokyo" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything you need to deploy high performance serverless apps is now available in Paris and Tokyo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Standard Instances&lt;/a&gt; with blazing fast CPUs to host your APIs, inference endpoints, or &lt;a href="https://www.koyeb.com/blog/deploy-and-scale-high-performance-background-jobs-with-koyeb-workers" rel="noopener noreferrer"&gt;Workers&lt;/a&gt; for async processing&lt;/li&gt;
&lt;li&gt;Deploy any one of &lt;a href="https://www.koyeb.com/deploy" rel="noopener noreferrer"&gt;over 40 Apps in One-Click&lt;/a&gt; or your own GitHub repository and Docker containers&lt;/li&gt;
&lt;li&gt;Our &lt;a href="https://www.koyeb.com/blog/70-faster-deployments-and-high-performance-private-network" rel="noopener noreferrer"&gt;70% faster deployment speed and high-performance private network&lt;/a&gt; that we announced yesterday.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, the Paris and Tokyo regions benefit from our zero infrastructure management experience with all the platform’s features: &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-with-git" rel="noopener noreferrer"&gt;continuous deployment&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/run-and-scale/autoscaling" rel="noopener noreferrer"&gt;autoscaling&lt;/a&gt;, built-in &lt;a href="https://www.koyeb.com/docs/reference/edge-network" rel="noopener noreferrer"&gt;edge network&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/reference/service-mesh-and-discovery" rel="noopener noreferrer"&gt;service discovery&lt;/a&gt; for seamless service-to-service communication, real-time logs and &lt;a href="https://www.koyeb.com/docs/run-and-scale/metrics" rel="noopener noreferrer"&gt;metrics&lt;/a&gt;, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy in Paris and Tokyo in Seconds 秒
&lt;/h2&gt;

&lt;p&gt;Whether you prefer a user-friendly control panel or working from your terminal, you can deploy through our dashboard or CLI within seconds.&lt;/p&gt;

&lt;p&gt;Our platform supports deployments with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.koyeb.com/docs/build-and-deploy/build-from-git" rel="noopener noreferrer"&gt;Git&lt;/a&gt;&lt;/strong&gt;: Seamlessly integrate with GitHub to push code directly from your repositories to production. We take care of the build process, using either native buildpacks or a Dockerfile that you provide.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.koyeb.com/docs/build-and-deploy/prebuilt-docker-images" rel="noopener noreferrer"&gt;Docker images&lt;/a&gt;&lt;/strong&gt;: Deploy container images from any public or private registry. This is ideal if you already have a CI pipeline generating Docker images. In both cases, your workloads are run in isolated Firecracker microVMs on our bare metal servers, located in your chosen regions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-project-directory" rel="noopener noreferrer"&gt;Local directory&lt;/a&gt;&lt;/strong&gt;: Using the Koyeb CLI, you can build and deploy your application directly from your project's directory. Unlike deploying with GitHub or from a container registry, deploying from a project directory does not require any intermediary services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For today’s demo, we’ll deploy a sample Go application from a public Git repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Via the CLI
&lt;/h3&gt;

&lt;p&gt;To deploy our Go example in Paris and Tokyo, run the following command with the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;koyeb app init fast \
  --git github.com/koyeb/go \
  --git-branch main \
  --regions par,tyo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the deployment has finished and your application is live, run the following command to obtain the public domain, which you can then use to access your application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;koyeb app get fast
ID          NAME    STATUS      DOMAINS                                 CREATED AT
b77006e8    fast    STARTING    ["fast-lets-go-2455e588.koyeb.app"] 18 Sep 24 08:58 UTC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Via the control panel
&lt;/h3&gt;

&lt;p&gt;To deploy any project in Paris and Tokyo using the &lt;a href="https://www.koyeb.com/blog/new-dashboard-build-run-and-scale-apps-in-minutes-with-a-simple-and-elegant-interface" rel="noopener noreferrer"&gt;freshly revamped control panel&lt;/a&gt;, you will need to follow 3 steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;👨‍💻 Select your deployment method: &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-with-git" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://www.koyeb.com/docs/build-and-deploy/prebuilt-docker-images" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;⬇️ Import your project&lt;/li&gt;
&lt;li&gt;✅ Configure your service&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After you've imported your project, you can &lt;strong&gt;select the instance type and the regions&lt;/strong&gt; you want to deploy to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqo7vbepqv1facs4webpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqo7vbepqv1facs4webpj.png" alt="Deploying a Go application in Koyeb" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the service summary page you'll see before deploying the example Golang application using the control panel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxep485m6inna8k8ttkag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxep485m6inna8k8ttkag.png" alt="Deploying a Go application in Koyeb" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On this page, you can configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.koyeb.com/docs/build-and-deploy/build-from-git" rel="noopener noreferrer"&gt;Builder&lt;/a&gt;&lt;/strong&gt;: Koyeb can use either native buildpacks or a Dockerfile to build your application&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/build-and-deploy/environment-variables" rel="noopener noreferrer"&gt;Environment variables&lt;/a&gt;: You can also set &lt;a href="https://www.koyeb.com/docs/reference/secrets" rel="noopener noreferrer"&gt;Secrets&lt;/a&gt; to keep sensitive information safe&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Instance type&lt;/a&gt;: GPU, Standard, and Eco Instances&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;Regions&lt;/a&gt;: Tokyo, Paris, San Francisco, Washington, D.C., Frankfurt, and Signapore&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/run-and-scale/autoscaling" rel="noopener noreferrer"&gt;Scaling&lt;/a&gt;: Fixed or autoscaling&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/build-and-deploy/exposing-your-service" rel="noopener noreferrer"&gt;Ports&lt;/a&gt;: The ports on which your service listens&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.koyeb.com/docs/run-and-scale/health-checks" rel="noopener noreferrer"&gt;Health checks&lt;/a&gt;: By default, Koyeb automatically performs TCP health checks on your Service's exposed ports to ensure maximum availability&lt;/li&gt;
&lt;/ul&gt;
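&lt;p&gt;To illustrate what the default TCP health check above actually does (a generic sketch of the technique, not Koyeb's implementation), a health checker simply attempts to open a TCP connection to the service's exposed port and treats a successful connection as healthy:&lt;/p&gt;

```python
import socket

def tcp_health_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    This is the kind of liveness signal a platform-level TCP health
    check relies on: no HTTP request is made, only a connection attempt.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

&lt;p&gt;HTTP health checks go one step further and also validate a response status code before marking an instance healthy.&lt;/p&gt;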

&lt;p&gt;You can see the estimated cost of your service before deploying it. Once you’re ready, hit &lt;strong&gt;Deploy&lt;/strong&gt; to launch your application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Paris and Tokyo?
&lt;/h2&gt;

&lt;p&gt;Proximity is essential when it comes to delivering optimal performance to your end users in Europe and APAC.&lt;br&gt;
&lt;a href="https://feedback.koyeb.com/feature-requests/p/france-eu-core-location" rel="noopener noreferrer"&gt;Paris&lt;/a&gt; and &lt;a href="https://feedback.koyeb.com/feature-requests/p/north-east-asia-core-location" rel="noopener noreferrer"&gt;Tokyo&lt;/a&gt; were two highly demanded regions for the platform.&lt;/p&gt;

&lt;p&gt;Hosting your services closer to your audience &lt;strong&gt;improves connectivity and reduces latency&lt;/strong&gt;, ensuring a faster and smoother experience.&lt;/p&gt;

&lt;p&gt;As a part of our mission to provide everyone using your apps and services the best possible experience, we needed to add a core location closer to them.&lt;br&gt;
We're so excited about the thriving tech ecosystems in these regions and look forward to seeing what you deploy there!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo33mxjmr0exe785n92h9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo33mxjmr0exe785n92h9.png" alt="New GA regions" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.koyeb.com/pricing" rel="noopener noreferrer"&gt;Standard Instances&lt;/a&gt; in Paris and Tokyo are available to everyone at the same price as in other regions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our pricing model is simple and transparent: you only pay for the resources you use, billed per second, allowing you to scale based on your needs.&lt;br&gt;
You can deploy our &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;GPU, Standard, and Eco Instances&lt;/a&gt; on-demand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU Instances are perfect for running AI workloads and inference. &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Explore our GPU Instances&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Standard Instances are ideal for production workloads and resource-intensive applications. &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Explore our Standard Instances&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Eco Instances are designed for cost-effective workloads that can tolerate some latency. &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Explore our Eco Instances&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
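&lt;p&gt;Because billing is per second, estimating the cost of a short-lived workload is simple arithmetic. A minimal sketch (the $0.000008/second rate is an example figure, not a quote for any specific instance type):&lt;/p&gt;

```python
def estimate_cost(per_second_usd: float, hours: float) -> float:
    """Estimated cost of running one instance for `hours` hours
    under per-second billing."""
    return round(per_second_usd * hours * 3600, 2)

# An instance billed at $0.000008/second, running for 10 hours:
print(estimate_cost(0.000008, 10))  # 0.29
```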

&lt;p&gt;We also offer managed PostgreSQL databases, so you can run your databases in the same place as your services. &lt;a href="https://www.koyeb.com/docs/databases" rel="noopener noreferrer"&gt;Explore our Postgres database details and pricing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more details, check our &lt;a href="https://www.koyeb.com/pricing" rel="noopener noreferrer"&gt;pricing page&lt;/a&gt; or &lt;a href="https://tally.so/r/nGLjGo" rel="noopener noreferrer"&gt;contact us&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in to all Koyeb Instances to Power Your Apps
&lt;/h3&gt;

&lt;p&gt;We're bringing the world’s best serverless features to Paris and Tokyo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-with-git" rel="noopener noreferrer"&gt;Continuous delivery with Git&lt;/a&gt; streamlines pushing your code to production&lt;/li&gt;
&lt;li&gt;🧑‍💻 Deploy code from &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-with-git" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://www.koyeb.com/docs/build-and-deploy/prebuilt-docker-images" rel="noopener noreferrer"&gt;pre-built Docker images&lt;/a&gt; from your container registry&lt;/li&gt;
&lt;li&gt;🐳 Automatic build process with &lt;a href="https://www.koyeb.com/docs/build-and-deploy/build-from-git" rel="noopener noreferrer"&gt;native buildpacks&lt;/a&gt; or via your &lt;a href="https://www.koyeb.com/blog/dockerfile-deployment-on-high-performance-microvms-is-ga" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🗺️ Global deployments in any or all &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;6 regions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📈 &lt;a href="https://www.koyeb.com/docs/run-and-scale/autoscaling" rel="noopener noreferrer"&gt;Autoscaling&lt;/a&gt; automatically adjusts the resources allocated to your services to match your production traffic while giving you full control of the maximum costs you will be paying&lt;/li&gt;
&lt;li&gt;🌐 &lt;a href="https://www.koyeb.com/blog/building-a-multi-region-service-mesh-with-kuma-envoy-anycast-bgp-and-mtls" rel="noopener noreferrer"&gt;Built-in edge network&lt;/a&gt; ensures requests to your service are always routed to the closest edge location and if need be to the closest core location. &lt;a href="https://www.koyeb.com/docs/reference/edge-network" rel="noopener noreferrer"&gt;Global Edge Network&lt;/a&gt; provides native load balancing, TLS encryption, and caching&lt;/li&gt;
&lt;li&gt;🤝 Support for &lt;a href="https://www.koyeb.com/blog/enabling-grpc-and-http2-support-at-edge-with-kuma-and-envoy" rel="noopener noreferrer"&gt;gRPC, HTTP/2&lt;/a&gt; and websockets&lt;/li&gt;
&lt;li&gt;🔎 Built-in &lt;a href="https://www.koyeb.com/docs/reference/service-mesh-and-discovery" rel="noopener noreferrer"&gt;service mesh and discovery&lt;/a&gt; for seamless service-to-service communication&lt;/li&gt;
&lt;li&gt;📊 Get insights into your service with &lt;a href="https://www.koyeb.com/blog/koyeb-metrics-built-in-observability-to-monitor-your-apps-performances" rel="noopener noreferrer"&gt;Koyeb Metrics&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📃 &lt;a href="https://www.koyeb.com/docs/run-and-scale/log-exporter" rel="noopener noreferrer"&gt;Log exporter&lt;/a&gt; to forward logs to external log management&lt;/li&gt;
&lt;li&gt;🧑‍⚕️ Configure &lt;a href="https://www.koyeb.com/docs/run-and-scale/health-checks" rel="noopener noreferrer"&gt;custom health checks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🤫 &lt;a href="https://www.koyeb.com/docs/reference/secrets" rel="noopener noreferrer"&gt;Secrets Management&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🧑‍💻 Deploy and manage your services with the &lt;a href="https://www.koyeb.com/docs/build-and-deploy/cli/installation" rel="noopener noreferrer"&gt;Koyeb CLI&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/integrations/infrastructure-as-code/terraform" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/integrations/infrastructure-as-code/pulumi" rel="noopener noreferrer"&gt;Pulumi&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next? Join over 100k developers and deploy global applications across 3 continents 🚀
&lt;/h2&gt;

&lt;p&gt;We are working on adding more regions to our platform. If you have a specific region in mind, please let us know by upvoting it on our &lt;a href="https://feedback.koyeb.com/feature-requests?search=regions" rel="noopener noreferrer"&gt;feedback board&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you need some ideas about what to deploy first, check out our &lt;a href="https://www.koyeb.com/tutorials" rel="noopener noreferrer"&gt;tutorials&lt;/a&gt; section, our &lt;a href="https://www.koyeb.com/deploy" rel="noopener noreferrer"&gt;one-click apps catalog&lt;/a&gt;, and our collection of &lt;a href="https://www.koyeb.com/docs/deploy" rel="noopener noreferrer"&gt;deploy guides&lt;/a&gt; in our documentation.&lt;/p&gt;

&lt;p&gt;To get you started, we provide a Free Tier that lets you deploy your first service and managed database for free. If you want to know more about why we offer a free tier and how we sustain it, read our dedicated blog post about &lt;a href="https://www.koyeb.com/blog/sustaining-free-compute-in-a-hostile-environment" rel="noopener noreferrer"&gt;sustaining a free tier&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We are looking forward to seeing what you’ll deploy in Paris and Tokyo!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>70% Faster Deployments and High-Performance Private Network</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Tue, 17 Sep 2024 13:30:57 +0000</pubDate>
      <link>https://dev.to/koyeb/70-faster-deployments-and-high-performance-private-network-36h3</link>
      <guid>https://dev.to/koyeb/70-faster-deployments-and-high-performance-private-network-36h3</guid>
      <description>&lt;p&gt;It’s Day 2 of our &lt;a href="https://www.koyeb.com/blog/koyeb-launch-week-round-2" rel="noopener noreferrer"&gt;Launch Week&lt;/a&gt;, and we’re excited to unveil our &lt;strong&gt;new networking stack&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;If you're following us, you know that we're obsessed with performance: we want fast deployments and a speedy network once your apps are live.&lt;/p&gt;

&lt;p&gt;Long story short: while working on optimizing deployment speed, we hit limitations with our old stack, so we decided to revamp everything and build a new networking stack for the platform.&lt;/p&gt;

&lt;p&gt;When you deploy on the platform, you get &lt;strong&gt;advanced capabilities out-of-the-box&lt;/strong&gt; including &lt;a href="https://www.koyeb.com/docs/reference/edge-network" rel="noopener noreferrer"&gt;automatic load-balancing&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/reference/edge-network#built-in-tls" rel="noopener noreferrer"&gt;fully encrypted private networking&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/run-and-scale/metrics" rel="noopener noreferrer"&gt;built-in observability&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/run-and-scale/health-checks" rel="noopener noreferrer"&gt;auto-healing&lt;/a&gt;, and &lt;a href="https://www.koyeb.com/docs/reference/service-mesh-and-discovery" rel="noopener noreferrer"&gt;automatic service discovery&lt;/a&gt; to name a few. All these features are tied to the networking stack.&lt;/p&gt;

&lt;p&gt;To build this new network layer, we've replaced our previous setup, a forked Kuma Mesh, with a custom-built stack on top of Envoy and Cilium. We had to completely rewrite our GLB component (Global Load Balancer), move to a custom service mesh built on CoreDNS and backed by Consul DNS, and swap out sidecars for a powerful combination of Cilium and &lt;a href="https://www.koyeb.com/blog/ebpf-the-future-of-the-service-mesh-and-network-innovation" rel="noopener noreferrer"&gt;eBPF&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This brings three major improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;70% faster deployments&lt;/strong&gt;: A new service now takes between a couple of seconds and 90 seconds to go live. Previously, it could take up to 5 minutes. (Don't quote us on this if you have a 200GB model in your image (: )&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-Performance Private Network&lt;/strong&gt;: The built-in private network, or VPC, now provides up to 10 Gb/s of bandwidth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved deployment reliability&lt;/strong&gt;: Network configuration propagation errors are gone!&lt;/li&gt;
&lt;/ul&gt;
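&lt;p&gt;The headline number is consistent with the figures above: going from a worst case of 5 minutes down to 90 seconds is a 70% reduction.&lt;/p&gt;

```python
old_worst = 5 * 60  # previous worst-case deployment time, in seconds
new_worst = 90      # new worst-case deployment time, in seconds

reduction = (old_worst - new_worst) / old_worst
print(f"{reduction:.0%}")  # 70%
```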

&lt;p&gt;This is now live in &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;all regions&lt;/a&gt; and all new deployments happen on this new stack!&lt;/p&gt;

&lt;h2&gt;
  
  
  From Envoy sidecars with Kuma to eBPF processing with Cilium and... Envoy
&lt;/h2&gt;

&lt;p&gt;When we started building Koyeb, we had to pick the right tools and technology to build and scale our platform in its early days.&lt;/p&gt;

&lt;p&gt;For the network stack, we've always aimed to provide a built-in service mesh with the following key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-tenancy: network isolation for each user&lt;/li&gt;
&lt;li&gt;Service discovery: a service is addressable via an internal domain name&lt;/li&gt;
&lt;li&gt;Observability: Network calls between services may be traced and metrics can be collected&lt;/li&gt;
&lt;li&gt;Built-in load-balancing: to easily scale and autoscale services horizontally&lt;/li&gt;
&lt;li&gt;Resiliency: Automatic detection of instances not answering to health checks&lt;/li&gt;
&lt;li&gt;Zero Trust Network: mTLS in between each Instance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We picked Kuma, an open-source service mesh built on top of Envoy, to build our &lt;a href="https://www.koyeb.com/blog/building-a-multi-region-service-mesh-with-kuma-envoy-anycast-bgp-and-mtls" rel="noopener noreferrer"&gt;multi-region service mesh&lt;/a&gt;.&lt;br&gt;
&lt;strong&gt;Kuma helped us go from nothing to scaling 10s of thousands of applications running on the platform.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then, we started to hit limitations as we scaled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deployment speed&lt;/strong&gt;: The relationship between networking and deployment speed is &lt;strong&gt;&lt;em&gt;network convergence time&lt;/em&gt;&lt;/strong&gt;. We want faster deployment times so you can deploy a critical patch in seconds and iterate quickly. Our machinery used to spend a lot of time computing network configurations and delivering them to a large number of proxies all over the world. The green line on the graph below is the propagation time before (with Kuma) vs our new network stack in yellow.
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1cevt7xcgsr5c921unv.png" alt="Graph Faster Deployments" width="800" height="232"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth and latency&lt;/strong&gt;: Bandwidth and latency are critical for network-intensive apps like distributed databases and real-time applications. Kuma, like most service meshes, uses a sidecar model, which adds latency to all requests flowing in and out of Koyeb instances. As &lt;a href="https://www.koyeb.com/blog/volumes-technical-preview-blazing-fast-nvme-ssd-for-your-data" rel="noopener noreferrer"&gt;data volumes&lt;/a&gt; are coming to the platform, you'll be able to deploy distributed databases where high-throughput networking matters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To ensure we can continue delivering the best possible experience, we decided to rethink our approach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjxfr3gfhnsgfuao7ixv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjxfr3gfhnsgfuao7ixv.png" alt="New stack" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Long story short, we decided to drop Kuma and Envoy sidecars for networking and opt for a &lt;strong&gt;custom solution&lt;/strong&gt; that is conceptually simpler and incredibly fast. By adopting Cilium and leveraging eBPF + Wireguard, network processing is now directly done in the Linux kernel with full encryption, eliminating the overhead associated with sidecar proxies. Load-balancing is still performed by Envoy proxies that are directly controlled by our control plane.&lt;/p&gt;

&lt;p&gt;Curious to hear more? Stay tuned for our dedicated engineering blog post!&lt;/p&gt;

&lt;h2&gt;
  
  
  Changes and How to Use the New Networking Stack
&lt;/h2&gt;

&lt;p&gt;What do you need to use the new stack? The answer is simple: nothing!&lt;br&gt;
You don't need to take any action—your applications will benefit from the performance improvements automatically.&lt;/p&gt;

&lt;p&gt;San Francisco, Washington D.C., Paris, Frankfurt, Singapore, and Tokyo - Our new networking stack is available in all Koyeb &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;regions&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqq3azkrnhud573u5zuk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqq3azkrnhud573u5zuk0.png" alt="Koyeb Regions" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Technically, you'll see that instances can now be accessed directly inside the private network as there is no sidecar anymore, making it easier to build clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;This is just the beginning. Our &lt;a href="https://community.koyeb.com/t/rolling-out-our-new-networking-stack-for-3x-faster-deployments/2156" rel="noopener noreferrer"&gt;new networking stack&lt;/a&gt; lays the foundation for future improvements and innovations that will allow us to continue pushing the boundaries of what’s possible for cloud-native applications.&lt;/p&gt;

&lt;p&gt;Stay tuned as we roll out even more exciting updates during Launch Week!&lt;/p&gt;

</description>
      <category>networking</category>
      <category>cloud</category>
      <category>serverless</category>
      <category>programming</category>
    </item>
    <item>
      <title>New Dashboard: Build, Run, and Scale Apps in Minutes with a Simple and Elegant Interface</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Mon, 16 Sep 2024 15:41:00 +0000</pubDate>
      <link>https://dev.to/koyeb/new-dashboard-build-run-and-scale-apps-in-minutes-with-a-simple-and-elegant-interface-4pjd</link>
      <guid>https://dev.to/koyeb/new-dashboard-build-run-and-scale-apps-in-minutes-with-a-simple-and-elegant-interface-4pjd</guid>
      <description>&lt;p&gt;Welcome to the first day of &lt;a href="https://www.koyeb.com/blog/koyeb-launch-week-round-2" rel="noopener noreferrer"&gt;launch week #2&lt;/a&gt;! Today, we are excited to introduce our brand new &lt;a href="https://app.koyeb.com/auth/signup" rel="noopener noreferrer"&gt;control panel&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Our mission at &lt;a href="https://www.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb&lt;/a&gt; is to offer the fastest way to deploy applications globally while delivering an exceptional developer experience. Over the past few months, we totally reimagined how to deliver a simple, reactive, and intuitive experience to deploy, manage, and scale projects to production.&lt;/p&gt;

&lt;p&gt;Behind the scenes, the Koyeb web interface acts as a powerful control panel that lets you provision an end-to-end production environment for your workloads with &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-with-git" rel="noopener noreferrer"&gt;continuous deployment&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/run-and-scale/autoscaling" rel="noopener noreferrer"&gt;autoscaling&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/run-and-scale/metrics" rel="noopener noreferrer"&gt;metrics&lt;/a&gt;, and advanced networking, whether on CPU or GPU.&lt;/p&gt;

&lt;p&gt;Perhaps you’ve already noticed the changes over the last few months as we’ve been adding the final touches.&lt;br&gt;
So far, the initial feedback about the new experience has been extremely positive: people love it!&lt;/p&gt;

&lt;p&gt;So what’s new? The long and short: Our new control panel simplifies serverless deployments, provides an organized overview page to efficiently visualize and manage your Koyeb resources, and offers a simplified navigation experience.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Revamped deployment experience&lt;/li&gt;
&lt;li&gt;New Service overview page for a summary of your Services&lt;/li&gt;
&lt;li&gt;Overview page: Efficiently visualize and manage Koyeb resources&lt;/li&gt;
&lt;li&gt;Enhanced navigation experience&lt;/li&gt;
&lt;li&gt;First deployments: An easier way to get started&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Revamped deployment experience
&lt;/h2&gt;

&lt;p&gt;The new deployment experience in the control panel is a simple click operation. Simply select the type of &lt;a href="https://www.koyeb.com/docs/reference/services" rel="noopener noreferrer"&gt;service&lt;/a&gt; you want to deploy — service, private service, worker, or &lt;a href="https://www.koyeb.com/docs/databases" rel="noopener noreferrer"&gt;database&lt;/a&gt; — and your deployment method — via a &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-with-git" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; or any &lt;a href="https://www.koyeb.com/docs/build-and-deploy/prebuilt-docker-images" rel="noopener noreferrer"&gt;container registry&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then, you can choose the &lt;a href="https://www.koyeb.com/docs/reference/instances" rel="noopener noreferrer"&gt;Instance&lt;/a&gt; resource type for your workload from our lineup of GPUs, Standard CPUs, and Eco CPUs, and select where in the world you want to deploy them. On the map of &lt;a href="https://www.koyeb.com/docs/reference/regions" rel="noopener noreferrer"&gt;core locations&lt;/a&gt;, we display latencies to your location.&lt;/p&gt;

&lt;p&gt;Next up is a summary of your deployment. On this page, you can configure environment variables and autoscaling policies, review deployments, and see an estimated cost for your deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  View build and deployment logs in real-time for deployments
&lt;/h3&gt;

&lt;p&gt;When you’re ready to deploy, you can see the build and deployment logs. We enhanced the logs experience, so now you can see the duration and last line of the build and deployment process even when the section is collapsed. This gives you clear insight into your deployment progress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9so0buw4lvdosxdjs4d.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9so0buw4lvdosxdjs4d.gif" alt="Improved build and deploy logs" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  New additions to the service summary page: Build artifacts and ability to pause and delete services
&lt;/h3&gt;

&lt;p&gt;After a first deployment, the service’s summary page provides more useful actions. If you wish to redeploy with a previous build, you can opt to use the build archived from the previous deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudj7fpo52dr9xnvrsd6c.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudj7fpo52dr9xnvrsd6c.gif" alt="Deploy with rebuild on service settings page" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the bottom of this page, you can also pause or delete the service. Previously, these actions were tucked away in the service’s settings page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzotf9phou8muoc3xkgnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzotf9phou8muoc3xkgnw.png" alt="Useful actions on service setting's page" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  New service overview page
&lt;/h2&gt;

&lt;p&gt;Navigate through your service’s deployments effortlessly with our redesigned service overview page. It offers a seamless user experience without compromising on crucial deployment details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkmfut6bsjjbcn9mna69.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkmfut6bsjjbcn9mna69.gif" alt="new-service-overview-page" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate across your deployments without losing important context&lt;/li&gt;
&lt;li&gt;Get an overview of your deployment in one click&lt;/li&gt;
&lt;li&gt;Visualize a summary of your resources&lt;/li&gt;
&lt;li&gt;Easier identification of your Services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Overview page: Efficiently visualize and manage Koyeb resources
&lt;/h2&gt;

&lt;p&gt;From the overview page in the dashboard, you get a high-level view of your Koyeb resources, recent activity, and easy access to creating and managing all of your resources.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easily view all of your Koyeb resources&lt;/li&gt;
&lt;li&gt;Visualize all key information at a quick glance: Instance health status, URL, regions, and deployment source&lt;/li&gt;
&lt;li&gt;See your Organization's &lt;a href="https://www.koyeb.com/docs/reference/organizations#view-organization-activity" rel="noopener noreferrer"&gt;recent activity&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Quick access to the side navigation bar for &lt;a href="https://www.koyeb.com/docs/reference/services" rel="noopener noreferrer"&gt;Services&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/run-and-scale/domains" rel="noopener noreferrer"&gt;Domains&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/reference/secrets" rel="noopener noreferrer"&gt;Secrets&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/reference/volumes" rel="noopener noreferrer"&gt;Volumes&lt;/a&gt;, &lt;a href="https://www.koyeb.com/docs/reference/organizations#invite-members" rel="noopener noreferrer"&gt;Teams&lt;/a&gt;, and creating new Services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dwov6ebujt3cvqqdyxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dwov6ebujt3cvqqdyxh.png" alt="New overview page" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Enhanced Navigation with New Sidebar&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our new sidebar makes it easy to navigate the control panel. In one click, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Toggle between Koyeb &lt;a href="https://www.koyeb.com/docs/reference/organizations" rel="noopener noreferrer"&gt;Organizations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Configure &lt;a href="https://www.koyeb.com/docs/run-and-scale/domains" rel="noopener noreferrer"&gt;Domains&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Manage &lt;a href="https://www.koyeb.com/docs/reference/secrets" rel="noopener noreferrer"&gt;Secrets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Manage &lt;a href="https://www.koyeb.com/docs/reference/volumes" rel="noopener noreferrer"&gt;Volumes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;See all recent activity&lt;/li&gt;
&lt;li&gt;Add team members&lt;/li&gt;
&lt;li&gt;Track costs at a glance to ensure optimal resource management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falz0ue7cslfrahsa9f4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falz0ue7cslfrahsa9f4z.png" alt="Highlight sidebar improvements and cost tracking capabilities" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  First deployments: An easier way to get started
&lt;/h2&gt;

&lt;p&gt;We've enhanced the first deployment experience with revamped build and deployment logs, in-app guidance for troubleshooting, and a clearer representation of instance types and statuses. This makes it easier than ever to get started with your initial deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zv0yBhYr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.koyeb.com/static/images/blog/new-dashboard/improved-first-deployment-experience.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zv0yBhYr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_800/https://www.koyeb.com/static/images/blog/new-dashboard/improved-first-deployment-experience.gif" alt="Improved first deployment experience" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant guidance when an issue occurs
&lt;/h3&gt;

&lt;p&gt;When an issue occurs during your first deployment, we provide instant guidance to help you troubleshoot and resolve it. This is especially helpful for new users encountering a deployment failure for the first time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kvm2psarg22n2arnwaf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kvm2psarg22n2arnwaf.gif" alt="Troubleshooting improvements" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Easily track progress with new build and deployment logs experience
&lt;/h3&gt;

&lt;p&gt;As mentioned above, the new build and deployment logs experience allows you to easily track the progress of your deployment. This is especially useful for new users who may not be familiar with the deployment process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear representation of instances and status
&lt;/h3&gt;

&lt;p&gt;After a deployment, you can easily see the instance type and status of your deployment. This makes it clearer which resources you are using and how your deployment is progressing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4ect8vxllsi9jiyy704.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4ect8vxllsi9jiyy704.png" alt="clear representation of instances and status" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Future plans for the control panel
&lt;/h2&gt;

&lt;p&gt;We are very excited about these milestones, and there are still so many things we want to add! Upcoming enhancements include support for mobile screens and continuous improvements based on your feedback.&lt;/p&gt;

&lt;p&gt;Let us know what you think of the new deployment experience in the &lt;a href="https://community.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb Community&lt;/a&gt; and on X &lt;a href="https://x.com/gokoyeb" rel="noopener noreferrer"&gt;@gokoyeb&lt;/a&gt;. While you're at it, show us what you’re building on the platform!&lt;/p&gt;

</description>
      <category>ux</category>
      <category>ui</category>
      <category>cloud</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Using ComfyUI, ComfyUI Manager, and Flux to Generate High-Quality Images</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Tue, 03 Sep 2024 10:43:00 +0000</pubDate>
      <link>https://dev.to/koyeb/using-comfyui-comfyui-manager-and-flux-to-generate-high-quality-images-bl3</link>
      <guid>https://dev.to/koyeb/using-comfyui-comfyui-manager-and-flux-to-generate-high-quality-images-bl3</guid>
      <description>&lt;p&gt;Recently, AI image creation has seen significant progress. Many new tools are available to make the creative process easier with flexible, powerful functionality. One of these tools, ComfyUI, has grown to prominence due to its versatility, ease of use, and high-quality image generation.&lt;/p&gt;

&lt;p&gt;Whether you're an artist wanting to try new things or a developer wanting to use AI for visual content, ComfyUI is a powerful tool that can help solve your image-based needs. In this guide, we'll explain what ComfyUI is, how it works, and how to set it up on a Koyeb GPU.&lt;/p&gt;

&lt;p&gt;We'll also show you how to install ComfyUI Manager to add custom modules and give a step-by-step process for making high-quality images with Flux. By the end of this guide, you'll know everything you need to get the most out of ComfyUI and create amazing visuals.&lt;/p&gt;

&lt;p&gt;You can consult the &lt;a href="https://github.com/koyeb/example-comfyui" rel="noopener noreferrer"&gt;project repository&lt;/a&gt; as you work through this guide. You can deploy ComfyUI as configured in this tutorial using the &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-to-koyeb-button" rel="noopener noreferrer"&gt;Deploy to Koyeb&lt;/a&gt; button below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.koyeb.com/deploy?name=example-yolo&amp;amp;type=git&amp;amp;repository=koyeb%2Fexample-yolo&amp;amp;branch=main&amp;amp;builder=dockerfile&amp;amp;instance_type=gpu-nvidia-rtx-4000-sff-ada&amp;amp;env%5B%5D=&amp;amp;ports=8501%3Bhttp%3B%2F" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.koyeb.com%2Fstatic%2Fimages%2Fdeploy%2Fbutton.svg" alt="Deploy to Koyeb"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ComfyUI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/comfyanonymous/ComfyUI" rel="noopener noreferrer"&gt;ComfyUI&lt;/a&gt; is a powerful, easy-to-use tool for creating images with AI. Users can design workflows, adjust settings, and see results immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User-friendly interface:&lt;/strong&gt; ComfyUI is designed to be simple so users of all skill levels can easily navigate and use its features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular workflow design:&lt;/strong&gt; Users can create custom workflows by connecting different components, enabling flexibility in the image creation process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time feedback:&lt;/strong&gt; ComfyUI provides instant visual feedback so users can see the effects of their adjustments and fine-tune their workflows as needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with AI models:&lt;/strong&gt; ComfyUI supports a variety of pre-trained AI models, as well as custom models, allowing users to generate images that meet their specific needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use cases
&lt;/h3&gt;

&lt;p&gt;ComfyUI is great for many different uses, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Artistic creation:&lt;/strong&gt; Artists can use ComfyUI to explore new creative ideas, make unique visuals, and try out different styles and techniques.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graphic design:&lt;/strong&gt; Graphic designers can use ComfyUI to quickly create design elements, make mockups, and streamline their work process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content creation:&lt;/strong&gt; Marketers, bloggers, and content creators can use ComfyUI to make eye-catching visuals for their content, making it more engaging and appealing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research and development:&lt;/strong&gt; AI researchers and developers can use ComfyUI to test and visualize the performance of different models, experiment with new algorithms, and push the limits of AI-driven image generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Basics of a ComfyUI workflow
&lt;/h2&gt;

&lt;p&gt;At its core, a ComfyUI workflow is a series of connected modules, each doing a specific job in the image creation process. You can arrange these modules in different ways to get different results, giving you the flexibility to customize your workflows to fit your needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key components of the workflow
&lt;/h3&gt;

&lt;p&gt;Let's look at the default ComfyUI workflow to better understand these components:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpgdvz7aqrdfop3159xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpgdvz7aqrdfop3159xy.png" alt="ComfyUI default workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The components of the workflow can be grouped into the following categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input modules:&lt;/strong&gt; This is the starting point of any workflow. The input module lets you set the initial settings like image size, model choice, and input data (such as sketches, text prompts, or existing images). In the example above, for instance, the &lt;code&gt;Load Checkpoint&lt;/code&gt; and &lt;code&gt;CLIP Text Encode&lt;/code&gt; components are input modules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing modules:&lt;/strong&gt; These parts handle various aspects of the image creation process, like applying filters, adjusting colors, refining details, and more. You can stack or arrange processing modules in parallel to create complex effects. In the example above, the &lt;code&gt;KSampler&lt;/code&gt; component is a processing module.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output module:&lt;/strong&gt; These are the final stages of the workflow, where the generated image is created and saved. You can specify the output format, resolution, and other details. In the example above, the &lt;code&gt;Save Image&lt;/code&gt; component is an output module.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control modules:&lt;/strong&gt; These parts give you additional control over the workflow, allowing you to tweak settings, adjust weights, and fine-tune the process. Control modules are essential for getting the desired results and ensuring high-quality outputs. In the example above, the &lt;code&gt;Empty Latent Image&lt;/code&gt; component is a control module.&lt;/li&gt;
&lt;/ul&gt;
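&lt;p&gt;Under the hood, ComfyUI serializes a workflow like this as a JSON graph: each node id maps to a &lt;code&gt;class_type&lt;/code&gt; and its inputs, and links between modules are expressed as &lt;code&gt;[node id, output index]&lt;/code&gt; pairs. Here is a minimal sketch of the default workflow above in that API format — the node ids, prompt text, and checkpoint filename are illustrative, not taken from a real export:&lt;/p&gt;

```python
# Sketch of the default workflow in ComfyUI's API (JSON graph) format.
# Each key is a node id; links are [source node id, output index] pairs.
# Node ids and the checkpoint filename are illustrative placeholders.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",            # input module
          "inputs": {"ckpt_name": "example-checkpoint.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",                    # input module (prompt)
          "inputs": {"clip": ["1", 1], "text": "a scenic mountain lake"}},
    "3": {"class_type": "EmptyLatentImage",                  # control module
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "4": {"class_type": "KSampler",                          # processing module
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["2", 0], "latent_image": ["3", 0],
                     "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "5": {"class_type": "VAEDecode",
          "inputs": {"samples": ["4", 0], "vae": ["1", 2]}},
    "6": {"class_type": "SaveImage",                         # output module
          "inputs": {"images": ["5", 0], "filename_prefix": "output"}},
}

# Quick sanity check: count the nodes and the links feeding the sampler.
sampler_links = [v for v in workflow["4"]["inputs"].values()
                 if isinstance(v, list)]
print(len(workflow), len(sampler_links))  # 6 4
```

&lt;p&gt;Exporting any workflow with the API-format save option in the ComfyUI interface produces JSON with this same shape, which is also what the HTTP API consumes.&lt;/p&gt;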

&lt;h2&gt;
  
  
  Install ComfyUI on Koyeb GPUs
&lt;/h2&gt;

&lt;p&gt;If you want to use the power of cloud computing for your image generation tasks, installing ComfyUI on a Koyeb GPU is a great choice. Koyeb offers powerful GPU instances that can handle the demanding requirements of AI-driven image generation, making sure your processing is fast and efficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before starting the installation process, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Koyeb account:&lt;/strong&gt; &lt;a href="https://app.koyeb.com/auth/signup" rel="noopener noreferrer"&gt;Sign up&lt;/a&gt; for a Koyeb account if you don't already have one.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Create a Dockerfile
&lt;/h3&gt;

&lt;p&gt;We'll start by preparing a Dockerfile so we can make sure we have all of the dependencies installed, especially for GPU support. Create a &lt;code&gt;Dockerfile&lt;/code&gt; with the following contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use the official Python base image
FROM python:3.12

# Clone the repository
RUN git clone https://github.com/comfyanonymous/ComfyUI.git

# Set the working directory
WORKDIR /ComfyUI

# Update pip, install GPU dependencies, and install Comfy dependencies
RUN pip install --upgrade pip &amp;amp;&amp;amp; pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121 &amp;amp;&amp;amp; pip install -r requirements.txt

# Set the entry point for the container
CMD python3 main.py --listen 0.0.0.0 --port ${PORT:-8188}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Dockerfile sets up a container image for running ComfyUI. Here's what it does step-by-step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, it starts with a base Python image, specifically version 3.12.&lt;/li&gt;
&lt;li&gt;Next, it downloads the ComfyUI repository from GitHub. This repository contains all the code and resources needed to run ComfyUI.&lt;/li&gt;
&lt;li&gt;It sets the working directory to where ComfyUI was downloaded, so any subsequent commands are executed in the correct location.&lt;/li&gt;
&lt;li&gt;Next, it updates the &lt;code&gt;pip&lt;/code&gt; package manager to the latest version to ensure compatibility with new packages. Chained to this &lt;code&gt;RUN&lt;/code&gt; instruction are additional &lt;code&gt;pip&lt;/code&gt; commands that install the GPU dependencies including PyTorch, and a command that installs ComfyUI's dependencies from the included &lt;code&gt;requirements.txt&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Finally, it sets the default command to run the ComfyUI application when the container starts. This command uses &lt;strong&gt;&lt;code&gt;0.0.0.0&lt;/code&gt;&lt;/strong&gt; as the IP address, which means the application will be accessible from any network interface. It listens on the port defined by the &lt;code&gt;PORT&lt;/code&gt; environment variable, using ComfyUI's default port 8188 as a fallback if the environment variable is unset.&lt;/li&gt;
&lt;/ul&gt;
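&lt;p&gt;The &lt;code&gt;${PORT:-8188}&lt;/code&gt; fallback in the &lt;code&gt;CMD&lt;/code&gt; is plain shell parameter expansion, which you can verify locally:&lt;/p&gt;

```shell
# ${PORT:-8188} expands to $PORT when it is set and non-empty,
# and to the literal 8188 otherwise -- the same logic the CMD relies on.
unset PORT
echo "PORT unset -> ${PORT:-8188}"
PORT=9000
echo "PORT=9000  -> ${PORT:-8188}"
```

&lt;p&gt;On Koyeb, the &lt;code&gt;PORT&lt;/code&gt; environment variable is set for the Service, so the fallback only applies when you run the container elsewhere without it.&lt;/p&gt;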

&lt;h3&gt;
  
  
  Create the repository
&lt;/h3&gt;

&lt;p&gt;The final step is to create a &lt;a href="https://github.com/new" rel="noopener noreferrer"&gt;new repository on GitHub&lt;/a&gt; to store the project files.&lt;/p&gt;

&lt;p&gt;Once you're ready, run the following commands in your terminal to commit and push your code to the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "# ComfyUI-Flux" &amp;gt;&amp;gt; README.md
git init
git add .
git commit -m "First Commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should now have all your local code in your remote repository. Now it is time to deploy the Dockerfile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy to Koyeb
&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb control panel&lt;/a&gt;, while on the &lt;strong&gt;Overview&lt;/strong&gt; tab, initiate the app creation and deployment process by clicking &lt;strong&gt;Create App&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On the App deployment page:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select &lt;strong&gt;GitHub&lt;/strong&gt; as your deployment method.&lt;/li&gt;
&lt;li&gt;Choose the repository where your code resides. For example, &lt;code&gt;ComfyUI-Flux&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Select the GPU you wish to use, for example, &lt;code&gt;RTX-4000-SFF-ADA&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Builder&lt;/strong&gt; section, choose &lt;strong&gt;Dockerfile&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Service name&lt;/strong&gt; section, choose an appropriate name.&lt;/li&gt;
&lt;li&gt;Finally, click &lt;strong&gt;Deploy&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After a few minutes, your application should be available at the indicated URL.&lt;/p&gt;
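&lt;p&gt;Beyond the web interface, ComfyUI also exposes an HTTP API: queuing a generation is a &lt;code&gt;POST&lt;/code&gt; of a JSON body to the &lt;code&gt;/prompt&lt;/code&gt; endpoint. Here is a minimal sketch of that payload envelope, assuming an API-format workflow graph — the app URL in the comment and the client id are placeholders:&lt;/p&gt;

```python
import json

def build_prompt_payload(workflow: dict, client_id: str = "docs-example") -> bytes:
    """Wrap a ComfyUI workflow graph (node id -> {class_type, inputs})
    into the JSON body expected by POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

# An empty placeholder graph, just to show the envelope shape.
payload = build_prompt_payload({})
print(payload.decode())  # {"prompt": {}, "client_id": "docs-example"}

# To send it to your deployed Service (URL is a placeholder):
#   urllib.request.urlopen("https://your-app.koyeb.app/prompt", data=payload)
```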

&lt;h2&gt;
  
  
  What is ComfyUI Manager?
&lt;/h2&gt;

&lt;p&gt;While ComfyUI comes with a set of pre-built modules, its real power comes from its extensibility through custom modules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ltdrdata/ComfyUI-Manager" rel="noopener noreferrer"&gt;ComfyUI Manager&lt;/a&gt; is a plugin that lets users manage and install custom modules directly within the ComfyUI interface. It provides an easy-to-use interface for browsing available modules, installing them with a single click, and integrating them into your workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install ComfyUI Manager
&lt;/h2&gt;

&lt;p&gt;The simplest way to install ComfyUI Manager is to modify your Dockerfile to include the necessary commands.&lt;/p&gt;

&lt;p&gt;Update your &lt;code&gt;Dockerfile&lt;/code&gt; to add an additional &lt;code&gt;git clone&lt;/code&gt; command as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use the official Python base image
FROM python:3.12

# Clone the repository
RUN git clone https://github.com/comfyanonymous/ComfyUI.git

# Set the working directory
WORKDIR /ComfyUI

# Update pip, install GPU dependencies, and install Comfy dependencies
RUN pip install --upgrade pip &amp;amp;&amp;amp; pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121 &amp;amp;&amp;amp; pip install -r requirements.txt

# Clone ComfyUI-Manager
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git /ComfyUI/custom_nodes/ComfyUI-Manager

# Set the entry point for the container
CMD python3 main.py --listen 0.0.0.0 --port ${PORT:-8188}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Dockerfile sets up a container for running ComfyUI and also installs ComfyUI Manager. The only difference from the previous Dockerfile is the additional step that clones the ComfyUI Manager repository into ComfyUI's &lt;code&gt;custom_nodes&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;Once you're done, commit the changes to your repository and push them to GitHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add .
git commit -m "Add ComfyUI Manager"
git push origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your Koyeb application will automatically create a new deployment based on your updated Dockerfile.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using custom modules
&lt;/h2&gt;

&lt;p&gt;With ComfyUI Manager installed, you can now easily integrate custom modules into your workflows.&lt;/p&gt;

&lt;p&gt;Whether you're looking to add new processing techniques, experiment with different models, or create entirely new workflows, the possibilities are endless.&lt;/p&gt;

&lt;p&gt;We will use ComfyUI Manager to install Flux in the next section to generate high-quality images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow for high-quality image generation with Flux
&lt;/h2&gt;

&lt;p&gt;One of the great features of ComfyUI is that it can work with advanced AI models like Flux, which are specifically made for creating high-quality images. In this section, we'll guide you through a detailed process for generating amazing images using ComfyUI and Flux.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Flux?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/black-forest-labs/flux" rel="noopener noreferrer"&gt;Flux&lt;/a&gt; is a powerful AI model made for creating high-quality images. It's one of the latest deep learning models that use large datasets, complex structures, and advanced algorithms to generate images that look realistic and are artistically impressive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key features of Flux
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-resolution outputs:&lt;/strong&gt; Flux is designed to produce high-resolution images, making it perfect for tasks where detail and clarity are important, like digital art, graphic design, and print media.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artistic flexibility:&lt;/strong&gt; One of Flux's standout features is its ability to adapt to different artistic styles. Whether you want photorealism, impressionism, or a more abstract look, Flux can be adjusted to create images that match your desired style.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep learning architecture:&lt;/strong&gt; Flux uses a deep neural network architecture, which includes multiple layers of convolutional, generative, and refinement processes. This complex architecture allows it to understand and replicate intricate patterns, textures, and lighting effects in images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual understanding:&lt;/strong&gt; Unlike some earlier models, Flux has a strong ability to understand the context within images. This means it can accurately maintain the coherence of a scene, ensuring that elements in the image are not only visually consistent but also logically related.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versatile input methods:&lt;/strong&gt; Flux supports various input methods, including text prompts, sketches, and even other images. This versatility allows users to guide the image generation process more effectively, providing a high degree of creative control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Applications of Flux
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Digital art:&lt;/strong&gt; Artists use Flux to create detailed and stylistically diverse pieces of digital art. Its ability to interpret and generate different artistic styles makes it a valuable tool for creators looking to push the boundaries of digital expression.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graphic design:&lt;/strong&gt; Graphic designers benefit from Flux's high-resolution outputs and its capability to generate assets that can be directly used in branding, marketing materials, and product designs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content creation:&lt;/strong&gt; For content creators, Flux offers a way to quickly generate unique visuals that can enhance blog posts, social media content, and other digital media.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI research:&lt;/strong&gt; Researchers in the field of artificial intelligence use Flux to explore new techniques in image generation, test theories in deep learning, and develop new applications for AI in the creative industries.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install Flux on ComfyUI with ComfyUI Manager
&lt;/h2&gt;

&lt;p&gt;To install a Flux model/checkpoint on ComfyUI, you can use the ComfyUI Manager. Open the ComfyUI Manager by clicking &lt;strong&gt;Manager&lt;/strong&gt; in the sidebar menu.&lt;/p&gt;

&lt;p&gt;This will open the manager window:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrb57y51nya2vhfdrd70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrb57y51nya2vhfdrd70.png" alt="ComfyUI Manager main menu"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here, click on &lt;strong&gt;Model Manager&lt;/strong&gt; and then search for "flux":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7worvwqzrwmjl8pwwzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7worvwqzrwmjl8pwwzw.png" alt="ComfyUI Manager search flux"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this example, install the "Comfy Org/FLUX.1 [schnell]" checkpoint by clicking the &lt;strong&gt;install&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;After a couple of minutes, the download and installation should complete:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u362889j2v2x7fbkukr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u362889j2v2x7fbkukr.png" alt="ComfyUI Manager flux install complete"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can build your workflow to generate a high-quality image. You can use our &lt;a href="https://github.com/koyeb/example-comfyui/koyeb-workflow.json" rel="noopener noreferrer"&gt;sample workflow&lt;/a&gt; as a starting point. Download the file to your local computer.&lt;/p&gt;

&lt;p&gt;To load the workflow into ComfyUI, click the &lt;strong&gt;Load&lt;/strong&gt; button in the sidebar menu and select the &lt;code&gt;koyeb-workflow.json&lt;/code&gt; file you just downloaded. Next, select the Flux checkpoint in the &lt;strong&gt;Load Checkpoint&lt;/strong&gt; node and type in your prompt in the &lt;strong&gt;CLIP Text Encode (Prompt)&lt;/strong&gt; node.&lt;/p&gt;

&lt;p&gt;When you are ready, press &lt;strong&gt;CTRL-Enter&lt;/strong&gt; to run the workflow and generate the image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68o3ss211dlrhlby8lz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68o3ss211dlrhlby8lz8.png" alt="ComfyUI run Koyeb workflow"&gt;&lt;/a&gt;&lt;/p&gt;
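&lt;p&gt;If you prefer to queue the workflow programmatically instead of through the UI, ComfyUI also exposes an HTTP API that accepts workflow graphs as JSON via &lt;code&gt;POST /prompt&lt;/code&gt;. The sketch below is illustrative: it assumes a ComfyUI instance listening on the default port 8188 and a workflow saved in API format (available via the &lt;em&gt;Save (API Format)&lt;/em&gt; option when dev mode is enabled); adjust the URL for your own deployment:&lt;/p&gt;

```python
import json
import urllib.request

# Assumed endpoint: ComfyUI's default local address; change for your deployment
COMFYUI_URL = "http://localhost:8188"

def build_prompt_payload(workflow, client_id="koyeb-example"):
    """Wrap a workflow graph in the JSON body that ComfyUI's /prompt endpoint expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_workflow(workflow):
    """POST a workflow to a running ComfyUI instance and return the queue response."""
    req = urllib.request.Request(
        COMFYUI_URL + "/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running ComfyUI instance and an API-format workflow export):
#   with open("workflow_api.json") as f:
#       queue_workflow(json.load(f))
```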

&lt;p&gt;Here is the prompt included in the Koyeb sample workflow file followed by an example of an image generated from it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A serene twilight scene by a calm lake surrounded by tall, evergreen pine trees. The sky is painted with soft shades of pink, orange, and purple as the sun sets in the background. A small wooden dock extends into the water, with a single lantern casting a warm glow. Gentle ripples on the lake's surface reflect the vibrant colors of the sky, while a few fireflies dance around the dock. In the distance, mist rises from the water, adding a mystical quality to the peaceful, nature-filled landscape.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvexalb882tmacpbrcgap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvexalb882tmacpbrcgap.png" alt="ComfyUI generated image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;ComfyUI is a powerful and flexible tool for making images with AI. It has a user-friendly interface, lets you design your own workflows, and can work with advanced models like Flux. Whether you are an artist, designer, or researcher, ComfyUI gives you the tools you need to create amazing visuals easily.&lt;/p&gt;

&lt;p&gt;By following the steps in this guide, you learned how to set up ComfyUI on a Koyeb GPU, add custom modules with ComfyUI Manager, and create high-quality images using a detailed workflow. The possibilities are endless, and with ComfyUI, the only limit is your imagination.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Using YOLO for Real-Time Object Detection with Koyeb GPUs</title>
      <dc:creator>alisdairbr</dc:creator>
      <pubDate>Wed, 31 Jul 2024 09:47:32 +0000</pubDate>
      <link>https://dev.to/koyeb/using-yolo-for-real-time-object-detection-with-koyeb-gpus-3m83</link>
      <guid>https://dev.to/koyeb/using-yolo-for-real-time-object-detection-with-koyeb-gpus-3m83</guid>
      <description>&lt;p&gt;Welcome to this comprehensive guide on implementing real-time object detection using the &lt;a href="https://arxiv.org/abs/1506.02640" rel="noopener noreferrer"&gt;YOLO (You Only Look Once)&lt;/a&gt; algorithm. This technology represents a leap forward in how we detect objects in real time, making it an invaluable tool in surveillance, robotics, and autonomous driving fields.&lt;/p&gt;

&lt;p&gt;In this guide, we will walk through the theory behind YOLO, how it works, and how to implement it in your own real-time object detection projects. We'll also see real-world applications of YOLO in action, showcasing its power and versatility.&lt;/p&gt;

&lt;p&gt;You can consult the &lt;a href="https://github.com/koyeb/example-yolo" rel="noopener noreferrer"&gt;project repository&lt;/a&gt; as you work through this guide. You can deploy the YOLO object detection application as built in this tutorial using the &lt;a href="https://www.koyeb.com/docs/build-and-deploy/deploy-to-koyeb-button" rel="noopener noreferrer"&gt;Deploy to Koyeb&lt;/a&gt; button below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.koyeb.com/deploy?name=example-yolo&amp;amp;type=git&amp;amp;repository=koyeb%2Fexample-yolo&amp;amp;branch=main&amp;amp;builder=dockerfile&amp;amp;instance_type=gpu-nvidia-rtx-4000-sff-ada&amp;amp;env%5B%5D=&amp;amp;ports=8501%3Bhttp%3B%2F" rel="noopener noreferrer"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RNXMAe6M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/deploy/button.svg" alt="Deploy to Koyeb" width="178" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding YOLO (You Only Look Once)
&lt;/h2&gt;

&lt;p&gt;YOLO, an acronym for "You Only Look Once", is an innovative approach to object detection. The primary difference between YOLO and other object detection algorithms is in the way it handles object detection.&lt;/p&gt;

&lt;p&gt;While most algorithms process an image multiple times to detect objects, YOLO uses a single pass, prompting the name "You Only Look Once".&lt;/p&gt;

&lt;p&gt;This is a simplified breakdown of how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, the entire image is divided into a grid. Each cell in the grid is evaluated individually to predict objects within its boundaries.&lt;/li&gt;
&lt;li&gt;For each grid cell, YOLO generates a number of bounding boxes. A bounding box is a rectangle predicted to contain an object. Each bounding box comes with a confidence score, representing how certain YOLO is that the predicted box encloses some object.&lt;/li&gt;
&lt;li&gt;Along with the bounding box, YOLO also provides the class of the object that it believes exists within the box. For example, it might claim that the object is a cat, dog, car, or any other type of object it has been trained to recognize.&lt;/li&gt;
&lt;li&gt;After making all of the predictions, YOLO selects the bounding box with the highest confidence level and the associated object class as its final prediction.&lt;/li&gt;
&lt;/ul&gt;
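&lt;p&gt;The final selection step can be sketched in a few lines of plain Python. This is an illustrative simplification (real YOLO implementations apply non-maximum suppression to keep one box per detected object, rather than a single global winner); the box format and confidence threshold below are assumptions for the example:&lt;/p&gt;

```python
def select_prediction(candidates, min_confidence=0.25):
    """Pick the highest-confidence detection from a list of candidates.

    Each candidate is (box, confidence, class_name), where box is
    (x1, y1, x2, y2). This keeps only the single best box to illustrate
    the idea; real YOLO uses non-maximum suppression instead.
    """
    viable = [c for c in candidates if c[1] >= min_confidence]
    if not viable:
        return None
    return max(viable, key=lambda c: c[1])

candidates = [
    ((10, 10, 50, 50), 0.91, "cat"),
    ((12, 8, 52, 48), 0.40, "dog"),
    ((200, 30, 260, 90), 0.10, "car"),  # below threshold, discarded
]
print(select_prediction(candidates))  # ((10, 10, 50, 50), 0.91, 'cat')
```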

&lt;p&gt;YOLO's unique approach allows it to detect objects in real-time, making it a great choice for many computer vision tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;Before diving into building the object detection application with YOLO, it's important to ensure that you have the necessary tools and knowledge. This section outlines the prerequisites you'll need to follow the guide successfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;&lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb account&lt;/a&gt;&lt;/strong&gt; is required to deploy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge of Python programming&lt;/strong&gt; to understand the scripts we will create.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;&lt;a href="https://github.com" rel="noopener noreferrer"&gt;GitHub account&lt;/a&gt;:&lt;/strong&gt; to store the code and trigger deployments to Koyeb GPUs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;p&gt;There are several implementations of the YOLO algorithm available, but for ease of use, we will use the &lt;a href="https://docs.ultralytics.com/" rel="noopener noreferrer"&gt;Ultralytics&lt;/a&gt; implementation in this guide. We will implement and test the code locally and then deploy to Koyeb's GPUs for higher inference speed.&lt;/p&gt;

&lt;p&gt;To get started, create a project directory and then create and activate a new virtual environment within:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir example-yolo
cd example-yolo
python -m venv venv
source venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that the virtual environment and project directory are set up, make sure you have the necessary libraries and a version of YOLO installed. Visit &lt;a href="https://pytorch.org/get-started/locally/" rel="noopener noreferrer"&gt;PyTorch's getting started page&lt;/a&gt; to find the appropriate commands for your operating system. For this guide, we will use the Stable build, the Pip package manager, the Python language, and CUDA 12.1 as the compute platform. The commands to run will depend on your local operating system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Linux
pip install torch torchvision torchaudio

# Windows
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, install the &lt;a href="https://github.com/ultralytics/ultralytics" rel="noopener noreferrer"&gt;Ultralytics YOLO library&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install ultralytics opencv-python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In addition to installing the YOLO implementation from Ultralytics, this also installs &lt;a href="https://opencv.org/" rel="noopener noreferrer"&gt;OpenCV&lt;/a&gt; for image processing.&lt;/p&gt;

&lt;p&gt;We will be using &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; to build the application UI. In order to support non-local access to the webcam (when deployed to the Koyeb cloud, for instance), you will also need to install a few additional libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install streamlit streamlit-webrtc pyarrow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have all the necessary dependencies installed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing YOLO for Real-Time Object Detection
&lt;/h2&gt;

&lt;p&gt;Let's now create a simple Streamlit application that demonstrates how easy YOLO makes object detection. Create a file called &lt;code&gt;web.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from streamlit_webrtc import webrtc_streamer, WebRtcMode
from ultralytics import YOLO
import av

# Load the YOLOv8 model for Object Detection
model = YOLO("yolov8n.pt")

# Function to process each frame of the video stream
def process_frame(frame):
    # Read image from the frame with PyAV
    img = frame.to_ndarray(format="bgr24")

    # Run YOLOv8 tracking on the frame, persisting tracks between frames
    results = model.track(img, tracker="bytetrack.yaml")

    # Visualize the results on the frame
    annotated_frame = results[0].plot()

    # Return the annotated frame
    return av.VideoFrame.from_ndarray(annotated_frame, format="bgr24")

# Create a WebRTC video streamer with the process_frame callback
webrtc_streamer(key="streamer", video_frame_callback=process_frame, sendback_audio=False,
                media_stream_constraints={"video": True, "audio": False},
                async_processing=True,
                mode=WebRtcMode.SENDRECV,
                rtc_configuration={
                    "iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]
                }
               )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a breakdown of the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Import the necessary libraries:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;streamlit_webrtc&lt;/code&gt; is used for real-time video streaming&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ultralytics&lt;/code&gt; is used for loading the YOLOv8 model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;av&lt;/code&gt; is used for handling video frames&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Load the YOLOv8 model with the &lt;code&gt;yolov8n.pt&lt;/code&gt; weights. We will use this model to detect objects in the video frames.&lt;/li&gt;

&lt;li&gt;Define the &lt;code&gt;process_frame&lt;/code&gt; function. This function takes a video frame as input. It converts the frame to an image using &lt;a href="https://github.com/PyAV-Org/PyAV" rel="noopener noreferrer"&gt;PyAV&lt;/a&gt;, and then applies the YOLOv8 model to detect objects in the image. The model uses the ByteTrack tracker to keep track of objects between frames. The detection results are then visualized on the frame, and the annotated frame is returned.&lt;/li&gt;

&lt;li&gt;Call the &lt;code&gt;webrtc_streamer&lt;/code&gt; function to create a WebRTC video streamer. We pass the &lt;code&gt;process_frame&lt;/code&gt; function as the &lt;code&gt;video_frame_callback&lt;/code&gt; argument so that it will be called for each frame of the video stream. The &lt;code&gt;media_stream_constraints&lt;/code&gt; argument is used to specify that only the video should be streamed, not audio. The &lt;code&gt;async_processing&lt;/code&gt; argument is set to &lt;code&gt;True&lt;/code&gt; to enable processing frames in parallel with the streaming. The &lt;code&gt;mode&lt;/code&gt; argument is set to &lt;code&gt;WebRtcMode.SENDRECV&lt;/code&gt;, which sets the streamer to both send and receive video. We also define an &lt;code&gt;rtc_configuration&lt;/code&gt; that uses Google's free STUN servers so that the streamer works when running in the cloud.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;You can run the application locally with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run web.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the application launches, access it by navigating to &lt;code&gt;http://localhost:8501/&lt;/code&gt; in your browser.&lt;/p&gt;

&lt;p&gt;The YOLOv8 model has the capability to not only detect objects in a frame but also track them across frames. This is particularly useful in applications like video surveillance, autonomous vehicles, and human-computer interaction.&lt;/p&gt;

&lt;p&gt;For tracking objects, YOLOv8 provides two options: BoT-SORT and ByteTrack.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://github.com/NirAharon/BoT-SORT" rel="noopener noreferrer"&gt;BoT-SORT&lt;/a&gt; is an extension of the original SORT (Simple Online Realtime Tracker) algorithm that also considers the motion history of objects for tracking. It uses the Kalman filter to predict the future position of objects and the Hungarian algorithm for data association. To enable BoT-SORT in YOLOv8, you need to use the &lt;code&gt;botsort.yaml&lt;/code&gt; configuration file.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ifzhang/ByteTrack" rel="noopener noreferrer"&gt;ByteTrack&lt;/a&gt; is a more recent tracking algorithm that is designed to handle occlusions and overlapping objects better. It uses a byte-level representation of objects and a motion model to predict the future position of objects. To enable ByteTrack in YOLOv8, use the &lt;code&gt;bytetrack.yaml&lt;/code&gt; configuration file.&lt;/li&gt;
&lt;/ol&gt;
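&lt;p&gt;Both trackers ultimately solve the same sub-problem: associating the current frame's detections with existing tracks. Purely as an illustration of that idea (not BoT-SORT's Kalman-plus-Hungarian pipeline or ByteTrack's actual logic), here is a minimal greedy IoU-based association in plain Python:&lt;/p&gt;

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, threshold=0.3):
    """Greedily match each track to its best-overlapping unused detection."""
    matches, used = {}, set()
    for track_id, track_box in tracks.items():
        best_j, best_iou = None, threshold
        for j, det in enumerate(detections):
            if j in used:
                continue
            score = iou(track_box, det)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[track_id] = best_j
            used.add(best_j)
    return matches

tracks = {7: (0, 0, 10, 10)}          # track id -> last known box
detections = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(associate(tracks, detections))  # {7: 0}: track 7 matched to detection 0
```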

&lt;p&gt;Let's expand the initial application and allow users to upload videos for processing in addition to real-time processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os  # [!code ++]
import tempfile  # [!code ++]
import cv2  # [!code ++]
from streamlit_webrtc import webrtc_streamer, WebRtcMode
import streamlit as st  # [!code ++]
from ultralytics import YOLO
import av

# Load the YOLOv8 model for Object Detection
model = YOLO("yolov8n.pt")

# Function to process each frame of the video stream
def process_frame(frame):
    # Read image from the frame with PyAV
    img = frame.to_ndarray(format="bgr24")

    # Run YOLOv8 tracking on the frame, persisting tracks between frames
    results = model.track(img, tracker="bytetrack.yaml")

    # Visualize the results on the frame
    annotated_frame = results[0].plot()

    # Return the annotated frame
    return av.VideoFrame.from_ndarray(annotated_frame, format="bgr24")

# Function to process the video with OpenCV
def process_video(video_file):  # [!code ++]
    # Open the video file
    cap = cv2.VideoCapture(video_file)  # [!code ++]

    # Set the frame width and height
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)  # [!code ++]
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)  # [!code ++]

    # Loop through the video frames
    while cap.isOpened():  # [!code ++]
        # Read a frame from the video
        success, frame = cap.read()  # [!code ++]

        if success:  # [!code ++]
            # Run YOLOv8 tracking on the frame, persisting tracks between frames
            results = model.track(frame, tracker="bytetrack.yaml")  # [!code ++]

            # Visualize the results on the frame
            annotated_frame = results[0].plot()  # [!code ++]

            # Display the annotated frame
            frame = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)  # [!code ++]
            frame_placeholder.image(frame, channels="RGB")  # [!code ++]
        else:  # [!code ++]
            # Break the loop if the end of the video is reached
            break  # [!code ++]

    # Release the video capture object and close the display window
    cap.release()  # [!code ++]

# Create a WebRTC video streamer with the process_frame callback
webrtc_streamer(key="streamer", video_frame_callback=process_frame, sendback_audio=False,
                media_stream_constraints={"video": True, "audio": False},
                async_processing=True,
                mode=WebRtcMode.SENDRECV,
                rtc_configuration={
                    "iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]
                }
                )

# File upload for uploading the video file, placeholder for displaying the frames and button to start processing
file = st.file_uploader("Upload a video file", type=["mp4", "mov", "avi", "mkv"])  # [!code ++]
button = st.button("Process Video")  # [!code ++]
frame_placeholder = st.empty()  # [!code ++]
# If the button is clicked and a file is uploaded, save the file to a temporary directory and process the video
if button:  # [!code ++]
    # Save the file to a temporary directory
    temp_dir = tempfile.mkdtemp()  # [!code ++]
    path = os.path.join(temp_dir, file.name)  # [!code ++]
    with open(path, "wb") as f:  # [!code ++]
        f.write(file.getvalue())  # [!code ++]
    # Process the video
    process_video(path)  # [!code ++]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Here's a breakdown of the new code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Import the new libraries:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cv2&lt;/code&gt; is for handling video files with OpenCV&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;streamlit&lt;/code&gt; is to create the web application&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Load the YOLOv8 model with the &lt;code&gt;yolov8n.pt&lt;/code&gt; weights. We will use this model to detect objects in the video frames.&lt;/li&gt;

&lt;li&gt;Define the &lt;code&gt;process_video&lt;/code&gt; function. This function takes a video file path as input and opens the video file using OpenCV. It then loops through the video frames, applies the YOLOv8 model to detect objects in each frame, and displays each annotated frame in the Streamlit placeholder.&lt;/li&gt;

&lt;li&gt;Create a Streamlit file uploader to allow the user to upload a video file. We create a Streamlit button to start the video processing. When the button is clicked and a file is uploaded, the file is saved to a temporary directory and then processed using the &lt;code&gt;process_video&lt;/code&gt; function.&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World applications of YOLO object detection
&lt;/h2&gt;

&lt;p&gt;The Ultralytics YOLO &lt;a href="https://docs.ultralytics.com/tasks/detect/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; outlines several specific tasks that can be performed using the YOLO algorithm. Let's explore some of these tasks and their applications in real-world scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Object Counting
&lt;/h3&gt;

&lt;p&gt;Object counting is the process of simply counting the number of instances of a specific object within an image or video. This task is particularly useful in scenarios like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;crowd management&lt;/strong&gt;: Estimating the number of people in a crowded area.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;inventory management&lt;/strong&gt;: Counting products on shelves in a retail environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;wildlife monitoring&lt;/strong&gt;: Tracking the number of animals in a specific area for ecological studies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's see the code changes we could make to our &lt;code&gt;web.py&lt;/code&gt; file in order to support object counting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... updated imports ...
from ultralytics import YOLO  # [!code --]
from ultralytics import YOLO, solutions  # [!code ++]

... previous code ...

# Function to count objects in the video
def count_objects(video_file):  # [!code ++]
    # Define region points
    region_points = [(20, 150), (400, 150), (400, 350), (20, 350)]  # [!code ++]

    # Init Object Counter
    counter = solutions.ObjectCounter(  # [!code ++]
        view_img=False,  # [!code ++]
        reg_pts=region_points,  # [!code ++]
        names=model.names,  # [!code ++]
        draw_tracks=True,  # [!code ++]
        line_thickness=2,  # [!code ++]
    )  # [!code ++]

    # Open the video file
    cap = cv2.VideoCapture(video_file)  # [!code ++]

    # Loop through the video frames
    while cap.isOpened():  # [!code ++]
        # Read a frame from the video
        success, frame = cap.read()  # [!code ++]

        if success:  # [!code ++]
            # Run YOLOv8 tracking on the frame, persisting tracks between frames
            tracks = model.track(frame, persist=True, show=False)  # [!code ++]

            # Count objects in the frame
            annotated_frame = counter.start_counting(frame, tracks)  # [!code ++]

            # Display the annotated frame
            frame = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)  # [!code ++]
            frame_placeholder.image(frame, channels="RGB")  # [!code ++]

        else:  # [!code ++]
            # Break the loop if the end of the video is reached
            break  # [!code ++]

    # Release the video capture object
    cap.release()  # [!code ++]


# File upload for uploading the video file, placeholder for displaying the frames and button to start processing
... previous code ...
button_count = st.button("Count Objects")  # [!code ++]
frame_placeholder = st.empty()
# If the button is clicked and a file is uploaded, save the file to a temporary directory and process the video
if button:
    ... previous code ...

if button_count:  # [!code ++]
    # Save the file to a temporary directory
    temp_dir = tempfile.mkdtemp()  # [!code ++]
    path = os.path.join(temp_dir, file.name)  # [!code ++]
    with open(path, "wb") as f:  # [!code ++]
        f.write(file.getvalue())  # [!code ++]
    # Count objects in the video
    count_objects(path)  # [!code ++]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a breakdown of the new code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the &lt;code&gt;solutions&lt;/code&gt; class to the imports from &lt;code&gt;ultralytics&lt;/code&gt;. This class contains pre-made solutions from Ultralytics like &lt;code&gt;ObjectCounter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Define the &lt;code&gt;count_objects(video_file)&lt;/code&gt; function. This function takes a video file as input and processes it to count objects. Inside the function, we do the following:

&lt;ul&gt;
&lt;li&gt;Define the &lt;code&gt;region_points&lt;/code&gt; variable. This variable represents the region of interest (ROI): the coordinates of the region in the video frame where the objects are to be counted.&lt;/li&gt;
&lt;li&gt;Initialize the object counter with &lt;code&gt;solutions.ObjectCounter&lt;/code&gt;. This class is responsible for counting objects that pass through the defined region. It takes parameters such as &lt;code&gt;view_img&lt;/code&gt;, &lt;code&gt;reg_pts&lt;/code&gt;, &lt;code&gt;names&lt;/code&gt;, &lt;code&gt;draw_tracks&lt;/code&gt;, and &lt;code&gt;line_thickness&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open the uploaded video using OpenCV's &lt;code&gt;cv2.VideoCapture(video_file)&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;Loop through the video frame-by-frame until the video is finished. In each iteration, a frame is read from the video. If the frame is successfully read, YOLOv8 tracking is applied to the frame with &lt;code&gt;model.track(frame, persist=True, show=False)&lt;/code&gt;. The &lt;code&gt;counter.start_counting(frame, tracks)&lt;/code&gt; method is called to count objects in the frame and the annotated frame is then displayed.&lt;/li&gt;
&lt;li&gt;Release the video capture object with &lt;code&gt;cap.release()&lt;/code&gt; once the video is finished.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Add a "Count Objects" button and an empty placeholder for displaying the video frames to the user interface. When the button is clicked, the video file is saved to a temporary directory, and the &lt;code&gt;count_objects(path)&lt;/code&gt; function is called to process the video.&lt;/li&gt;

&lt;/ul&gt;
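&lt;p&gt;Conceptually, region-based counting reduces to testing whether each tracked object's center falls inside the region of interest. Here is a self-contained sketch of that test using ray casting; it is illustrative only and not how Ultralytics' &lt;code&gt;ObjectCounter&lt;/code&gt; is actually implemented (which also tracks crossings into and out of the region over time):&lt;/p&gt;

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge crosses the horizontal line through y; check crossing side
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_in_region(centers, region):
    """Count how many object centers fall inside the counting region."""
    return sum(point_in_polygon(c, region) for c in centers)

# Same shape of region as the sample workflow below (a rectangle)
region = [(20, 150), (400, 150), (400, 350), (20, 350)]
centers = [(100, 200), (500, 200), (30, 160)]
print(count_in_region(centers, region))  # 2
```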

&lt;p&gt;Let's see an example of object counting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DJypk3IW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/tutorials/yolo-object-detection/object-counting.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DJypk3IW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/tutorials/yolo-object-detection/object-counting.png" alt="YOLO object counting screenshot" width="800" height="682"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Object Cropping
&lt;/h3&gt;

&lt;p&gt;Object cropping identifies and extracts specific objects from an image or video. This task allows you to create smaller images containing only the objects of interest, which can be useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;image editing&lt;/strong&gt;: Isolating objects for graphic design purposes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;data augmentation&lt;/strong&gt;: Creating additional training data for machine learning models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;focus enhancement&lt;/strong&gt;: Highlighting objects in presentations or reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's see the code changes we can make to the &lt;code&gt;web.py&lt;/code&gt; file in order to support object cropping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... added imports ...
from ultralytics.utils.plotting import Annotator, colors  # [!code ++]

... previous code ...
# Function to crop objects in the video
def crop_objects(video_file):  # [!code ++]
    # Open the video file
    cap = cv2.VideoCapture(video_file)  # [!code ++]

    with frame_placeholder.container():  # [!code ++]
        # Loop through the video frames
        while cap.isOpened():  # [!code ++]
            # Read a frame from the video
            success, frame = cap.read()  # [!code ++]

            if success:  # [!code ++]
                # Run YOLOv8 detection on the frame
                results = model.predict(frame, show=False)  # [!code ++]

                # Retrieve the bounding boxes and class labels
                boxes = results[0].boxes.xyxy.cpu().tolist()  # [!code ++]
                clss = results[0].boxes.cls.cpu().tolist()  # [!code ++]

                # Create an Annotator object for drawing bounding boxes
                annotator = Annotator(frame, line_width=2, example=model.names)  # [!code ++]

                # If boxes are detected, crop the objects and save them to a directory
                if boxes is not None:  # [!code ++]
                    # Iterate over the detected boxes and class labels
                    for box, cls in zip(boxes, clss):  # [!code ++]
                        # Draw the bounding box on the frame
                        annotator.box_label(box, color=colors(int(cls), True), label=model.names[int(cls)])  # [!code ++]

                        # Crop the object from the frame
                        annotated_frame = frame[int(box[1]): int(box[3]), int(box[0]): int(box[2])]  # [!code ++]

                        # Display the cropped object (use a new variable so
                        # `frame` stays intact for the remaining boxes)
                        if annotated_frame.shape[0] &amp;gt; 0 and annotated_frame.shape[1] &amp;gt; 0:  # [!code ++]
                            crop_rgb = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)  # [!code ++]
                            st.image(crop_rgb, channels="RGB")  # [!code ++]
            else:  # [!code ++]
                # Break the loop if the end of the video is reached
                break  # [!code ++]

    # Release the video capture object
    cap.release()  # [!code ++]


# File upload for uploading the video file, placeholder for displaying the frames and button to start processing
... previous code ...
button_crop = st.button("Crop Objects")  # [!code ++]
frame_placeholder = st.empty()
# If the button is clicked and a file is uploaded, save the file to a temporary directory and process the video
if button:
    ... previous code ...

# If the button is clicked and a file is uploaded, save the file to a temporary directory and count the objects
if button_count:
    ... previous code ...

# If the button is clicked and a file is uploaded, save the file to a temporary directory and crop the objects
if button_crop:  # [!code ++]
    # Save the file to a temporary directory
    temp_dir = tempfile.mkdtemp()  # [!code ++]
    path = os.path.join(temp_dir, file.name)  # [!code ++]
    with open(path, "wb") as f:  # [!code ++]
        f.write(file.getvalue())  # [!code ++]
    # Crop objects in the video
    crop_objects(path)  # [!code ++]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a breakdown of the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Import the new libraries. Import the &lt;code&gt;Annotator&lt;/code&gt; and &lt;code&gt;colors&lt;/code&gt; classes from &lt;code&gt;ultralytics.utils.plotting&lt;/code&gt;. These classes draw bounding boxes and retrieve colors for the bounding boxes, respectively.&lt;/li&gt;
&lt;li&gt;Define the &lt;code&gt;crop_objects(video_file)&lt;/code&gt; function. This function takes a video file as input and processes it to crop objects. Inside the function, we do the following:

&lt;ul&gt;
&lt;li&gt;Open the uploaded video using OpenCV's &lt;code&gt;cv2.VideoCapture(video_file)&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;Loop through the video frame-by-frame until the video is finished. In each iteration, a frame is read from the video. If the frame is successfully read, YOLOv8 detection is applied to the frame with &lt;code&gt;model.predict(frame, show=False)&lt;/code&gt; and the bounding boxes and class labels are retrieved.&lt;/li&gt;
&lt;li&gt;Crop the object. If bounding boxes are detected, an &lt;code&gt;Annotator&lt;/code&gt; object is created and, for each bounding box, the object is cropped from the frame and displayed. The original frame with the bounding boxes is not displayed.&lt;/li&gt;
&lt;li&gt;Release the video capture object with &lt;code&gt;cap.release()&lt;/code&gt; once the video is finished.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Add a "Crop Objects" button and an empty placeholder for displaying the video frames to the user interface. When the button is clicked, the video file is saved to a temporary directory, and the &lt;code&gt;crop_objects(path)&lt;/code&gt; function is called to process the video.&lt;/li&gt;

&lt;/ul&gt;
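&lt;p&gt;To make the slicing step concrete, here is a minimal, self-contained sketch of the crop itself, using a synthetic NumPy array in place of a real video frame (the frame contents and box coordinates below are made up purely for illustration). A box in &lt;code&gt;xyxy&lt;/code&gt; format maps to rows &lt;code&gt;y1:y2&lt;/code&gt; and columns &lt;code&gt;x1:x2&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

# Synthetic 100x100 3-channel "frame" standing in for a real video frame
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[20:60, 30:80] = 255  # a white rectangle acting as the detected "object"

# A bounding box in xyxy format, like those returned by results[0].boxes.xyxy
box = [30.0, 20.0, 80.0, 60.0]  # x1, y1, x2, y2

# Crop: rows are y1:y2, columns are x1:x2
cropped = frame[int(box[1]):int(box[3]), int(box[0]):int(box[2])]

print(cropped.shape)  # (40, 50, 3)
```

&lt;p&gt;Note that NumPy indexes rows (the y axis) first, which is why the &lt;code&gt;y&lt;/code&gt; coordinates come before the &lt;code&gt;x&lt;/code&gt; coordinates in the slice.&lt;/p&gt;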

&lt;p&gt;Let's see an example of object cropping:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JBYRVNcI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/tutorials/yolo-object-detection/object-cropping.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JBYRVNcI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/tutorials/yolo-object-detection/object-cropping.png" alt="YOLO object cropping screenshot" width="800" height="688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Object Blurring
&lt;/h3&gt;

&lt;p&gt;Object blurring detects and blurs specific objects within an image or video. This task is essential for maintaining privacy and confidentiality in various applications, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;privacy protection&lt;/strong&gt;: Blurring faces or license plates in surveillance footage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hiding sensitive information&lt;/strong&gt;: Concealing confidential information in documents or presentations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;anonymization&lt;/strong&gt;: Ensuring individuals' identities are not revealed in public datasets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's see the code changes we need to make to our &lt;code&gt;web.py&lt;/code&gt; file in order to support object blurring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... previous code ...

# Function to blur objects in the video
def blur_objects(video_file):  # [!code ++]
    # Blur ratio
    blur_ratio = 50  # [!code ++]

    # Open the video file
    cap = cv2.VideoCapture(video_file)  # [!code ++]

    # Loop through the video frames
    while cap.isOpened():  # [!code ++]
        # Read a frame from the video
        success, frame = cap.read()  # [!code ++]

        if success:  # [!code ++]
            # Run YOLOv8 detection on the frame
            results = model.predict(frame, show=False)  # [!code ++]

            # Retrieve the bounding boxes and class labels
            boxes = results[0].boxes.xyxy.cpu().tolist()  # [!code ++]
            clss = results[0].boxes.cls.cpu().tolist()  # [!code ++]

            # Create an Annotator object for drawing bounding boxes
            annotator = Annotator(frame, line_width=2, example=model.names)  # [!code ++]

            if boxes is not None:  # [!code ++]
                for box, cls in zip(boxes, clss):  # [!code ++]
                    annotator.box_label(box, color=colors(int(cls), True), label=model.names[int(cls)])  # [!code ++]

                    obj = frame[int(box[1]): int(box[3]), int(box[0]): int(box[2])]  # [!code ++]
                    blur_obj = cv2.blur(obj, (blur_ratio, blur_ratio))  # [!code ++]

                    frame[int(box[1]): int(box[3]), int(box[0]): int(box[2])] = blur_obj  # [!code ++]

                    # Display the frame with the blurred region (use a new
                    # variable so `frame` stays BGR for the remaining boxes)
                    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # [!code ++]
                    frame_placeholder.image(frame_rgb, channels="RGB")  # [!code ++]
        else:  # [!code ++]
            # Break the loop if the end of the video is reached
            break  # [!code ++]

    # Release the video capture object
    cap.release()  # [!code ++]

# File upload for uploading the video file, placeholder for displaying the frames and button to start processing
... previous code ...
button_blur = st.button("Blur Objects")  # [!code ++]
frame_placeholder = st.empty()
# If the button is clicked and a file is uploaded, save the file to a temporary directory and process the video
if button:
    ... previous code ...

# If the button is clicked and a file is uploaded, save the file to a temporary directory and count the objects
if button_count:
    ... previous code ...

# If the button is clicked and a file is uploaded, save the file to a temporary directory and crop the objects
if button_crop:
    ... previous code ...

# If the button is clicked and a file is uploaded, save the file to a temporary directory and blur the objects
if button_blur:  # [!code ++]
    # Save the file to a temporary directory
    temp_dir = tempfile.mkdtemp()  # [!code ++]
    path = os.path.join(temp_dir, file.name)  # [!code ++]
    with open(path, "wb") as f:  # [!code ++]
        f.write(file.getvalue())  # [!code ++]
    # Blur objects in the video
    blur_objects(path)  # [!code ++]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a breakdown of the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define the &lt;code&gt;blur_objects(video_file)&lt;/code&gt; function. This function takes a video file as input and processes it to blur objects. Inside the function, we do the following:

&lt;ul&gt;
&lt;li&gt;Define the &lt;code&gt;blur_ratio&lt;/code&gt; variable. This variable represents the intensity of the blur effect.&lt;/li&gt;
&lt;li&gt;Open the uploaded video using OpenCV's &lt;code&gt;cv2.VideoCapture(video_file)&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;Loop through the video frame-by-frame until the video is finished. In each iteration, a frame is read from the video. If the frame is successfully read, YOLOv8 detection is applied to the frame with &lt;code&gt;model.predict(frame, show=False)&lt;/code&gt; and the bounding boxes and class labels are retrieved.&lt;/li&gt;
&lt;li&gt;Blur the object. If bounding boxes are detected, an &lt;code&gt;Annotator&lt;/code&gt; object is created and, for each bounding box, the object is cropped from the frame, blurred using OpenCV's &lt;code&gt;cv2.blur()&lt;/code&gt; function, and then placed back in the original frame. The annotated frame with the blurred objects is then displayed.&lt;/li&gt;
&lt;li&gt;Release the video capture object with &lt;code&gt;cap.release()&lt;/code&gt; once the video is finished.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Add a "Blur Objects" button and an empty placeholder for displaying the frames with blurred objects. When the button is clicked, the video file is saved to a temporary directory, and the &lt;code&gt;blur_objects(path)&lt;/code&gt; function is called to process the video.&lt;/li&gt;

&lt;/ul&gt;
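&lt;p&gt;The crop-blur-write-back pattern can be sketched without OpenCV. The example below uses a synthetic frame and replaces the region with its per-channel mean, a crude stand-in for the sliding box filter that &lt;code&gt;cv2.blur()&lt;/code&gt; applies (all values here are hypothetical, chosen only to illustrate the write-back step):&lt;/p&gt;

```python
import numpy as np

# Synthetic frame: the "object" region holds two bands of different intensity
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[20:40, 30:80] = 100
frame[40:60, 30:80] = 200

box = [30, 20, 80, 60]  # x1, y1, x2, y2

# Crop the object from the frame
obj = frame[box[1]:box[3], box[0]:box[2]]

# "Blur" it: replace every pixel with the region's per-channel mean
# (cv2.blur averages a sliding k x k window instead)
blurred = np.empty_like(obj)
blurred[:] = obj.mean(axis=(0, 1)).astype(np.uint8)

# Write the blurred patch back into the original frame
frame[box[1]:box[3], box[0]:box[2]] = blurred

print(frame[50, 50])  # a pixel inside the box, averaged to [150 150 150]
```

&lt;p&gt;Because the blurred patch is assigned back into the same slice it was cropped from, everything outside the bounding box is left untouched.&lt;/p&gt;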

&lt;p&gt;Let's see an example of object blurring:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2FtbYt1j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/tutorials/yolo-object-detection/object-blurring.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2FtbYt1j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.koyeb.com/static/images/tutorials/yolo-object-detection/object-blurring.png" alt="YOLO object blurring screenshot" width="800" height="801"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy to Koyeb's GPUs
&lt;/h2&gt;

&lt;p&gt;Now that we have the application running locally, it is time to make use of Koyeb's high-performance GPUs to increase the inference processing speed.&lt;/p&gt;

&lt;p&gt;Before we do that, we need to create a Dockerfile to install and configure the necessary code and dependencies.&lt;/p&gt;

&lt;p&gt;First, create a &lt;code&gt;requirements.txt&lt;/code&gt; file with versions of the packages we installed earlier that will work in the cloud:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;opencv-python-headless
streamlit
streamlit-webrtc
pyarrow
torch
torchvision
torchaudio
matplotlib
numpy&amp;lt;2.0.0
pyyaml
scipy
seaborn
tqdm
ultralytics-thop
psutil
py-cpuinfo
contourpy
cycler
fonttools
kiwisolver
pyparsing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This list includes all of the project's requirements, including the dependencies of the &lt;code&gt;ultralytics&lt;/code&gt; package. However, you may notice that we do &lt;em&gt;not&lt;/em&gt; list &lt;code&gt;ultralytics&lt;/code&gt; itself. That's because the &lt;code&gt;ultralytics&lt;/code&gt; package depends explicitly on &lt;code&gt;opencv-python&lt;/code&gt;, which won't work in a headless environment like Koyeb.&lt;/p&gt;

&lt;p&gt;Instead, we install &lt;code&gt;opencv-python-headless&lt;/code&gt; as a workable substitute along with all of the package's other dependencies. Afterwards, in the Dockerfile, we will install &lt;code&gt;ultralytics&lt;/code&gt; separately with the &lt;code&gt;--no-deps&lt;/code&gt; flag. This way, we install all of the package's dependencies ahead of time, substituting &lt;code&gt;opencv-python-headless&lt;/code&gt; for &lt;code&gt;opencv-python&lt;/code&gt;, and avoid conflicts with the dependencies that &lt;code&gt;ultralytics&lt;/code&gt; declares.&lt;/p&gt;

&lt;p&gt;Next, create a new Dockerfile in your project directory with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use the official Python base image
FROM python:3.12

# Copy the requirements file
COPY requirements.txt .

# Install the dependencies
RUN pip install --upgrade pip &amp;amp;&amp;amp; pip install -r requirements.txt
RUN pip install --no-deps ultralytics

# Copy the rest of the application code
COPY web.py .

# Expose port 8501
EXPOSE 8501

# Set the entry point for the container
CMD ["streamlit", "run", "web.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's examine the Dockerfile in detail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;FROM python:3.12&lt;/code&gt;: This line specifies the base image for the Docker image. In this case, it's the official Python 3.12 image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;COPY requirements.txt .&lt;/code&gt;: This line copies the &lt;code&gt;requirements.txt&lt;/code&gt; file from the local machine to the current working directory of the Docker image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUN pip install --upgrade pip &amp;amp;&amp;amp; pip install -r requirements.txt&lt;/code&gt;: This line upgrades pip, the Python package installer, to the latest version and then uses it to install all of the Python dependencies listed in the &lt;code&gt;requirements.txt&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUN pip install --no-deps ultralytics&lt;/code&gt;: This line installs &lt;code&gt;ultralytics&lt;/code&gt; without dependencies to avoid installing &lt;code&gt;opencv-python&lt;/code&gt;. We can do this because the &lt;code&gt;requirements.txt&lt;/code&gt; file already manually installs all of the dependencies and provides a workable substitute for &lt;code&gt;opencv-python&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;COPY web.py .&lt;/code&gt;: This line copies the &lt;code&gt;web.py&lt;/code&gt; file from the local machine to the Docker image in the current directory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EXPOSE 8501&lt;/code&gt;: This line specifies that the Docker container will listen on port 8501.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CMD ["streamlit", "run", "web.py"]&lt;/code&gt;: This is the command that will be run when the Docker container starts. It starts the Streamlit app by running the &lt;code&gt;web.py&lt;/code&gt; script.&lt;/li&gt;
&lt;/ul&gt;
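&lt;p&gt;Before pushing anything to GitHub, you can optionally sanity-check the image on your local machine, assuming Docker is installed (the image tag &lt;code&gt;yolo-web&lt;/code&gt; here is an arbitrary choice):&lt;/p&gt;

```shell
# Build the image from the project directory (where the Dockerfile lives)
docker build -t yolo-web .

# Run it locally, publishing Streamlit's port so the app is reachable
# at http://localhost:8501
docker run --rm -p 8501:8501 yolo-web
```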

&lt;p&gt;Now your project is ready to deploy on GPUs from Koyeb. The final step is to &lt;a href="https://github.com/new" rel="noopener noreferrer"&gt;create a repository&lt;/a&gt; on your GitHub account.&lt;/p&gt;

&lt;p&gt;Download the standard &lt;a href="https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore" rel="noopener noreferrer"&gt;Python &lt;code&gt;.gitignore&lt;/code&gt; file&lt;/a&gt; from GitHub to exclude certain folders and files from being pushed to the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -L https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore -o .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a line to avoid committing the &lt;code&gt;yolov8n.pt&lt;/code&gt; file that the application creates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "yolov8n.pt" &amp;gt;&amp;gt; .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, run the following commands in your terminal to commit and push your code to the GitHub repository you created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "# Yolo Real Time" &amp;gt;&amp;gt; README.md
git init
git add .
git commit -m "First Commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of your local code should now be present in the remote repository. You can now deploy the application to Koyeb.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://app.koyeb.com/" rel="noopener noreferrer"&gt;Koyeb control panel&lt;/a&gt;, on the &lt;strong&gt;Overview&lt;/strong&gt; tab, initiate the app creation and deployment process by clicking &lt;strong&gt;Create Service&lt;/strong&gt; and then &lt;strong&gt;Create web service&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select &lt;strong&gt;GitHub&lt;/strong&gt; as the deployment source.&lt;/li&gt;
&lt;li&gt;Choose the repository where your code resides. For example, &lt;code&gt;YoloRealTime&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;GPU&lt;/strong&gt; Instances and choose your Instance type, for example &lt;code&gt;RTX-4000-SFF-ADA&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Builder&lt;/strong&gt; section, select &lt;strong&gt;Dockerfile&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;strong&gt;Exposed ports&lt;/strong&gt; section, change the port to &lt;code&gt;8501&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Give the Service your preferred name.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Deploy&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During the deployment process, it may take a few minutes to build the Docker image from the Dockerfile and upload it to Koyeb's container registry. After deployment, you can access the Streamlit application using your Koyeb application's public URL. Please note that when running in the cloud, the video stream from a webcam might take several seconds to start since it needs to connect to the STUN servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, we learned about real-time object detection using the YOLO (You Only Look Once) algorithm. We started by discussing areas where real-time detection is important, such as surveillance, robotics, and self-driving cars, and why YOLO's ability to process images in a single pass makes it a good tool for real-time applications.&lt;/p&gt;

&lt;p&gt;Afterwards, we took a look at &lt;em&gt;how&lt;/em&gt; YOLO works by explaining its grid-based prediction system and how it manages to predict bounding boxes and class probabilities at the same time. Throughout the guide we explored advanced tasks YOLO excels at like object counting, cropping, blurring, segmentation, tracking, and action recognition. These tasks show how versatile and powerful YOLO is in handling complex real-world situations.&lt;/p&gt;

&lt;p&gt;Finally, we gave practical advice on how to use Docker to deploy YOLO, making it easy to run YOLO applications in a consistent and reproducible environment. We used this to deploy and run the application on high-performance GPUs on Koyeb. As YOLO and real-time object detection continue to improve, they will become even more useful and find more applications. The ability to capitalize on these advancements by executing on high-performance hardware may lead to exciting breakthroughs in many industries.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
