<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhanush Reddy</title>
    <description>The latest articles on DEV Community by Dhanush Reddy (@dhanushreddy29).</description>
    <link>https://dev.to/dhanushreddy29</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F598732%2F56ce88ab-55f8-4faf-b93b-272e5b288eed.jpg</url>
      <title>DEV Community: Dhanush Reddy</title>
      <link>https://dev.to/dhanushreddy29</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhanushreddy29"/>
    <language>en</language>
    <item>
      <title>Getting started with Thordata</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Mon, 27 Oct 2025 17:54:59 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/getting-started-with-thordata-5ggn</link>
      <guid>https://dev.to/dhanushreddy29/getting-started-with-thordata-5ggn</guid>
      <description>&lt;p&gt;If you've ever tried to access information from different parts of the world or manage multiple accounts, you might have run into some roadblocks. Websites can change their content based on your location, and some might even block you. That's where a service like &lt;a href="https://www.thordata.com/?ls=EDBORvrR&amp;amp;lk=wb" rel="noopener noreferrer"&gt;Thordata&lt;/a&gt; comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Thordata?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.thordata.com/?ls=EDBORvrR&amp;amp;lk=wb" rel="noopener noreferrer"&gt;Thordata&lt;/a&gt; is a service that provides access to a large network of residential proxies. In simple terms, it lets you use IP addresses from real devices in over 195 countries. This makes it look like your internet traffic is coming from a regular user in that location, which can be useful for a variety of tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features of Thordata
&lt;/h2&gt;

&lt;p&gt;Here are some of the things you can do with Thordata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Global Coverage:&lt;/strong&gt; With over 60 million IP addresses, you can access content from all over the world.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Different Proxy Types:&lt;/strong&gt; Thordata offers a few different kinds of proxies to fit your needs, including &lt;a href="https://www.thordata.com/products/residential-proxies" rel="noopener noreferrer"&gt;residential, ISP, and datacenter proxies&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Flexible Sessions:&lt;/strong&gt; You can choose to have your IP address change with every request ("rotating" sessions) or keep the same IP for a longer period ("sticky" sessions). This is helpful for tasks that require a consistent identity.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Easy Setup:&lt;/strong&gt; You can get started quickly with their endpoint generator, which automatically creates the credentials you need. You can also manually create users or use an IP allowlist for extra security.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Usage Tracking:&lt;/strong&gt; The dashboard includes a statistics tab where you can monitor your data usage in real time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Use Thordata
&lt;/h2&gt;

&lt;p&gt;Getting set up with Thordata is pretty straightforward. Here's a quick overview of the process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Choose a Plan:&lt;/strong&gt; First, you'll need to select a &lt;a href="https://www.thordata.com/pricing" rel="noopener noreferrer"&gt;pricing plan&lt;/a&gt; that works for you.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Set Up Your Proxies:&lt;/strong&gt; In the dashboard, you can use the endpoint generator to create your proxy credentials.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Configure Your Location and Session Type:&lt;/strong&gt; You can choose a random IP from their global pool or select a specific country. Then, decide if you want a rotating or sticky session.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Copy and Integrate:&lt;/strong&gt; Once you've set your preferences, you can copy the proxy details and integrate them into your application or browser.&lt;/li&gt;
&lt;/ol&gt;
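The integration step above can be sketched in code. Here is a minimal Python example using the `requests` library, assuming the common `user:pass@host:port` proxy URL format; the host, port, and credentials below are placeholders, so check your dashboard's endpoint generator for the exact values:

```python
import requests

def build_proxies(username, password, host, port):
    # Build a requests-style proxies dict from proxy credentials.
    # The user:pass@host:port URL format is a common convention;
    # copy the exact endpoint from your dashboard (values below are placeholders).
    proxy_url = f"http://{username}:{password}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}

proxies = build_proxies("your-username", "your-password", "proxy.example.com", 9999)

# Route a request through the proxy; the target site sees the proxy's IP:
# requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
print(proxies["https"])
```

With a sticky session, the same proxy URL keeps returning the same exit IP for the session's lifetime; with a rotating session, each request may exit from a different IP.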

&lt;p&gt;If you're interested in giving it a try, &lt;a href="https://www.thordata.com/?ls=EDBORvrR&amp;amp;lk=wb" rel="noopener noreferrer"&gt;Thordata offers a free trial&lt;/a&gt; to help you get started. You can use their residential proxies for tasks like web scraping and ad verification.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Daily Gist: Never miss the best of Reddit again. Powered by Bright Data.</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Mon, 01 Sep 2025 05:24:10 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/the-daily-gist-never-miss-the-best-of-reddit-again-powered-by-bright-data-5238</link>
      <guid>https://dev.to/dhanushreddy29/the-daily-gist-never-miss-the-best-of-reddit-again-powered-by-bright-data-5238</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/brightdata-n8n-2025-08-13"&gt;AI Agents Challenge powered by n8n and Bright Data&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The Daily Gist is an n8n workflow that creates a 10- to 15-minute video summary of a subreddit's top posts. To do this, it uses the subreddit's RSS feed.&lt;/p&gt;

&lt;p&gt;The following video is an example generated by my n8n workflow for the subreddit &lt;a href="https://www.reddit.com/r/ArtificialInteligence/" rel="noopener noreferrer"&gt;r/ArtificialInteligence&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Zeh6Lshj04A"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  n8n Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/dhanushreddy291/ccd40f033355e74e04b8cf0a3058047b" rel="noopener noreferrer"&gt;Github Gist&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Implementation
&lt;/h2&gt;

&lt;p&gt;This workflow is a scheduled job, built with n8n, that creates a video summary of the top posts from a chosen subreddit every day.&lt;/p&gt;

&lt;p&gt;The workflow operates in several distinct stages:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3orcg0e8s5ofgdj8dk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3orcg0e8s5ofgdj8dk1.png" alt="n8n Workflow Part 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnsyg65m95um8vmvi21f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjnsyg65m95um8vmvi21f.png" alt="n8n Workflow Part 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r3mtacl7amqm1qh1lj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5r3mtacl7amqm1qh1lj5.png" alt="n8n Workflow Part 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Data Ingestion:&lt;/strong&gt; The process begins by ingesting post URLs from the subreddit's RSS feed.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Content Scraping &amp;amp; Ranking:&lt;/strong&gt; These URLs are passed to a Bright Data node, which scrapes the content of each post. The workflow then filters and ranks the posts, selecting the top 15 based on a combination of upvotes and comments.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI-Powered Content Generation:&lt;/strong&gt; For each of the top posts, the system utilizes two distinct AI models from Gemini:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visuals:&lt;/strong&gt; An image generation model, &lt;a href="https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/" rel="noopener noreferrer"&gt;Gemini 2.5 Flash Image, a.k.a Nano Banana&lt;/a&gt;, creates a unique, context-aware image for the post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio:&lt;/strong&gt; The audio generation is a two-step process: first, the &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt; LLM creates a text-based summary. This text is then converted into speech by the specialized &lt;a href="https://cloud.google.com/text-to-speech/docs/gemini-tts" rel="noopener noreferrer"&gt;Gemini 2.5 Flash Preview TTS model&lt;/a&gt; for the final audio narration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Video Synthesis:&lt;/strong&gt; With a unique image and audio track for each post, an &lt;code&gt;ffmpeg&lt;/code&gt; command is executed to sequence and merge these elements into a final, consolidated MP4 video.&lt;/li&gt;
&lt;/ol&gt;
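The filtering and ranking in step 2 can be sketched outside n8n. A minimal Python sketch, assuming each scraped post is a dict with hypothetical `upvotes` and `num_comments` fields (the actual field names depend on the scraper's output schema):

```python
def top_posts(posts, limit=15):
    # Rank posts by a simple combined score of upvotes and comments,
    # then keep the highest-scoring ones (field names are assumptions).
    def score(post):
        return post.get("upvotes", 0) + post.get("num_comments", 0)
    return sorted(posts, key=score, reverse=True)[:limit]

posts = [
    {"title": "A", "upvotes": 120, "num_comments": 30},
    {"title": "B", "upvotes": 15, "num_comments": 2},
    {"title": "C", "upvotes": 300, "num_comments": 90},
]
print([p["title"] for p in top_posts(posts, limit=2)])  # ['C', 'A']
```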

&lt;p&gt;In my case, the completed video is saved to a shared storage volume accessible by a &lt;a href="https://jellyfin.org/" rel="noopener noreferrer"&gt;Jellyfin media server&lt;/a&gt; running in a parallel container. This setup allows me to seamlessly stream and watch my personalized Reddit summary on any of my devices. While this is great for personal viewing, the workflow can be easily extended using n8n's built-in integrations to automatically upload the video to YouTube, send it via Telegram, or distribute it to virtually any other platform.&lt;/p&gt;
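The video synthesis step can be sketched as a small Python wrapper around &lt;code&gt;ffmpeg&lt;/code&gt;. A minimal sketch, assuming one image and one narration file per post (file names are placeholders); the per-clip command loops the still image for the duration of the audio:

```python
import subprocess

def build_clip_cmd(image_path, audio_path, output_path):
    # Loop the still image, pair it with the narration, and stop when
    # the audio ends (-shortest). Encodes H.264 video and AAC audio.
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,
        "-i", audio_path,
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac", "-shortest",
        "-pix_fmt", "yuv420p",
        output_path,
    ]

cmd = build_clip_cmd("post_01.png", "post_01.mp3", "post_01.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually render the clip
print(" ".join(cmd))
```

The individual per-post clips can then be joined into one MP4 with ffmpeg's concat demuxer.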

&lt;h3&gt;
  
  
  Bright Data Verified Node
&lt;/h3&gt;

&lt;p&gt;The core data extraction was handled by Bright Data's Web Scraper node. The workflow begins by ingesting URLs from the RSS feed of the &lt;a href="https://www.reddit.com/r/ArtificialInteligence.rss" rel="noopener noreferrer"&gt;r/ArtificialInteligence&lt;/a&gt; subreddit. These URLs are then processed in a batch by the Bright Data node.&lt;/p&gt;
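The ingestion step can be reproduced outside n8n with the standard library. A minimal sketch, assuming the feed has already been fetched as text; it handles both Atom feeds (which Reddit serves) and classic RSS, and the function name is illustrative:

```python
import xml.etree.ElementTree as ET

def extract_post_urls(feed_text):
    # Parse a subreddit feed and collect each entry's URL. Reddit serves
    # Atom feeds, so look for Atom entry/link elements as well as
    # classic RSS item/link elements.
    root = ET.fromstring(feed_text)
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    urls = [link.get("href") for link in root.findall(".//atom:entry/atom:link", ns)]
    urls += [link.text for link in root.findall(".//item/link")]
    return urls

# Example usage (network call commented out):
# import urllib.request
# feed_text = urllib.request.urlopen(
#     "https://www.reddit.com/r/ArtificialInteligence.rss").read()
# print(extract_post_urls(feed_text))
```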

&lt;p&gt;Scraping websites like Reddit poses a significant challenge. The content is loaded dynamically, and the HTML structure is complex, making traditional scraping with simple CSS selectors unreliable and difficult to maintain.&lt;/p&gt;

&lt;p&gt;This is where Bright Data made things easy. The node effortlessly extracted key information such as post titles, upvote counts, authors, and replies. Using the Bright Data node was critical, as it provides a managed solution that guarantees a structured response, eliminating the need to build and maintain a complex, fragile custom parser for Reddit's front end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Journey
&lt;/h2&gt;

&lt;p&gt;Coming from a coding background, I was eager to see what n8n could do. The transition to n8n's visual workflow builder was seamless, and I was able to get up to speed very quickly.&lt;/p&gt;

&lt;p&gt;The main technical challenge I encountered was a missing dependency in the standard deployment. The default self-hosted n8n image lacks FFmpeg, which was essential for my video automation workflow. I resolved this by building and publishing my own custom n8n image with FFmpeg included, which is now available for the community on GitHub: &lt;a href="https://github.com/dhanushreddy291/n8n-with-ffmpeg" rel="noopener noreferrer"&gt;dhanushreddy291/n8n-with-ffmpeg&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This experience was incredibly valuable. It taught me that n8n's true power is the sheer speed at which you can build and deploy. Thanks to the extensive library of integrations, complex tasks that would normally require significant coding, like creating custom APIs or setting up cron jobs, can be accomplished in a matter of minutes. This ability to go from idea to functional workflow so quickly has made n8n my go-to solution.&lt;/p&gt;

&lt;p&gt;This submission was made by &lt;a href="https://dev.to/dhanushreddy29"&gt;Dhanush Reddy&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>n8nbrightdatachallenge</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>NewsSnap: Techy+bite-sized news in minutes</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Mon, 27 Jan 2025 07:58:19 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/researchbyte-techybite-sized-videos-in-minutes-3a1l</link>
      <guid>https://dev.to/dhanushreddy29/researchbyte-techybite-sized-videos-in-minutes-3a1l</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://srv.buysellads.com/ads/long/x/T6EK3TDFTTTTTT6WWB6C5TTTTTTGBRAPKATTTTTTWTFVT7YTTTTTTKPPKJFH4LJNPYYNNSZL2QLCE2DPPQVCEI45GHBT" rel="noopener noreferrer"&gt;Agent.ai&lt;/a&gt; Challenge: Productivity-Pro Agent (&lt;a href="https://dev.to/challenges/agentai"&gt;See Details&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;NewsSnap empowers users to stay informed with lightning-fast, comprehensive research. It instantly scrapes and synthesizes data from across the internet to create concise, visually engaging 5-minute video summaries of topics, companies, or the past 7 days' news. Perfect for busy professionals, students, or anyone curious about the world, NewsSnap delivers reliable insights at your fingertips. Whether you're preparing for a meeting, studying, or catching up on the latest trends, NewsSnap transforms overwhelming information into clear, engaging videos—keeping you up-to-date with speed and simplicity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To clarify, the news video is exclusively generated using information from the past 7 days from Google News, focusing solely on the specified topic or company as the context.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://agent.ai/agent/research-byte" rel="noopener noreferrer"&gt;https://agent.ai/agent/research-byte&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a sample video generated by NewsSnap with just the prompt "Deepseek":&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/SAkjSwBd6Rs"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here's another video with the prompt "Sam Altman"&lt;br&gt;
&lt;a href="https://agent.ai/agent/research-byte?rid=426f90b022b94c3b8351b0dfbd1918a2" rel="noopener noreferrer"&gt;https://agent.ai/agent/research-byte?rid=426f90b022b94c3b8351b0dfbd1918a2&lt;/a&gt;&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/joaKRvbRHDM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;While the agent builder offers an intuitive interface for basic tasks, I have encountered several challenges when developing more complex workflows. Specifically, AWS Lambda deployments have consistently failed without clear error messages, necessitating deployment to a personal server and invocation via &lt;code&gt;POST&lt;/code&gt; requests. Additionally, I have experienced issues with &lt;code&gt;FOR&lt;/code&gt; loops within the agent builder, and the debugging tools have not provided sufficient diagnostic information. Furthermore, the handling and formatting of JSON data require further clarification. I have seen others using GPT-4 to parse JSON outputs, which should really be done in code, but given the limited debugging options, that workaround is understandable.&lt;/p&gt;

&lt;p&gt;Despite these challenges, I recognize the platform's potential. With further development and resolution of these issues, the agent builder could empower users of all skill levels to rapidly create AI applications. It has the potential to become a central hub for AI agent orchestration, similar to Zapier, enabling agents to trigger external integrations based on their internal logic.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>NewsSnap: Techy+bite-sized news in minutes</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Mon, 27 Jan 2025 07:58:14 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/researchbyte-techy-bite-sized-videos-in-minutes-2ef7</link>
      <guid>https://dev.to/dhanushreddy29/researchbyte-techy-bite-sized-videos-in-minutes-2ef7</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://srv.buysellads.com/ads/long/x/T6EK3TDFTTTTTT6WWB6C5TTTTTTGBRAPKATTTTTTWTFVT7YTTTTTTKPPKJFH4LJNPYYNNSZL2QLCE2DPPQVCEI45GHBT" rel="noopener noreferrer"&gt;Agent.ai&lt;/a&gt; Challenge: Full-Stack Agent (&lt;a href="https://dev.to/challenges/agentai"&gt;See Details&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;NewsSnap empowers users to stay informed with lightning-fast, comprehensive research. It instantly scrapes and synthesizes data from across the internet to create concise, visually engaging 5-minute video summaries of topics, companies, or the past 7 days' news. Perfect for busy professionals, students, or anyone curious about the world, NewsSnap delivers reliable insights at your fingertips. Whether you're preparing for a meeting, studying, or catching up on the latest trends, NewsSnap transforms overwhelming information into clear, engaging videos—keeping you up-to-date with speed and simplicity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To clarify, the news video is exclusively generated using information from the past 7 days from Google News, focusing solely on the specified topic or company as the context.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://agent.ai/agent/research-byte" rel="noopener noreferrer"&gt;https://agent.ai/agent/research-byte&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a sample video generated by NewsSnap with just the prompt "Deepseek":&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/SAkjSwBd6Rs"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here's another video with the prompt "Sam Altman"&lt;br&gt;
&lt;a href="https://agent.ai/agent/research-byte?rid=426f90b022b94c3b8351b0dfbd1918a2" rel="noopener noreferrer"&gt;https://agent.ai/agent/research-byte?rid=426f90b022b94c3b8351b0dfbd1918a2&lt;/a&gt;&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/joaKRvbRHDM"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;While the agent builder offers an intuitive interface for basic tasks, I have encountered several challenges when developing more complex workflows. Specifically, AWS Lambda deployments have consistently failed without clear error messages, necessitating deployment to a personal server and invocation via &lt;code&gt;POST&lt;/code&gt; requests. Additionally, I have experienced issues with &lt;code&gt;FOR&lt;/code&gt; loops within the agent builder, and the debugging tools have not provided sufficient diagnostic information. Furthermore, the handling and formatting of JSON data require further clarification. I have seen others using GPT-4 to parse JSON outputs, which should really be done in code, but given the limited debugging options, that workaround is understandable.&lt;/p&gt;

&lt;p&gt;Despite these challenges, I recognize the platform's potential. With further development and resolution of these issues, the agent builder could empower users of all skill levels to rapidly create AI applications. It has the potential to become a central hub for AI agent orchestration, similar to Zapier, enabling agents to trigger external integrations based on their internal logic.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Reddit Recap: Audio summaries of subreddits powered by BrightData</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Mon, 30 Dec 2024 06:33:50 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/reddit-recap-3j6d</link>
      <guid>https://dev.to/dhanushreddy29/reddit-recap-3j6d</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/brightdata"&gt;Bright Data Web Scraping Challenge&lt;/a&gt;: Most Creative Use of Web Data for AI Models&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://reddit-recap-woad.vercel.app" rel="noopener noreferrer"&gt;Reddit Recap&lt;/a&gt; is an application that scrapes subreddits using &lt;a href="https://brightdata.com" rel="noopener noreferrer"&gt;BrightData&lt;/a&gt; and generates concise summaries every two hours. These summaries are then converted into audio briefings, all accessible through a beautiful web app, allowing users to effortlessly stay informed about their favorite communities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built It
&lt;/h2&gt;

&lt;p&gt;I wanted to tackle a personal problem I've faced: staying up-to-date with the latest discussions and news in the communities I care about. While Reddit offers an incredible wealth of discussions, the sheer volume of content became overwhelming. That's why I created Reddit Recap—a tool that distills the platform's endless stream of information into digestible, curated updates, helping me stay connected to the conversations that matter most to me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Check out Reddit Recap &lt;a href="https://reddit-recap-woad.vercel.app" rel="noopener noreferrer"&gt;here&lt;/a&gt;. While I've customized the current deployment to track subreddits that match my interests (&lt;a href="https://www.reddit.com/r/singularity/" rel="noopener noreferrer"&gt;r/singularity&lt;/a&gt;, &lt;a href="https://www.reddit.com/r/LocalLLaMA/" rel="noopener noreferrer"&gt;r/LocalLLaMA&lt;/a&gt;, and &lt;a href="https://www.reddit.com/r/homeautomation/" rel="noopener noreferrer"&gt;r/homeautomation&lt;/a&gt;), you can easily create your own version by using the source code to monitor the communities you care about.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27qp0nxmju5gd0aes14w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27qp0nxmju5gd0aes14w.png" alt="Reddit Recap" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Bright Data
&lt;/h2&gt;

&lt;p&gt;Bright Data was absolutely essential for building Reddit Recap. Scraping Reddit is incredibly challenging due to its sophisticated anti-scraping mechanisms. I leveraged BrightData's &lt;a href="https://brightdata.com/products/web-scraper" rel="noopener noreferrer"&gt;Web Scraper API&lt;/a&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Reliable Data Extraction: Reddit Dataset (&lt;a href="https://brightdata.com/cp/data_api/gd_lvz8ah06191smkebj4/subreddit_url?tab=overview" rel="noopener noreferrer"&gt;gd_lvz8ah06191smkebj4&lt;/a&gt;) provided structured and dependable access to Reddit posts, eliminating the need to build and maintain my own complex scraping infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bypassing Anti-Scraping Measures: Bright Data's infrastructure seamlessly handles IP blocking, CAPTCHAs, and other anti-scraping techniques that would cripple traditional scrapers. This allowed me to focus on the application's core logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficient Data Retrieval: The Bright Data API made it easy to target specific subreddits and retrieve the latest top posts in a structured format, saving significant development time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
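A trigger call to the Web Scraper API can be sketched as follows. This is a minimal sketch, not Bright Data's exact interface: the endpoint URL and payload shape are assumptions based on the dataset trigger pattern, so check the Web Scraper API docs for the current specifics. The dataset id is the Reddit dataset mentioned above; the token and subreddit list are placeholders.

```python
def build_trigger_request(api_token, dataset_id, subreddit_urls):
    # Sketch of a Bright Data dataset trigger call. The endpoint and
    # payload shape are assumptions; consult the API docs for specifics.
    return {
        "url": "https://api.brightdata.com/datasets/v3/trigger",
        "params": {"dataset_id": dataset_id},
        "headers": {"Authorization": f"Bearer {api_token}"},
        "json": [{"url": u} for u in subreddit_urls],
    }

req = build_trigger_request(
    "YOUR_API_TOKEN",
    "gd_lvz8ah06191smkebj4",  # the Reddit dataset id from the dashboard
    ["https://www.reddit.com/r/singularity/"],
)
# import requests
# response = requests.post(req["url"], params=req["params"],
#                          headers=req["headers"], json=req["json"])
print(req["params"]["dataset_id"])
```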

&lt;p&gt;Here is a high-level architectural overview of the app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff40abq8xtn740g6g9jee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff40abq8xtn740g6g9jee.png" alt="Architecture overview" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The web app can also qualify under: &lt;em&gt;Prompt 1: Scrape Data from Complex, Interactive Websites&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Benefits of Reddit Recap
&lt;/h3&gt;

&lt;p&gt;Reddit Recap offers several key advantages for busy individuals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Stay Informed Effortlessly:&lt;/strong&gt;  No more endless scrolling! Get the gist of what's happening in your favorite subreddits in minutes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Audio Summaries on the Go:&lt;/strong&gt;  Listen to your Reddit news during your commute, workout, or while doing chores.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Time Savings:&lt;/strong&gt; Reclaim valuable time by quickly catching up on relevant discussions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Clean and Organized Presentation:&lt;/strong&gt;  The web app provides a clear and easy-to-navigate interface for accessing the summaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This submission was made by &lt;a href="https://dev.to/dhanushreddy29"&gt;Dhanush Reddy&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Code
&lt;/h3&gt;

&lt;p&gt;You can find the complete code &lt;a href="https://github.com/dhanushreddy291/reddit-recap" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Feel free to fork it and customize it for your own subreddit interests.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>brightdatachallenge</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to create custom nodes in ComfyUI</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Sun, 28 Jul 2024 11:16:05 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/how-to-create-custom-nodes-in-comfyui-bgh</link>
      <guid>https://dev.to/dhanushreddy29/how-to-create-custom-nodes-in-comfyui-bgh</guid>
      <description>&lt;h2&gt;
  
  
  What is ComfyUI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/comfyanonymous/ComfyUI" rel="noopener noreferrer"&gt;ComfyUI&lt;/a&gt; is a powerful and flexible user interface for Stable Diffusion, allowing users to create complex image generation workflows through a node-based system. While ComfyUI comes with a variety of built-in nodes, its true strength lies in its extensibility. Custom nodes enable users to add new functionality, integrate external services, and tailor it to their specific needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft1k06djbvvgfnuas7za.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft1k06djbvvgfnuas7za.gif" alt="An image showing the interface and working of ComfyUI" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog post, we will walk through the process of creating a custom node for image captioning using ComfyUI. This node will take an image as input and return a generated caption using an external API.&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="https://ai.google.dev/" rel="noopener noreferrer"&gt;Google Gemini API&lt;/a&gt; to generate the image caption.&lt;/p&gt;

&lt;p&gt;Here is the complete code that performs image captioning using the Gemini API.&lt;/p&gt;

&lt;p&gt;You can copy the following code into any file under the &lt;code&gt;custom_nodes&lt;/code&gt; folder in ComfyUI; I have named mine &lt;code&gt;gemini-caption.py&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F638alm29v36z0gh9yvl7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F638alm29v36z0gh9yvl7.png" alt="Where to store the file" width="278" height="98"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Complete code for Generating Image Captions
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ImageCaptioningNode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;INPUT_TYPES&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IMAGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;RETURN_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
    &lt;span class="n"&gt;FUNCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;caption_image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;CATEGORY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;OUTPUT_NODE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;caption_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Convert the image tensor to a PIL Image
&lt;/span&gt;        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;255.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Convert the image to base64
&lt;/span&gt;        &lt;span class="n"&gt;buffered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PNG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;img_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;api_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a caption for this image in as detail as possible. Don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t send anything else apart from the caption.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;img_str&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                    &lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Send the request to the Gemini API
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;caption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;caption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Unable to generate caption. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;


&lt;span class="n"&gt;NODE_CLASS_MAPPINGS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ImageCaptioningNode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ImageCaptioningNode&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here is how the node looks on the UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55d5kbqmjuhqbf601ifh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55d5kbqmjuhqbf601ifh.png" alt="Custom ComfyUI Node" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's go over it line by line to understand how you would create a similar node for your own use case. First of all, whatever node you want to create, write its core logic as a function, so ComfyUI can call it the same way it calls my &lt;code&gt;caption_image&lt;/code&gt; function here.&lt;/p&gt;
&lt;h3&gt;
  
  
  Import the necessary libraries needed
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These lines import the necessary libraries for my Image Captioning node:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;numpy&lt;/code&gt; for numerical operations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PIL&lt;/code&gt; (Python Imaging Library) for image processing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;requests&lt;/code&gt; for making HTTP requests to Gemini API&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;io&lt;/code&gt; for handling byte streams&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;base64&lt;/code&gt; for encoding the image&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Defining the ClassName for your ComfyUI node
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ImageCaptioningNode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;INPUT_TYPES&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IMAGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In my case, I have named it &lt;code&gt;ImageCaptioningNode&lt;/code&gt;, as the name describes exactly what it does.&lt;/p&gt;

&lt;p&gt;The class method defines the input types for our node:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An "image" input of type "IMAGE"&lt;/li&gt;
&lt;li&gt;An "api_key" input of type "STRING" with a default empty value, needed for sending API requests to Gemini API.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;RETURN_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
    &lt;span class="n"&gt;FUNCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;caption_image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;CATEGORY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;OUTPUT_NODE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These class variables define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The return type (a string)&lt;/li&gt;
&lt;li&gt;The main function to be called ("caption_image")&lt;/li&gt;
&lt;li&gt;The category in which the node will appear in ComfyUI&lt;/li&gt;
&lt;li&gt;That this node can be an output node
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;caption_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Convert the image tensor to a PIL Image
&lt;/span&gt;        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;255.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cpu&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;squeeze&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Convert the image to base64
&lt;/span&gt;        &lt;span class="n"&gt;buffered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PNG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;img_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;api_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Prepare the request payload
&lt;/span&gt;        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a caption for this image in as detail as possible. Don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t send anything else apart from the caption.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inline_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;img_str&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                    &lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;caption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;candidates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;caption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Unable to generate caption. &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is a standalone function I have written that takes an image as input and sends it to the Gemini API using the API key. The code is straightforward: we base64-encode the image so it can be included in the request body, and the prompt instructs Gemini to caption the image in detail. The response from the API is parsed, printed to the console, and returned as a tuple (required by ComfyUI).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;NODE_CLASS_MAPPINGS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ImageCaptioningNode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ImageCaptioningNode&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dictionary maps the class name to the class itself, which is used by ComfyUI to register the custom node.&lt;/p&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Creating custom nodes for ComfyUI opens up a world of possibilities for extending and enhancing your image generation workflows. In this article, we've walked through the process of building a custom image captioning node, demonstrating how to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Define input and output types&lt;/li&gt;
&lt;li&gt; Integrate with external APIs (in this case, the Gemini API for image captioning)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By following these steps, you can create your own custom nodes to add virtually any functionality you need to ComfyUI. Whether you're integrating new LLM models, adding specialized image processing techniques, or creating shortcuts for common tasks, custom nodes allow you to tailor ComfyUI to your specific requirements.&lt;/p&gt;

&lt;p&gt;Remember that while we've focused on image captioning in this example, the same principles can be applied to create nodes for a wide variety of tasks. The key is to understand the structure of a ComfyUI node and how to interface with the expected inputs and outputs.&lt;/p&gt;
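&lt;p&gt;As a quick reference, the pattern above boils down to a small skeleton. Here is a minimal sketch (the class name, input, and logic are illustrative, not from this article) that you can adapt for other tasks:&lt;/p&gt;

```python
# Minimal ComfyUI custom node skeleton (illustrative names and logic)
class MyCustomNode:
    @classmethod
    def INPUT_TYPES(s):
        # Declare the inputs the node needs
        return {"required": {"text": ("STRING", {"default": ""})}}

    RETURN_TYPES = ("STRING",)  # what the node outputs
    FUNCTION = "run"            # the method ComfyUI will call
    CATEGORY = "utils"          # where the node appears in the UI menu

    def run(self, text):
        result = text.upper()   # replace with your own logic
        return (result,)        # ComfyUI expects a tuple


# Register the node so ComfyUI can discover it
NODE_CLASS_MAPPINGS = {"MyCustomNode": MyCustomNode}
```

&lt;p&gt;Drop a file like this into ComfyUI's &lt;code&gt;custom_nodes&lt;/code&gt; folder and restart ComfyUI for the node to show up.&lt;/p&gt;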

&lt;p&gt;If you still have any questions about this post or want to discuss something with me, feel free to connect on &lt;a href="https://www.linkedin.com/in/dhanushreddy29/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/dhanushreddy291" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you run an organization and want me to write for you, please connect with me on my Socials 🙃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>comfyui</category>
      <category>stablediffusion</category>
      <category>genai</category>
      <category>python</category>
    </item>
    <item>
      <title>Set up your own personal browser in the Cloud</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Sun, 17 Mar 2024 08:22:35 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/set-up-your-own-personal-browser-in-the-cloud-2841</link>
      <guid>https://dev.to/dhanushreddy29/set-up-your-own-personal-browser-in-the-cloud-2841</guid>
      <description>&lt;p&gt;In the age of internet surveillance and data breaches, maintaining privacy and security online has never been more critical. Thankfully, with modern cloud technologies and a bit of ingenuity, creating a private and secure browsing environment is not only possible but also quite straightforward. This article guides you through setting up your own personal browser in the cloud, leveraging a Docker container with Firefox installed on a fly.io virtual machine (VM). The process is surprisingly simple and offers numerous benefits, including enhanced security, privacy, and high-speed internet access, no matter where you are in the world.&lt;/p&gt;

&lt;p&gt;I don't want to waste your time: just clone the following &lt;a href="https://github.com/dhanushreddy291/flyio-browser" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. It's a very simple config that sets up a Docker daemon on a Fly VM. Of course, you could run Firefox as a standalone container image on the VM, but good luck deploying it that way; I ran into a lot of errors related to &lt;code&gt;s6-overlay-suexec&lt;/code&gt;. So as a simple workaround, I deploy the Firefox container on a VM that has Docker installed.&lt;/p&gt;

&lt;p&gt;Of course, I am not building the Firefox image myself; I am simply using the &lt;a href="https://docs.linuxserver.io/images/docker-firefox" rel="noopener noreferrer"&gt;kasm Firefox image&lt;/a&gt; from &lt;a href="https://linuxserver.io" rel="noopener noreferrer"&gt;linuxserver.io&lt;/a&gt;.&lt;/p&gt;
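&lt;p&gt;If you want to try the image locally before deploying, the linuxserver.io docs show a &lt;code&gt;docker run&lt;/code&gt; invocation along these lines (the port and environment variables here follow their documentation; double-check them there):&lt;/p&gt;

```shell
# Run the linuxserver.io Firefox (KASM-based) image locally
docker run -d \
  --name=firefox \
  -e PUID=1000 -e PGID=1000 \
  -p 3000:3000 \
  --shm-size="1gb" \
  lscr.io/linuxserver/firefox:latest
# Then open http://localhost:3000 in your local browser
```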

&lt;h2&gt;
  
  
  Demo of what we are building
&lt;/h2&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1769008521142493360-737" src="https://platform.twitter.com/embed/Tweet.html?id=1769008521142493360"&gt;
&lt;/iframe&gt;

&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://fly.io/" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt; is a platform that helps you run your apps and databases closer to your users all around the world. It takes your app code, packages it up neatly, and puts it on virtual machines that can be quickly started or stopped. This makes your app faster for users and more reliable. Fly.io is easy to use, works well for small projects or personal apps. It's a great way to make sure your app runs smoothly for people no matter where they are.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros of using a cloud browser
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Private and Secure&lt;/strong&gt;: Your browsing environment is isolated, reducing the risk of malware and eavesdropping.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;High-Speed Internet&lt;/strong&gt;: Utilize the high bandwidth of remote servers for faster browsing, especially beneficial for bandwidth-intensive tasks.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Easy Deployment&lt;/strong&gt;: With my guide, deploying your cloud browser is a breeze.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Persistent Data&lt;/strong&gt;: By mounting a volume to the VM, your browser history and data remain intact between sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deployment Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure the &lt;code&gt;fly.toml&lt;/code&gt; File&lt;/strong&gt;: Begin by cloning the provided &lt;a href="https://github.com/dhanushreddy291/flyio-browser" rel="noopener noreferrer"&gt;repository&lt;/a&gt;. Edit the &lt;code&gt;fly.toml&lt;/code&gt; file to change the app name to something unique and select your preferred region for deployment. I used &lt;code&gt;sin&lt;/code&gt; for Singapore, but Fly.io offers a wide range of &lt;a href="https://fly.io/docs/reference/regions" rel="noopener noreferrer"&gt;regions&lt;/a&gt;; choose whichever is closest to you to minimize latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set Up Authentication&lt;/strong&gt;: Modify the default username and password in the &lt;code&gt;deploy.sh&lt;/code&gt; script to ensure your browser is secure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Launch Your Cloud Browser&lt;/strong&gt;: Run the &lt;code&gt;deploy.sh&lt;/code&gt; script. This script will install &lt;code&gt;flyctl&lt;/code&gt; (if not already installed), authenticate your Fly.io account, and handle the deployment of your Firefox container to your selected region. Follow the on-screen prompts during deployment.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
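As an illustration of step 1, the `fly.toml` edits usually come down to two fields. The values below are illustrative placeholders, not the repository's actual defaults; keep whatever else the file already contains:

```toml
# Illustrative fly.toml fragment; the app name must be globally unique
app = "my-cloud-browser"
# Region code from Fly.io's regions list, e.g. "sin" for Singapore
primary_region = "sin"
```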

&lt;h3&gt;
  
  
  Accessing Your Cloud Browser
&lt;/h3&gt;

&lt;p&gt;Once deployment is complete, Fly.io will provide a URL for accessing your cloud browser. Navigate to this URL and log in with the username and password you set earlier. Just like that, you're ready to enjoy a secure, private browsing experience from anywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Costs
&lt;/h3&gt;

&lt;p&gt;Running a 2 GB instance on Fly.io costs approximately $0.01476 per hour, or around $10.77 for a full month. To manage expenses, you can stop the VM when it's not in use: Fly.io bills by the second, allowing precise control over your spending.&lt;/p&gt;
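The arithmetic is easy to sanity-check; a quick sketch, using the rates quoted above (treat them as illustrative rather than current pricing):

```python
# Estimate Fly.io costs from the hourly rate quoted above (illustrative only;
# check Fly.io's pricing page for current numbers).
hourly_rate = 0.01476                  # USD per hour for a 2 GB instance

always_on = hourly_rate * 730          # ~730 hours in an average month
print(round(always_on, 2))             # roughly 10.77 USD

# Because billing is per second, stopping the VM when idle scales the cost down:
two_hours_a_day = hourly_rate * 2 * 30
print(round(two_hours_a_day, 2))       # under a dollar a month
```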

&lt;h3&gt;
  
  
  Advanced Features
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Custom Domain
&lt;/h4&gt;

&lt;p&gt;For a more personalized browsing experience, Fly.io supports setting up custom domains for your apps. Follow the &lt;a href="https://fly.io/docs/reference/regions" rel="noopener noreferrer"&gt;custom domain guide&lt;/a&gt; for detailed instructions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Automation with Telegram Bot
&lt;/h4&gt;

&lt;p&gt;Included in the repository is a bonus feature: a Telegram bot for automating the start and stop of your VM, helping you manage usage and costs effectively. Setting up the bot takes just a few minutes, and you can control your cloud browser directly from Telegram, ensuring it's running only when you need it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrapping Up
&lt;/h3&gt;

&lt;p&gt;What we have set up is a DIY version of &lt;a href="https://neverinstall.com/" rel="noopener noreferrer"&gt;neverinstall.com&lt;/a&gt;, which runs applications on remote machines. Now that you have a browser of your own, you can deploy almost any app the same way: do a simple Google search for "KASM [NAME_OF_APP_YOU_WANT]". For example, if you want to deploy Ubuntu Desktop, search for the "&lt;a href="https://hub.docker.com/r/kasmweb/desktop" rel="noopener noreferrer"&gt;Kasm Ubuntu Docker image&lt;/a&gt;" and deploy that image to get a full-fledged desktop in the cloud.&lt;/p&gt;

&lt;p&gt;If you still have any questions about this post or want to discuss something with me, feel free to connect on &lt;a href="https://www.linkedin.com/in/dhanushreddy29/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/dhanushreddy291" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you run an organization and want me to write for you, please connect with me on my Socials 🙃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>cloud</category>
      <category>docker</category>
      <category>browser</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Finetune Llama 2: A Beginner's Guide</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Mon, 11 Sep 2023 17:12:42 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/how-to-finetune-llama-2-a-beginners-guide-219e</link>
      <guid>https://dev.to/dhanushreddy29/how-to-finetune-llama-2-a-beginners-guide-219e</guid>
      <description>&lt;p&gt;Meta AI's LLaMA 2 has taken the NLP community by storm with its impressive range of pretrained and fine-tuned Large Language Models (LLMs). With model sizes ranging from 7B to a staggering 70B parameters, LLaMA 2 builds upon the success of its predecessor, LLaMA 1, offering a host of enhancements that have captivated the NLP community.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Evolution of LLaMA
&lt;/h2&gt;

&lt;p&gt;LLaMA 2 marks a significant evolution in the landscape of language models. It was trained on a corpus featuring 40% more tokens than LLaMA 1, and it supports a context length of up to 4,096 tokens, double that of its predecessor. This extended contextual understanding enables LLaMA 2 to excel in tasks that require nuanced comprehension of text and context.&lt;/p&gt;

&lt;p&gt;What makes LLaMA 2 even more extraordinary is its accessibility. Meta AI has generously made these advanced model weights available for both research and commercial applications. This democratization of cutting-edge language models ensures that a broader audience, from researchers to businesses, can harness the power of LLaMA 2 for their unique needs.&lt;/p&gt;

&lt;p&gt;To get access to Llama 2, you can follow these steps:&lt;/p&gt;

&lt;p&gt;Go to the Hugging Face Model Hub: &lt;a href="https://huggingface.co/meta-llama" rel="noopener noreferrer"&gt;huggingface.co/meta-llama&lt;/a&gt; and select the model that you want to use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on the "&lt;strong&gt;Request Access&lt;/strong&gt;" button.&lt;/li&gt;
&lt;li&gt;Fill out the form and submit it.&lt;/li&gt;
&lt;li&gt;Once your request has been approved, you will be able to download the model weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are some additional details about each size of the Llama 2 model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;7B parameters&lt;/strong&gt;: The smallest size of the Llama 2 model. It is still powerful, but not as capable as the 13B or 70B models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;13B parameters&lt;/strong&gt;: The medium-sized version, a good choice for most applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;70B parameters&lt;/strong&gt;: The largest and most capable model, but also the most expensive to train and run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this blog post, I will show you how to effortlessly fine-tune the LLaMA 2 - 7B model on a subset of the CodeAlpaca-20k dataset. This dataset contains over 20,000 coding questions and their corresponding correct answers. By fine-tuning the model on this dataset, we can teach it to generate code for a variety of tasks.&lt;/p&gt;

&lt;p&gt;To keep the process as simple as possible, we will use the &lt;a href="https://github.com/tloen/alpaca-lora" rel="noopener noreferrer"&gt;Alpaca LoRA training script&lt;/a&gt;, which automates the fine-tuning, and &lt;a href="https://beam.cloud" rel="noopener noreferrer"&gt;Beam&lt;/a&gt; for GPU compute.&lt;/p&gt;

&lt;p&gt;You can create a free account on &lt;a href="https://www.beam.cloud/" rel="noopener noreferrer"&gt;Beam&lt;/a&gt;, to get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An account on &lt;a href="https://www.beam.cloud/login" rel="noopener noreferrer"&gt;Beam&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An API Key from &lt;a href="https://www.beam.cloud/dashboard/settings/api-keys" rel="noopener noreferrer"&gt;Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;a href="https://docs.beam.cloud/getting-started/quickstart" rel="noopener noreferrer"&gt;Beam CLI&lt;/a&gt; by running:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh &lt;span class="nt"&gt;-sSfL&lt;/span&gt; | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure Beam by entering
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;beam configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install Beam SDK
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;beam-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you’re ready to start using Beam to deploy your ML models.&lt;/p&gt;

&lt;p&gt;To make it simple, I have made a &lt;a href="https://github.com/dhanushreddy291/finetune-llama2" rel="noopener noreferrer"&gt;Github Repo&lt;/a&gt;, which you can clone to start with.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;app.py&lt;/code&gt; file, I use the &lt;a href="https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k" rel="noopener noreferrer"&gt;CodeAlpaca-20k&lt;/a&gt; dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Trained models will be saved to this path
&lt;/span&gt;    &lt;span class="n"&gt;beam_volume_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./checkpoints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Load dataset -- for this example, we'll use the sahil2801/CodeAlpaca-20k dataset hosted on Huggingface:
&lt;/span&gt;    &lt;span class="c1"&gt;# https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
&lt;/span&gt;    &lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DatasetDict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sahil2801/CodeAlpaca-20k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train[:20%]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Adjust the training loop based on the size of the dataset
&lt;/span&gt;    &lt;span class="n"&gt;samples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;val_set_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;base_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;val_set_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;val_set_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;beam_volume_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the fine-tuning, execute the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;beam run app.py:train_model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we run this command, the training function will run on Beam's cloud, and we'll see the progress of the training process streamed to our terminal. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcujy9wivzm8u7xkgoblc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcujy9wivzm8u7xkgoblc.png" alt="LlaMA2 Finetuning logs" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The training may take hours to complete, depending on the size of the dataset you use for fine-tuning. In my case, since I used only 20% of the dataset, training completed in around an hour.&lt;/p&gt;

&lt;p&gt;Once the model is successfully trained, we can deploy an API to run inference with our fine-tuned model.&lt;/p&gt;

&lt;p&gt;Let's create a new function for inference. If you look closely, you'll notice that we're using a different decorator this time: &lt;code&gt;rest_api&lt;/code&gt; instead of &lt;code&gt;run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This will allow us to deploy the function as a REST API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.rest_api&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Inputs passed to the API
&lt;/span&gt;    &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Grab the latest checkpoint
&lt;/span&gt;    &lt;span class="n"&gt;checkpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_newest_checkpoint&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize models
&lt;/span&gt;    &lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_models&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokenizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;prompter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate text
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompter&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can deploy this as a REST API by running this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;beam deploy app.py:run_inference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c0bp9a47tvb5gfyhr0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c0bp9a47tvb5gfyhr0i.png" alt="Deployment Logs" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we navigate to the URL printed in the shell, we'll be able to copy the full cURL request to call the REST API.&lt;/p&gt;
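The same request can also be made from Python with just the standard library. The URL and token below are placeholders, so substitute the values Beam prints for your deployment:

```python
import json
import urllib.request

# Placeholders: copy the real URL and auth token from the output of `beam deploy`.
url = "https://apps.beam.cloud/your-app-id"
payload = json.dumps({"input": "How to download an image from its URL in Python?"}).encode()

request = urllib.request.Request(
    url,
    data=payload,
    headers={"Authorization": "Basic YOUR_AUTH_TOKEN", "Content-Type": "application/json"},
)

# Uncomment once the placeholders are filled in:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```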

&lt;p&gt;Now, when I tried asking how to download an image from its URL in Python&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How to download an image from its URL in Python?"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I got this entire markdown string as a response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgk5r0ru14ppec77gpbu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsgk5r0ru14ppec77gpbu.png" alt="API Response" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have prettified it below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Square_logo_2008.png/1200px-Square_logo_2008.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;### Solution:
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Square_logo_2008.png/1200px-Square_logo_2008.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
 &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;### Explanation:
&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="sb"&gt;`urllib.request`&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;second&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`with`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="sb"&gt;`with`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;third&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`urlopen`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="sb"&gt;`urlopen`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;fourth&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`response`&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="sb"&gt;`response`&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;fifth&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;create&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`read`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="sb"&gt;`read`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;read&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;sixth&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="sb"&gt;`print`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;used&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="k"&gt;print&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="c1"&gt;### Reflection:
&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;What&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`with`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`try`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;What&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`urlopen`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`request`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;What&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`response`&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`request`&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;What&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`read`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`request`&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;What&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="n"&gt;between&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`print`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="sb"&gt;`request`&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although training the model on 20% of the dataset is not ideal, it is still a good way to get started. You can see that we are already starting to see good results with this small amount of data. If you want to get even better results, you can try fine-tuning the model on the entire dataset. This will take several hours, but it will be worth it in the end. Once the model is trained, you can use it whenever you need it.&lt;/p&gt;
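As an aside, the generated snippet prints the raw bytes rather than saving them. A small rework of the same idea (standard library only) that writes the image to disk instead:

```python
import urllib.request

def download_image(url: str, path: str) -> int:
    """Fetch an image from a URL and write the bytes to disk; returns the byte count."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)

# Example usage (any direct image link works):
# download_image("https://example.com/logo.png", "logo.png")
```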

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, we have seen how to fine-tune LLaMA 2 - 7B on a subset of the CodeAlpaca-20k dataset using the Alpaca Lora Training script. This script makes it easy to fine-tune the model without having to write any code.&lt;/p&gt;

&lt;p&gt;We have also seen that even by training the model on 20% of the dataset, we can get good results. If you want to get even better results, you can try fine-tuning the model on the entire dataset.&lt;/p&gt;

&lt;p&gt;The future of open source AI is bright. The availability of large language models like LLaMA 2 makes it possible for anyone to develop powerful AI applications. With the help of open source tools and resources, developers can fine-tune these models to meet their specific needs.&lt;/p&gt;

&lt;p&gt;If you still have any questions about this post or want to discuss something with me, feel free to connect on &lt;a href="https://www.linkedin.com/in/dhanushreddy29/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/dhanushreddy291" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you run an organization and want me to write for you, please connect with me on my Socials 🙃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>beginners</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Deploy Hugging Face Models on Serverless GPU</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Sun, 28 May 2023 10:58:31 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/deploy-hugging-face-models-on-serverless-gpu-47am</link>
      <guid>https://dev.to/dhanushreddy29/deploy-hugging-face-models-on-serverless-gpu-47am</guid>
<description>&lt;p&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt; is a platform and community focused on making artificial intelligence and data science more accessible. It promotes open source contributions so that AI knowledge and resources are widely available, and it gives AI and data science professionals a space to connect and share their work.&lt;/p&gt;

&lt;p&gt;Hugging Face provides state of the art machine learning models for different tasks. It has a vast number of pre-trained models in categories such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Computer Vision (Image Segmentation, Image Classification, Image Generation, etc.)&lt;/li&gt;
&lt;li&gt;Natural Language Processing (Text Classification, Summarization, Generation, Translation, etc.)&lt;/li&gt;
&lt;li&gt;Audio (Speech Recognition, Text to Speech, etc.)&lt;/li&gt;
&lt;li&gt;and much &lt;a href="https://huggingface.co/models" rel="noopener noreferrer"&gt;more&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hugging Face's models are easy to use. For example, to do a text classification task, you can use the &lt;a href="https://huggingface.co/docs/transformers/index" rel="noopener noreferrer"&gt;transformers&lt;/a&gt; library (which is part of Hugging Face), to load a pre-trained model and then use it to classify a text.&lt;/p&gt;

&lt;p&gt;An example of text classification using &lt;code&gt;transformers&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;

&lt;span class="n"&gt;sentiment_analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment-analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;siebert/sentiment-roberta-large-english&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sentiment_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I like Transformers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# [{'label': 'POSITIVE', 'score': 0.9987214207649231}]
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sentiment_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I hate React&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# [{'label': 'NEGATIVE', 'score': 0.9993581175804138}]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see how easy it is to do a text classification task in just 3 lines of code using the transformers library.&lt;/p&gt;

&lt;p&gt;Similarly, Hugging Face hosts many other models, each with a corresponding model card describing its usage.&lt;/p&gt;

&lt;p&gt;One of the challenges of using Hugging Face models is that they can be computationally expensive to deploy, as most of them require GPUs. They are often large and complex, and need a lot of computing power to run, and GPUs can be very expensive. For example, the hourly rate of an NVIDIA A10G instance with 24 GB of memory on &lt;strong&gt;AWS&lt;/strong&gt; is $1.30, which works out to roughly $950 for a month of continuous uptime. This shows how you can easily &lt;em&gt;burn money on AWS&lt;/em&gt; 😅 on an ML model that hardly gets 1000-2000 requests in a month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp95iy92452vjm9rrfxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp95iy92452vjm9rrfxc.png" alt="An Image saying that there is a solution" width="476" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Enter Serverless GPUs
&lt;/h1&gt;

&lt;p&gt;Serverless GPUs are a type of cloud computing service that provides access to powerful GPUs on demand. This means that you only pay for the time that you use the GPUs, which can save you a significant amount of money if you only need to use them occasionally.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pay-as-You-Go Pricing&lt;/strong&gt;: Serverless architectures follow a pay-as-you-go pricing model, so you only pay for the actual resources consumed during model inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ease of use&lt;/strong&gt;: Serverless GPUs are very easy to use. You don't need to worry about managing or maintaining the hardware, which can save you a lot of time and hassle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Serverless GPUs can be scaled up or down as needed, which makes them ideal for applications that have fluctuating workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhgoogrtdeylezl67hsi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhgoogrtdeylezl67hsi.png" alt="A meme showing using servers vs serverless" width="705" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a number of serverless GPU providers out there, such as &lt;a href="https://www.banana.dev/" rel="noopener noreferrer"&gt;Banana&lt;/a&gt;, &lt;a href="https://replicate.com/" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;, &lt;a href="https://www.beam.cloud/" rel="noopener noreferrer"&gt;Beam&lt;/a&gt;, &lt;a href="https://modal.com/" rel="noopener noreferrer"&gt;Modal&lt;/a&gt; and many more.&lt;/p&gt;

&lt;p&gt;I would recommend checking out all of these websites before deploying your application. All of the ones I have mentioned offer a free tier.&lt;/p&gt;

&lt;p&gt;Returning to the earlier cost calculation: for 16 CPUs with 32 GB of RAM on an A10G instance, the price on Beam was just $0.00155419/second (almost all of the providers have near-identical pricing). Say your &lt;strong&gt;API&lt;/strong&gt; gets 1500 requests in a month, with an average inference time of 1 minute (60 seconds). The cost incurred is: 0.00155419 * 60 * 1500 ≈ $140. Nearly a 7x reduction😎 in cost compared to keeping a dedicated AWS instance running.&lt;/p&gt;
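As a quick sanity check on this arithmetic, here is a tiny back-of-envelope script using the per-second rate quoted above (your provider's pricing and workload will differ, so treat the inputs as assumptions):

```python
# Back-of-envelope serverless GPU cost estimate.
RATE_PER_SECOND = 0.00155419   # USD/second for the A10G config quoted above
SECONDS_PER_REQUEST = 60       # assumed average inference time
REQUESTS_PER_MONTH = 1500      # assumed monthly traffic

cost_per_request = RATE_PER_SECOND * SECONDS_PER_REQUEST
monthly_cost = cost_per_request * REQUESTS_PER_MONTH

print(f"per request: ${cost_per_request:.4f}")  # per request: $0.0933
print(f"per month:   ${monthly_cost:.2f}")      # per month:   $139.88
```

Plug in your own traffic numbers; past a certain sustained load, a dedicated instance becomes cheaper again.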

&lt;p&gt;For this tutorial, I am going to use &lt;a href="https://www.beam.cloud/" rel="noopener noreferrer"&gt;Beam&lt;/a&gt; to deploy &lt;a href="https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html" rel="noopener noreferrer"&gt;dolly-v2-7b&lt;/a&gt;, an open source large language model from &lt;a href="https://www.databricks.com/" rel="noopener noreferrer"&gt;Databricks&lt;/a&gt;, which responds similar to &lt;a href="https://openai.com/blog/chatgpt" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Of course, feel free to use any ML model with your choice of serverless GPU provider.&lt;/p&gt;

&lt;h1&gt;
  
  
  Deployment
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An account on &lt;a href="https://www.beam.cloud/login" rel="noopener noreferrer"&gt;Beam&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An API Key from &lt;a href="https://www.beam.cloud/dashboard/settings/api-keys" rel="noopener noreferrer"&gt;Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;a href="https://docs.beam.cloud/getting-started/quickstart" rel="noopener noreferrer"&gt;Beam CLI&lt;/a&gt; by running:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh &lt;span class="nt"&gt;-sSfL&lt;/span&gt; | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Configure Beam by entering
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;beam configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Install Beam SDK
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;beam-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you’re ready to start using Beam to deploy your ML models.&lt;/p&gt;

&lt;p&gt;As mentioned, I will be deploying &lt;a href="https://huggingface.co/databricks/dolly-v2-7b" rel="noopener noreferrer"&gt;dolly-v2-7b&lt;/a&gt; from Databricks. The code to run it is provided on Hugging Face, so copy it into a file named &lt;code&gt;run.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;instruct_pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InstructionTextGenerationPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/dolly-v2-7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding_side&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/dolly-v2-7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;generate_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InstructionTextGenerationPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is one major problem with this code: no cache directory is passed when instantiating &lt;code&gt;AutoModelForCausalLM&lt;/code&gt; and &lt;code&gt;AutoTokenizer&lt;/code&gt;, so the model files will be downloaded every time the function is invoked. Of course, this does not happen when you run the code on your own device, because the default cache directory lives on your hard disk. But in the serverless world everything is stateless, and model files downloaded once will not persist across consecutive runs.&lt;/p&gt;

&lt;p&gt;Digging through the Beam docs, we find &lt;a href="https://docs.beam.cloud/data/shared-volumes" rel="noopener noreferrer"&gt;Shared Volumes&lt;/a&gt;, which let model files persist between consecutive runs: the volume is mounted like a normal disk whenever the model runs.&lt;/p&gt;

&lt;p&gt;So for now, let's define &lt;code&gt;cache_path = "./mpt_weights"&lt;/code&gt; and pass it as the &lt;code&gt;cache_dir&lt;/code&gt;. The modified code is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;instruct_pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InstructionTextGenerationPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/dolly-v2-7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding_side&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_path&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/dolly-v2-7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;generate_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InstructionTextGenerationPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To define the configuration of the system where our ML model runs, we need to write a &lt;a href="https://docs.beam.cloud/getting-started/quickstart" rel="noopener noreferrer"&gt;Beam app definition&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I have mostly copied this code from Beam Docs. Creating a new file named &lt;code&gt;app.py&lt;/code&gt; and adding code as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;beam&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;beam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mpt-7b-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;32Gi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A10G&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;python_packages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accelerate&amp;gt;=0.16.0,&amp;lt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transformers[torch]&amp;gt;=4.28.1,&amp;lt;5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch&amp;gt;=1.13.1,&amp;lt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Trigger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;beam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()},&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;beam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()},&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run.py:generate_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;keep_warm_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mount&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SharedVolume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mpt_weights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./mpt_weights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code creates a Beam application that uses 16 CPUs, 32 GB of memory, and an A10G GPU, and lists the pip packages required during the model run. The application is triggered by a REST API call and generates text in response to a prompt; &lt;code&gt;keep_warm_seconds&lt;/code&gt; keeps the container alive for 60 seconds after a request, so back-to-back calls avoid a cold start.&lt;/p&gt;

&lt;p&gt;Now let's modify &lt;code&gt;run.py&lt;/code&gt; one last time before deploying. The &lt;code&gt;generate_text&lt;/code&gt; handler we define in &lt;code&gt;run.py&lt;/code&gt; needs to return the model's response so that it can be sent back through the REST API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;instruct_pipeline&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InstructionTextGenerationPipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="n"&gt;cache_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./mpt_weights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/dolly-v2-3b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding_side&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_path&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;databricks/dolly-v2-3b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;cache_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cache_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;generate_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InstructionTextGenerationPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generated_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please don't be intimidated by this code; I haven't modified much from the version on &lt;a href="https://huggingface.co/databricks/dolly-v2-7b#:~:text=model%20and%20tokenizer%3A-,import%20torch,-from%20instruct_pipeline%20import" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;. I would highly suggest you first refer to the respective &lt;a href="https://docs.beam.cloud/introduction" rel="noopener noreferrer"&gt;docs&lt;/a&gt; before deploying anything to the cloud that a random dude on the internet suggests.&lt;/p&gt;

&lt;p&gt;One more thing: we need &lt;code&gt;instruct_pipeline.py&lt;/code&gt; for dolly-v2-3b, as mentioned on Hugging Face, so copy the code from &lt;a href="https://huggingface.co/databricks/dolly-v2-3b/blob/main/instruct_pipeline.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So now we have 3 files, namely &lt;code&gt;app.py&lt;/code&gt;, &lt;code&gt;run.py&lt;/code&gt; and &lt;code&gt;instruct_pipeline.py&lt;/code&gt; (may not be the same in your case).&lt;/p&gt;

&lt;p&gt;Now deploy your application by entering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;beam deploy app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This deploys your ML model as a serverless REST API that you can call from your frontend. Obviously, you can deploy the model as an asynchronous &lt;a href="https://docs.beam.cloud/triggers/webhook" rel="noopener noreferrer"&gt;webhook&lt;/a&gt; instead of a REST API if your model inference takes a long time.&lt;/p&gt;

&lt;p&gt;You can get the &lt;code&gt;CURL&lt;/code&gt; command to call your app from the Beam app dashboard. Testing the app I deployed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4oqn877dxi3w1jzf7rf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4oqn877dxi3w1jzf7rf.png" alt="An Image showing ML inference on Serverless GPU" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I got a response in under 8 seconds, which is not bad for an open-source version of ChatGPT :)&lt;/p&gt;
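For reference, calling the deployed endpoint from Python might look like the sketch below. The URL and token are placeholders (copy the real values from the curl command on your Beam dashboard), and the auth scheme may differ, so treat this as the shape of the request rather than a copy-paste recipe:

```python
import json

# Hypothetical endpoint and token -- replace with the real values
# from the curl command shown on your Beam dashboard.
API_URL = "https://example.invalid/your-beam-app"
AUTH_TOKEN = "YOUR_API_TOKEN"

# The payload matches the `inputs` schema declared in app.py.
payload = {"prompt": "Explain serverless GPUs in one sentence"}
headers = {
    "Authorization": f"Basic {AUTH_TOKEN}",  # auth scheme is an assumption
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# Uncomment to actually call the API (requires `pip install requests`):
# import requests
# resp = requests.post(API_URL, headers=headers, data=body)
# print(resp.json()["response"])

print(body)  # {"prompt": "Explain serverless GPUs in one sentence"}
```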

&lt;h1&gt;
  
  
  Cons
&lt;/h1&gt;

&lt;p&gt;Everything seems good so far with serverless, but let's weigh the cons.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold starts&lt;/strong&gt;: When a serverless GPU is not in use, it is shut down. This means that the first time you use it, it will need to be booted up, which can take a few seconds to minutes depending on your model size. This can be a problem if you are running a real-time application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Serverless GPUs can be more expensive than traditional GPU-based solutions, especially if you are running a long-running application that almost stays active throughout the day.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In conclusion, we have seen how to deploy Hugging Face models on serverless GPUs. This can be a great way to get the performance benefits of GPUs without paying for the underlying hardware when it is not in use.&lt;/p&gt;

&lt;p&gt;If you still have any questions about this post or want to discuss something with me, feel free to connect on &lt;a href="https://www.linkedin.com/in/dhanushreddy29/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/dhanushreddy291" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you run an organization and want me to write for you, please connect with me on my Socials 🙃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>huggingface</category>
      <category>serverless</category>
      <category>gpu</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Effortless Documentation of your Python Code with Github Actions and GPT3</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Thu, 04 May 2023 13:23:24 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/effortless-documentation-of-your-python-code-with-github-actions-and-gpt3-a27</link>
      <guid>https://dev.to/dhanushreddy29/effortless-documentation-of-your-python-code-with-github-actions-and-gpt3-a27</guid>
      <description>&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;I built a &lt;strong&gt;Github Actions&lt;/strong&gt; workflow that &lt;strong&gt;automates&lt;/strong&gt; the process of adding &lt;strong&gt;docstrings to Python functions&lt;/strong&gt; using &lt;strong&gt;GPT-3&lt;/strong&gt;. The workflow loops over all &lt;code&gt;.py&lt;/code&gt; files and functions in each one of them, sends the function code to the GPT-3 API for analysis, and inserts the suggested docstring for the function if it does not already have one.&lt;/p&gt;

&lt;p&gt;This solution streamlines the process of documenting Python code and saves developers time and effort by automating the task of adding docstrings to functions. By using Github Actions and GPT-3, my solution helps developers to focus on other aspects of their work while maintaining high-quality code documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Category Submission:
&lt;/h3&gt;

&lt;p&gt;Maintainer Must-Haves&lt;/p&gt;

&lt;h3&gt;
  
  
  App Link
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/dhanushreddy291/docstring-generator" rel="noopener noreferrer"&gt;github.com/dhanushreddy291/docstring-generator&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiujaq0r56xvq9mw7emee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiujaq0r56xvq9mw7emee.png" alt="An Image showing the demo of Github Action" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycm7amwzu0q97x0eooym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycm7amwzu0q97x0eooym.png" alt="Another Image showing the demo of Github Action" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Description
&lt;/h3&gt;

&lt;p&gt;My project is a Github Actions workflow that automates the process of adding docstrings to Python functions using GPT-3. The solution is designed to streamline the process of documenting Python code by automatically generating docstrings for functions that do not have one.&lt;/p&gt;

&lt;p&gt;The workflow is triggered whenever changes are pushed to the repository. It loops over all &lt;code&gt;.py&lt;/code&gt; files to find functions without a docstring; for each such function, the code is sent to the GPT-3 API, which analyzes it and suggests a corresponding docstring. Functions that already have a docstring are skipped. Finally, the modified code is automatically committed back to the GitHub repo.&lt;/p&gt;
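&lt;p&gt;Finding the functions that still need a docstring is the core of that loop. As a minimal sketch using Python's standard-library &lt;code&gt;ast&lt;/code&gt; module (the project itself uses the RedBaron parser, so this is only an illustration):&lt;/p&gt;

```python
import ast

SOURCE = '''
def documented():
    """Already has a docstring."""
    return 1

def undocumented(x):
    return x * 2
'''

tree = ast.parse(SOURCE)

# Walk the tree and collect every function that has no docstring;
# these would be the candidates whose code gets sent to GPT-3.
missing = [
    node.name
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    and ast.get_docstring(node) is None
]

print(missing)  # ['undocumented']
```

&lt;p&gt;Functions whose &lt;code&gt;ast.get_docstring&lt;/code&gt; result is not &lt;code&gt;None&lt;/code&gt; are left untouched, which matches the "skip documented functions" behaviour described above.&lt;/p&gt;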

&lt;p&gt;By automating the task of adding docstrings to Python functions, my project saves developers time and effort, allowing them to focus on other aspects of their work. The solution also helps maintain high-quality code documentation, which is essential for the long-term maintainability and scalability of any software project.&lt;/p&gt;

&lt;p&gt;Of course, I am just scratching the surface of what is possible with GitHub Actions and GPT-3. One could use the GPT-4 32K-token model API to build full Markdown documentation by analyzing the code. At the moment, my workflow works only for Python, but it can easily be extended to other languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Link to Source Code
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/dhanushreddy291/docstring-generator" rel="noopener noreferrer"&gt;github.com/dhanushreddy291/docstring-generator&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Permissive License
&lt;/h3&gt;

&lt;p&gt;MIT&lt;/p&gt;

&lt;h2&gt;
  
  
  Background (What made you decide to build this particular app? What inspired you?)
&lt;/h2&gt;

&lt;p&gt;As a software developer, I have often found the process of writing code documentation to be time-consuming. I knew that there had to be a better way to document code that did not require manual effort and could produce consistent and reliable results.&lt;/p&gt;

&lt;p&gt;That's when I came across GPT-3, a powerful language model developed by OpenAI that has been trained on vast amounts of text and code and can generate human-like responses to natural language inputs. I realized it could be leveraged to automate the process of adding docstrings to Python functions. And why not? If &lt;strong&gt;GitHub Copilot&lt;/strong&gt; is being used to write &lt;a href="https://github.blog/2023-03-22-github-copilot-x-the-ai-powered-developer-experience/#:~:text=Many%20developers%20and%20companies%20have%20already%20used%20GitHub%20Copilot%2C%20and%20it%E2%80%99s%20helping%20improve%20productivity%20and%20happiness." rel="noopener noreferrer"&gt;46% of code&lt;/a&gt;, why not do the same for code documentation?&lt;/p&gt;

&lt;p&gt;That's why I decided to build a Github Actions workflow that uses GPT-3 to generate docstrings for Python functions automatically. My goal was to create a solution that would save developers time and effort, while also improving the quality of code documentation.&lt;/p&gt;

&lt;p&gt;Overall, I was inspired by the idea of using GPT-3 to automate a task that has traditionally been done manually. I believe this project has the potential to transform the way developers approach code documentation, making it easier, faster, and more reliable than ever before. Of course, the documentation will keep improving as better LLMs such as GPT-4 and beyond become available.&lt;/p&gt;

&lt;h3&gt;
  
  
  How I built it (How did you utilize GitHub Actions or GitHub Codespaces? Did you learn something new along the way? Pick up a new skill?)
&lt;/h3&gt;

&lt;p&gt;For this project, I utilized GitHub Actions to automate the process of adding docstrings to Python files in my repository. The workflow is triggered on a push event to the main branch of the repository.&lt;/p&gt;

&lt;p&gt;It has the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Check out repository&lt;/strong&gt; checks out the latest version of the repository.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up Python and install dependencies&lt;/strong&gt; sets up a Python 3.10 environment and installs the required dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run add_docstring script&lt;/strong&gt; executes the &lt;code&gt;add_docstring.py&lt;/code&gt; script with the path to the file that needs to be updated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check for changes&lt;/strong&gt; checks whether any changes have been made to the repository. If there are, it sets the output &lt;code&gt;has_changes&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit and push changes&lt;/strong&gt; commits and pushes the changes back to the main branch.&lt;/li&gt;
&lt;/ul&gt;
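&lt;p&gt;The "check for changes" step boils down to asking git whether the working tree is dirty. As a rough sketch of that check in Python (the workflow itself does this with a shell command; the helper name here is my own):&lt;/p&gt;

```python
import subprocess

def repo_has_changes(path: str = ".") -> bool:
    """Return True if the working tree at `path` has uncommitted changes."""
    # `git status --porcelain` prints one line per modified or untracked
    # file, and prints nothing when the tree is clean.
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=path,
        capture_output=True,
        text=True,
        check=True,
    )
    return bool(out.stdout.strip())
```

&lt;p&gt;Only when this check reports changes does the final step commit and push, so clean runs don't create empty commits.&lt;/p&gt;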

&lt;p&gt;The &lt;code&gt;add_docstring.py&lt;/code&gt; script uses the OpenAI GPT-3 language model to generate docstrings for the Python functions in the repository. The script is also responsible for formatting the Python files using the &lt;code&gt;black&lt;/code&gt; and &lt;code&gt;autoflake&lt;/code&gt; libraries.&lt;/p&gt;

&lt;p&gt;During this project, I learned how to use GitHub Actions and integrate them into my workflow. I also learned how to use the &lt;strong&gt;RedBaron&lt;/strong&gt; library to parse and manipulate Python code.&lt;/p&gt;

&lt;p&gt;I do admit I faced a few challenges. The main one was working with the OpenAI API: to avoid rate limiting, I had to add a delay of 20 seconds after every request (free-trial accounts have a hard cap of 3 requests/min). This was necessary to ensure the API would not block the requests, which would have caused the workflow to fail.&lt;/p&gt;
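&lt;p&gt;That throttling can be sketched as a tiny wrapper; the 20-second figure follows directly from the 3 requests/minute cap, and the helper name here is my own:&lt;/p&gt;

```python
import time

REQUESTS_PER_MINUTE = 3  # hard cap on free-trial OpenAI accounts

def throttled(fn, *args, delay=60 / REQUESTS_PER_MINUTE, **kwargs):
    """Call fn, then wait long enough to stay under the rate limit."""
    result = fn(*args, **kwargs)
    time.sleep(delay)  # ~20 s between consecutive API requests
    return result

# Example: wrap each (hypothetical) docstring-generation call:
# docstring = throttled(generate_docstring, function_source)
```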

&lt;p&gt;Another challenge was integrating the script with GitHub Actions. I had to learn how to create a workflow file and configure the necessary settings so that the script would run automatically whenever changes were pushed to the main branch. I also had to debug some permission issues related to committing back to the repo.&lt;/p&gt;

&lt;p&gt;Overall, the project was a great learning experience that helped me improve my skills in working with APIs and automating tasks using GitHub Actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional Resources/Info
&lt;/h3&gt;

&lt;p&gt;You can find the entire source code &lt;a href="https://github.com/dhanushreddy291/docstring-generator" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>githubhack23</category>
      <category>githubactions</category>
      <category>gpt3</category>
      <category>python</category>
    </item>
    <item>
      <title>Fine-Tune GPT-3 on custom datasets with just 10 lines of code using GPT-Index</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Sun, 12 Feb 2023 13:52:48 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/fine-tune-gpt-3-on-custom-dataset-with-just-10-lines-of-code-using-gpt-index-18mc</link>
      <guid>https://dev.to/dhanushreddy29/fine-tune-gpt-3-on-custom-dataset-with-just-10-lines-of-code-using-gpt-index-18mc</guid>
      <description>&lt;p&gt;The Generative Pre-trained Transformer 3 (GPT-3) model by OpenAI is a state-of-the-art language model that has been trained on a massive amount of text data. GPT3 is capable of generating human-like text, performing tasks like question-answering, summarization, and even writing creative fiction. Wouldn't it be cool if you feed GPT3 with your own data source and ask it questions.&lt;/p&gt;

&lt;p&gt;In this blog post, we'll do exactly that: fine-tune GPT-3 on custom datasets using GPT-Index, all with just 10 lines of code! GPT-Index does the heavy lifting by providing a high-level API for connecting external knowledge bases with LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You need to have Python installed on your system.&lt;/li&gt;
&lt;li&gt;An OpenAI API key. If you don't have one, create a new account on &lt;a href="https://openai.com/api/" rel="noopener noreferrer"&gt;openai.com/api&lt;/a&gt; and get $18 of free credits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;I am not going into the details of how all this is working, as this would make this blog post longer and go against the title. You can refer to &lt;a href="https://gpt-index.readthedocs.io/en/latest/index.html" rel="noopener noreferrer"&gt;gpt-index.readthedocs.io/en/latest&lt;/a&gt; if you need to learn more.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a folder and open it in your favorite code editor. Create a &lt;a href="https://python.land/virtual-environments/virtualenv" rel="noopener noreferrer"&gt;virtual environment&lt;/a&gt; for this project if needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For this tutorial, we need &lt;a href="https://pypi.org/project/gpt-index/" rel="noopener noreferrer"&gt;gpt-index&lt;/a&gt; and &lt;a href="https://python.langchain.com/en/latest/index.html" rel="noopener noreferrer"&gt;Langchain&lt;/a&gt; installed. Please install the exact versions mentioned here to avoid any breaking changes.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;gpt-index&lt;span class="o"&gt;==&lt;/span&gt;0.4.1 &lt;span class="nv"&gt;langchain&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.0.83
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your data sources are in the form of PDFs, also install &lt;code&gt;PyPDF2&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;PyPDF2&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;3.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a new file &lt;code&gt;main.py&lt;/code&gt; and add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;gpt_index&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GPTSimpleVectorIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleDirectoryReader&lt;/span&gt;
&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleDirectoryReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GPTSimpleVectorIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# save to disk
&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_to_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;index.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this code to run, your data sources, be they PDFs, text files, etc., need to be inside a directory named &lt;strong&gt;&lt;em&gt;data&lt;/em&gt;&lt;/strong&gt; in the same folder. Run the code after adding your data.&lt;/p&gt;

&lt;p&gt;Your project directory should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├─ data/
│  ├─ data1.pdf
├─ query.py
├─ main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Now create another file named &lt;code&gt;query.py&lt;/code&gt; and add the following code:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;gpt_index&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GPTSimpleVectorIndex&lt;/span&gt;

&lt;span class="c1"&gt;# load from disk
&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GPTSimpleVectorIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_from_disk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;index.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Any Query You have in your datasets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run this code, you will get a response from OpenAI for the query you sent.&lt;/p&gt;

&lt;p&gt;I tried using this paper from &lt;a href="https://arxiv.org/abs/2206.00225v1" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt; as a data source and asked the following query:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkd4haffw9oxe85bwmva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkd4haffw9oxe85bwmva.png" alt="An Example Query to GPT3 with the coreesponding Response" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With GPT-Index, it has become much easier to work with GPT-3 and fine-tune it with just a few lines of code. I hope this small post has shown you how to get started with GPT-3 on custom datasets using GPT-Index.&lt;/p&gt;

&lt;p&gt;Of course, you can set up a simple frontend to give it a chatbot look, like ChatGPT.&lt;/p&gt;

&lt;p&gt;If you still have any questions about this post or want to discuss something with me, feel free to connect on &lt;a href="https://www.linkedin.com/in/dhanushreddy29/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/dhanushreddy291" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you run an organization and want me to write for you, please connect with me on my Socials 🙃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>gpt3</category>
      <category>openai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>From Commit to Registry: A Guide to Automate Docker Image Builds with Github Actions</title>
      <dc:creator>Dhanush Reddy</dc:creator>
      <pubDate>Wed, 01 Feb 2023 11:30:27 +0000</pubDate>
      <link>https://dev.to/dhanushreddy29/from-commit-to-registry-a-guide-to-automate-docker-image-builds-with-github-actions-4c6i</link>
      <guid>https://dev.to/dhanushreddy29/from-commit-to-registry-a-guide-to-automate-docker-image-builds-with-github-actions-4c6i</guid>
      <description>&lt;p&gt;&lt;strong&gt;Github Actions&lt;/strong&gt; is a powerful tool that allows developers to automate their software development workflows. It allows triggering actions, such as building and deploying code, in response to events such as a code push or a pull request. You may save time and effort by automating your development process with Github Actions custom workflows.&lt;/p&gt;

&lt;p&gt;Github Actions is integrated directly into Github, making it easy to set up and use. It can be used to automate a wide range of tasks. One of the most popular use cases for Github Actions is automating the build and deployment of Docker images. By using Github Actions, you can automate the process of building and pushing Docker images to a registry, such as Docker Hub. This can save you a significant amount of time and effort, and ensure that your images are built and deployed consistently and correctly.&lt;/p&gt;

&lt;p&gt;In this blog post, we'll go through how to use Github Actions to automate the creation and distribution of Docker images. We will be covering everything from setting up the necessary prerequisites to writing the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why automate Docker image builds?
&lt;/h2&gt;

&lt;p&gt;That's a good question. Automating the process of building and deploying Docker images can bring several benefits to your development process. Some of the main reasons to automate Docker image builds include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: Automating the build process ensures that your images are built consistently and correctly every time. This can help to eliminate errors that may occur due to manual processes, such as typos or missed steps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speed&lt;/strong&gt;: Automating the build process can save you a significant amount of time, as building and deployment happen quickly and efficiently. Similarly, multi-architecture images can be built in the same workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Continuous Integration and Deployment&lt;/strong&gt;: By automating the build process, you can easily integrate the process with the other steps of your development process, such as testing and deploying your code. This can help to ensure that your images are always up-to-date and ready to be deployed to production.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before diving into automating your Docker image builds with Github Actions, there are a few prerequisites that you will need to have in place. These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/" rel="noopener noreferrer"&gt;A Github account&lt;/a&gt;: In order to use Github Actions, you will need to have a Github account and be able to access the repository where you want to set up the automation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://hub.docker.com/" rel="noopener noreferrer"&gt;A Docker Hub account&lt;/a&gt;: In order to push your images to Docker Hub, you will need to have an account and be logged in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Dockerfile: Your repository should contain a Dockerfile that describes the instructions to build your image.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting up Github Secrets
&lt;/h2&gt;

&lt;p&gt;In order for your Github Actions workflow to push images to your Docker Hub account, you will need to set up secrets for your Docker Hub credentials in your Github repository. This will allow the workflow to securely access your account and perform the necessary actions.&lt;/p&gt;

&lt;p&gt;To set up secrets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to the main page of your Github repository.&lt;/li&gt;
&lt;li&gt; Click on the "Settings" tab.&lt;/li&gt;
&lt;li&gt; Under the "Secrets and Variables" section, click on "Actions" and then on "New Repository Secret"&lt;/li&gt;
&lt;li&gt; Enter a name for the secret, such as &lt;code&gt;DOCKERHUB_USERNAME&lt;/code&gt;, and enter your Docker Hub username as the value.&lt;/li&gt;
&lt;li&gt; Click "Add secret".&lt;/li&gt;
&lt;li&gt;You can generate an access token for your Docker Hub account on &lt;a href="https://hub.docker.com/settings/security" rel="noopener noreferrer"&gt;hub.docker.com/settings/security&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Repeat this process to add another secret for the Docker Hub token. In the code that follows this section, I have named it &lt;code&gt;DOCKERHUB_TOKEN&lt;/code&gt;, so make sure the name matches what you define in your Github settings.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you've set up your secrets, you can reference them in your Github Actions workflow file using the syntax &lt;code&gt;${{ secrets.SECRET_NAME }}&lt;/code&gt;. This will allow the workflow to access your Docker Hub credentials without exposing them in plaintext in the file.&lt;/p&gt;

&lt;p&gt;The secrets which we just created are encrypted and can only be accessed by GitHub Actions running on the same repository. By setting up secrets for your Docker Hub credentials, you can ensure that your automation process is secure and that your credentials are protected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;At the root of your Github repo, create a folder named &lt;code&gt;.github&lt;/code&gt;, and inside it create another folder named &lt;code&gt;workflows&lt;/code&gt;. Then create a file named &lt;code&gt;build-and-push.yaml&lt;/code&gt; (or any name you like) for the Github Action.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build-and-push&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;  &lt;span class="c1"&gt;# This step checkouts the code of the repository&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Login to Docker Hub&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/login-action@v2&lt;/span&gt; &lt;span class="c1"&gt;# This step logs in to the Docker Hub using the secrets&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKERHUB_USERNAME }}&lt;/span&gt;
          &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKERHUB_TOKEN }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Docker Buildx&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v2&lt;/span&gt; &lt;span class="c1"&gt;# This step sets up Buildx, a tool that allows building multi-arch images&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v3&lt;/span&gt; &lt;span class="c1"&gt;# This step performs the actual build and push to Docker Hub&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
          &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./Dockerfile&lt;/span&gt;
          &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKERHUB_USERNAME }}/go-server:latest&lt;/span&gt;
          &lt;span class="na"&gt;platforms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64,linux/arm64,linux/arm/v7&lt;/span&gt;
          &lt;span class="c1"&gt;# Cache the image layers&lt;/span&gt;
          &lt;span class="na"&gt;cache-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha&lt;/span&gt;
          &lt;span class="na"&gt;cache-to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,mode=max&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code starts by defining the name of the workflow and the &lt;code&gt;on&lt;/code&gt; event, which is a push to the &lt;code&gt;main&lt;/code&gt; branch.&lt;/p&gt;

&lt;p&gt;Then there is a &lt;code&gt;jobs&lt;/code&gt; block, where we define a job named &lt;code&gt;build&lt;/code&gt;. This job runs on an &lt;code&gt;ubuntu-latest&lt;/code&gt; machine.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;steps&lt;/code&gt; block defines the steps the job will perform. In this case, they are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check out the code from the repository.&lt;/li&gt;
&lt;li&gt;Log in to Docker Hub using the secrets that we set up earlier.&lt;/li&gt;
&lt;li&gt;Set up Docker Buildx, a tool that allows building multi-architecture images.&lt;/li&gt;
&lt;li&gt;Build and push the image to Docker Hub, using the information from the Dockerfile.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make sure to change the &lt;code&gt;tags&lt;/code&gt; in the above code to the Docker Hub repo that you want to create. In my case, I have kept it as &lt;code&gt;go-server&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This code provides a simple and effective way to automate the process of building and pushing Docker images to a registry like Docker Hub. With this code, you can ensure that your images are always up-to-date and ready to be deployed.&lt;/p&gt;

&lt;p&gt;The Dockerfile I used for this Github Action was a simple Golang HTTP server. You can check out the entire code &lt;a href="https://github.com/dhanushreddy291/docker-build-github-actions" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, Github Actions is a powerful tool that allows you to automate many processes in your development workflow, including building and pushing Docker images. By automating the process of building and pushing Docker images, you can save a lot of time and effort, and ensure that your images are always up to date.&lt;/p&gt;

&lt;p&gt;If you still have any questions about this post or want to discuss something with me, feel free to connect on &lt;a href="https://www.linkedin.com/in/dhanushreddy29/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/dhanushreddy291" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you run an organization and want me to write for you, please connect with me on my Socials 🙃&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>devops</category>
      <category>beginners</category>
      <category>tutorial</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
