<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jun Yamog</title>
    <description>The latest articles on DEV Community by Jun Yamog (@jkyamog).</description>
    <link>https://dev.to/jkyamog</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1246160%2F20b3dcd3-610f-4bbd-be99-69c09447fab3.jpeg</url>
      <title>DEV Community: Jun Yamog</title>
      <link>https://dev.to/jkyamog</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jkyamog"/>
    <language>en</language>
    <item>
      <title>Trying to Impress My Better Half with ML</title>
      <dc:creator>Jun Yamog</dc:creator>
      <pubDate>Mon, 26 Aug 2024 07:05:28 +0000</pubDate>
      <link>https://dev.to/jkyamog/trying-to-impress-my-better-half-with-ml-7h6</link>
      <guid>https://dev.to/jkyamog/trying-to-impress-my-better-half-with-ml-7h6</guid>
      <description>&lt;p&gt;Learning new concepts like Machine Learning (ML) can often feel superficial. However, when you encounter these concepts in real life, it can be fun and memorable. I want to share my story of how a simple request from my better half, Tina, led me to apply these ML concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine Tuning&lt;/li&gt;
&lt;li&gt;Data Preparation&lt;/li&gt;
&lt;li&gt;Hyper Parameters&lt;/li&gt;
&lt;li&gt;Over Fitting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NOTE: Jump to "Fun Images" at the bottom if you want to skip the text.&lt;/p&gt;

&lt;h1&gt;
  
  
  Back story
&lt;/h1&gt;

&lt;p&gt;About 6 months ago, I got curious about ML again. The first thing I tried was Stable Diffusion (SD1.5 and SDXL) in ComfyUI. I believe that to better understand ML, I have to be a user of ML-enabled apps. Another way for me to understand it is to explain ML to Tina, especially why it can be useful. While I was generating images, Tina casually remarked, "Well, let's see if you can make images like me". I tried a few things: face swap and control nets. The results were OK, but she was very unimpressed. So I left it and moved on to other boring things like RAG, text classification, and watching more AI YouTube content, until a few weeks ago.&lt;/p&gt;

&lt;h1&gt;
  
  
  Fine Tuning
&lt;/h1&gt;

&lt;p&gt;When I started to hear about Flux.1, I tried it and was very impressed. With Flux.1 I was able to prompt engineer images without using complicated ComfyUI workflows. I tried to mimic my photos using text-to-image and image-to-image. After watching Matt Wolfe's fine-tuning video on Flux, and seeing Levelsio's date-night Flux posts on X/Twitter, I tried fine tuning it myself. The ability to fine tune meant I could use Tina's photos to train the AI to generate images in her likeness. I gathered 23 photos of Tina and started to fine tune using Low-Rank Adaptation (LoRA) with a Flux-dev model at replicate.com. Most of the images were just fine, but a few were very impressive. Tina was starting to get impressed, progress! She posted one on Facebook; some of her close friends, and even my mom, didn't realize it wasn't her until they read my comments.&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Preparation
&lt;/h1&gt;

&lt;p&gt;There were issues, specifically when I generated full-body images. Tina, usually nearby, would see some of the images and comment on the ones that didn't resemble her. Some comments were hilariously blunt and not politically correct, to the point that I wouldn't share them. I realized I needed better Data Preparation to generate images of Tina in varying compositions true to her likeness. I increased my initial 23 photos to 57. Not only did I increase the quantity, I also got a better variety of poses and sizes. So I gave it another try and started training a version 2. Version 2 generated better images of her, including the likeness of her body, expressions and poses. Tina is a frustrated fashion model; she had aspired to be one but never went for it. So one of the things she wanted to see was what she would look like as a fashion model. Not all the images were great, but some were very impressive, and plausibly her. The model also picked up some inspiration from the clothes she wore and her poses, not just her facial features.&lt;/p&gt;

&lt;h1&gt;
  
  
  Hyper Parameters
&lt;/h1&gt;

&lt;p&gt;Once Tina and I found an image with a composition we liked, I experimented with the hyper parameters. To get the image closer to what Tina looks like, I took note of the random seed of a particular image and reused that seed for consistent image generation. I played around with inference steps, guidance scale and LoRA scale to tweak the image to our liking. I found that increasing inference steps usually brought the face closer to Tina's features. Increasing the guidance scale keeps the image closer to the prompt, while the LoRA scale keeps it closer to Tina's general look. I don't think you should just max out the parameters, as that usually creates worse images. I usually went back to the default, or slightly higher, settings and unset the random seed when generating the next set of images. I would also sometimes lower them, especially the guidance scale, if I could accept some flexibility.&lt;/p&gt;

&lt;p&gt;NOTE: Hyper parameters are more complicated than what I currently understand them to be. Please comment below with your experience or a deeper explanation of hyper parameters.&lt;/p&gt;
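&lt;p&gt;As a rough sketch of how I swept one knob at a time: build one input per guidance-scale value with the seed pinned. The parameter names, values, and model slug below are assumptions for illustration, not the exact replicate.com API.&lt;/p&gt;

```python
# Build one input dict per guidance-scale value, pinning the seed so only that
# one hyper parameter changes between generations. A hedged sketch: the input
# keys and the model slug below are assumptions, not a verified API.
def sweep_guidance(prompt, seed, scales):
    return [
        {
            "prompt": prompt,
            "seed": seed,                 # fixed seed keeps the composition consistent
            "guidance_scale": g,          # vary one knob at a time
            "num_inference_steps": 28,
            "lora_scale": 1.0,
        }
        for g in scales
    ]

inputs = sweep_guidance("TINA as a fashion model on a runway", seed=1234, scales=[2.5, 3.0, 3.5])
# Each dict would then be sent to the hosted model, e.g. (hypothetical slug):
# import replicate
# replicate.run("your-username/your-flux-lora", input=inputs[0])
```

&lt;p&gt;Keeping the seed fixed while changing a single parameter makes it easy to attribute any difference in the image to that parameter.&lt;/p&gt;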

&lt;h1&gt;
  
  
  Over Fitting
&lt;/h1&gt;

&lt;p&gt;One of the things in ML that can be easily overlooked is over fitting. Fine tuning the model also had the effect of skewing the images towards everyone looking like Tina. In practical terms, the model generates a world filled with Tinas, so everyone will look like her. Over fitting can be fun for image generation, but for models with bigger implications or impact, it can be a serious issue.&lt;/p&gt;

&lt;p&gt;NOTE: Does anybody have a solution for this overfitting issue? I’d like to train using LoRA with more than one subject/style.&lt;/p&gt;

&lt;h1&gt;
  
  
  Closing Remarks
&lt;/h1&gt;

&lt;p&gt;Tina is slightly impressed, but I have more work to do. What she really wants is to clone herself. I need Artificial General Intelligence (AGI) to be released. When is it coming? I need it now, so I can fine-tune Tina on it. Although I think the safety team will likely not allow this fine-tuning to happen, for the sake of the world! :)&lt;/p&gt;

&lt;h1&gt;
  
  
  Fun Images
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Control Nets in ComfyUI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudy287fo5yqhvzovrpqt.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudy287fo5yqhvzovrpqt.JPG" alt="Side by side comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I searched for the simplest images I had of Tina for the control net to have the best success. Here is Tina standing on the Te Paki sand dunes, wearing a plain dress with not much detail in the background. Meh results. &lt;/p&gt;

&lt;h2&gt;
  
  
  Fine Tuning using initial dataset
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd199xpxhod9jrpp219fs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd199xpxhod9jrpp219fs.png" alt="Collage v1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Very impressive half-body images, as I had several in my initial training photos. I also made them look as if I took them with my camera at a low aperture in a low-light scenario. The low light also helped set the mood and hide some telltale signs of AI image generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fine Tuning with better prepared data and tweaking hyper parameters
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqbntwx7zkw2kye57u5t.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqbntwx7zkw2kye57u5t.JPG" alt="Collage v2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Super impressive; it even gets Tina's body shape and poses. With some tweaking of the parameters, we made some images closer to her likeness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overfitting
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme9nt4jentr8uteutfzo.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme9nt4jentr8uteutfzo.JPG" alt="Everyone is Tina"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each version of Tina (each member of the band) has one of her poses. The peace/V fingers. The hand on the hips. The side pose. Every suspect in this fine-tuned world looks like Tina. Can the real Tina please step forward?&lt;/p&gt;

&lt;h2&gt;
  
  
  Dataset for training
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F534bhfz92hformihefj2.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F534bhfz92hformihefj2.jpeg" alt="Images Trained"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The images that were used for training. This is where the model got Tina's facial features, poses and expressions.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>learning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Fine-tuning LLAMA 3 for Text Classification with Limited Resources</title>
      <dc:creator>Jun Yamog</dc:creator>
      <pubDate>Sat, 20 Jul 2024 07:36:05 +0000</pubDate>
      <link>https://dev.to/jkyamog/fine-tuning-llama-3-for-text-classification-with-limited-resources-4i06</link>
      <guid>https://dev.to/jkyamog/fine-tuning-llama-3-for-text-classification-with-limited-resources-4i06</guid>
      <description>&lt;p&gt;I recently needed to classify sentences for a particular use case at work. Remembering Jeremy Howard's &lt;a href="https://youtu.be/toUgBQv1BT8?si=xZR_y7nE1rBJzsqp" rel="noopener noreferrer"&gt;Lesson 4: Getting started with NLP for absolute beginners&lt;/a&gt;, I first adapted his notebook to fine-tune DEBERTA.&lt;/p&gt;

&lt;p&gt;It worked, but not to my satisfaction, so I was curious what would happen if I used an LLM like LLAMA 3. The problem? Limited GPU resources. I only had access to a Tesla/Nvidia T4 instance.&lt;/p&gt;

&lt;p&gt;Research led me to QLoRA. This tutorial on &lt;a href="https://youtu.be/YJNbgusTSF0?si=jR0aB8QaSv7Wn3my" rel="noopener noreferrer"&gt;Fine tuning LLama 3 LLM for Text Classification of Stock Sentiment using QLoRA&lt;/a&gt; was particularly useful. To better understand the tutorial, I adapted Lesson 4 into the QLoRA tutorial notebook.&lt;/p&gt;

&lt;p&gt;QLoRA uses two main techniques:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Quantization: Reduces model precision, making it smaller.&lt;/li&gt;
&lt;li&gt;LoRA (Low-Rank Adaptation): Adds small, trainable layers instead of fine-tuning the whole model.&lt;/li&gt;
&lt;/ol&gt;
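&lt;p&gt;To build intuition for the quantization half, here is a toy absmax quantizer that rounds weights to a handful of integer levels plus one scale factor. This is a simplification for illustration; QLoRA actually uses the non-uniform NF4 scheme.&lt;/p&gt;

```python
# Toy absmax quantization: map floats onto 4-bit signed integer levels (-7..7)
# and back. The core idea is the same as in real schemes: store cheap small
# integers plus a single scale factor per block, trading a little precision
# for a big reduction in memory.
def quantize(xs, levels=7):
    scale = max(abs(x) for x in xs) / levels     # one scale per block of weights
    q = [round(x / scale) for x in xs]           # small integers, cheap to store
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.7, -0.01]
q, scale = quantize(weights)
restored = dequantize(q, scale)
errors = [abs(a - b) for a, b in zip(weights, restored)]
```

&lt;p&gt;The restored weights differ from the originals by at most half a quantization step, which is the precision given up in exchange for the smaller model.&lt;/p&gt;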

&lt;p&gt;This allowed me to train LLAMA 3 8B on a 16GB VRAM T4, using about 12GB of VRAM. The results were surprisingly good, with prediction accuracy over 90%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Confusion Matrix:
[[83  4]
[ 4  9]]
Classification Report:
              precision    recall  f1-score   support
         0.0       0.95      0.95      0.95        87
         1.0       0.69      0.69      0.69        13
    accuracy                           0.92       100
   macro avg       0.82      0.82      0.82       100
weighted avg       0.92      0.92      0.92       100
Balanced Accuracy Score: 0.8231653404067196
Accuracy Score: 0.92
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
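&lt;p&gt;The summary metrics can be recomputed directly from the confusion matrix above as a quick sanity check:&lt;/p&gt;

```python
# Recompute the report's summary metrics from the confusion matrix
# [[83, 4], [4, 9]]: rows are true classes, columns are predictions.
tn, fp, fn, tp = 83, 4, 4, 9

accuracy = (tn + tp) / (tn + fp + fn + tp)       # 92 correct out of 100
recall_0 = tn / (tn + fp)                        # recall for class 0.0
recall_1 = tp / (fn + tp)                        # recall for class 1.0
balanced_accuracy = (recall_0 + recall_1) / 2    # mean of per-class recalls
```

&lt;p&gt;Balanced accuracy averages the per-class recalls, which is why it is noticeably lower than plain accuracy here: the minority class (13 samples) is predicted much less reliably than the majority class.&lt;/p&gt;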



&lt;p&gt;Here's the &lt;a href="https://github.com/jkyamog/ml-experiments/blob/main/fine-tuning-qlora/LLAMA_3_Fine_Tuning_for_Sequence_Classification.ipynb" rel="noopener noreferrer"&gt;iPython notebook&lt;/a&gt; detailing the process.&lt;/p&gt;

&lt;p&gt;This approach shows it's possible to work with large language models on limited hardware. Working with constraints often leads to creative problem-solving and learning opportunities. In this case, the limitations pushed me to explore and implement more efficient fine-tuning techniques.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Using LangChain Expression Language (LCEL) for prompts and retrieval</title>
      <dc:creator>Jun Yamog</dc:creator>
      <pubDate>Sun, 03 Mar 2024 03:16:32 +0000</pubDate>
      <link>https://dev.to/jkyamog/using-langchain-expression-language-lcel-for-prompts-and-retrieval-47fn</link>
      <guid>https://dev.to/jkyamog/using-langchain-expression-language-lcel-for-prompts-and-retrieval-47fn</guid>
      <description>&lt;p&gt;In my previous post &lt;a href="https://dev.to/jkyamog/use-case-for-rag-and-llm-4pih"&gt;Use case for RAG and LLM&lt;/a&gt; my sample code only used basic string manipulation of the prompt.  On this post I will show how to use &lt;a href="https://python.langchain.com/docs/expression_language/get_started"&gt;LangChain Expression Language (LCEL)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of string manipulation, LCEL offers a more effective alternative. Here is the step-by-step conversion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of using Python string interpolation:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need help on &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the same string, without interpolation, with a chat prompt template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need help on {context}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;We can directly use the vector store as a retriever within a sub-chain, simplifying the search and integration process.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;similarity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context_subchain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;itemgetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Finally, combine the prompt, retriever and output processing in a chain. RunnablePassthrough is used for user_query, which is supplied when the chain is invoked. itemgetter is used for llm_personality, which is substituted from a dictionary passed at the chain's invocation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context_subchain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_personality&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;itemgetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_personality&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; 
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the sample code written using LCEL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;template_system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;

Use the following information to answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s query:

{context}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;template_user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s query:

{user_query}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="n"&gt;SystemMessagePromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template_system&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;HumanMessagePromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template_user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;similarity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context_subchain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;itemgetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context_subchain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_personality&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;itemgetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_personality&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; 
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;prompt_placeholders&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see a more complete &lt;a href="https://github.com/jkyamog/ml-experiments/commit/3d5494e5364b3c58d4cacdcd214fe797df93b4ad?diff=split&amp;amp;w=1"&gt;commit diff&lt;/a&gt; from the old string manipulation to LCEL.&lt;/p&gt;
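&lt;p&gt;If the pipe operator feels magical, here is a toy stand-in that shows conceptually what the dict of steps and the pipe composition are doing. This is plain Python, not LangChain; the Step class and the fake retriever are invented purely for illustration:&lt;/p&gt;

```python
from operator import itemgetter

class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and overloads | for composition."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

def fan_out(mapping):
    """A dict of steps runs each value on the same input, like the {'context': ...} mapping in LCEL."""
    return Step(lambda x: {k: v.invoke(x) for k, v in mapping.items()})

retriever = Step(lambda q: "[docs about " + q + "]")   # pretend vector-store retriever
prompt = Step(lambda d: "Use " + d["context"] + " to answer: " + d["user_query"])

chain = fan_out({
    "context": Step(itemgetter("user_query")) | retriever,
    "user_query": Step(itemgetter("user_query")),
}) | prompt

result = chain.invoke({"user_query": "LCEL"})
```

&lt;p&gt;Each pipe passes the output of one step into the next, and the dict fans the same input out into named slots, which is exactly the shape of the chain above.&lt;/p&gt;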

</description>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>Use case for RAG and LLM</title>
      <dc:creator>Jun Yamog</dc:creator>
      <pubDate>Thu, 01 Feb 2024 08:16:21 +0000</pubDate>
      <link>https://dev.to/jkyamog/use-case-for-rag-and-llm-4pih</link>
      <guid>https://dev.to/jkyamog/use-case-for-rag-and-llm-4pih</guid>
      <description>&lt;h1&gt;
  
  
  The Challenge
&lt;/h1&gt;

&lt;p&gt;Tackling an interesting problem: given a user query, search through a PDF document and provide feedback on how well the query aligns with the document's content.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Solution
&lt;/h1&gt;

&lt;p&gt;The approach is a three-step process: Load &amp;amp; Index, Search &amp;amp; RAG, and Feedback Generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Load &amp;amp; Index
&lt;/h2&gt;

&lt;p&gt;First, I need to understand the PDF document. I do this by creating a "semantic index". It's like creating a map of the document, but instead of landmarks, we have vectors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Search &amp;amp; RAG
&lt;/h2&gt;

&lt;p&gt;Next, I take your query and find the most related parts of the PDF document. This is where RAG (Retrieval-Augmented Generation) comes in. It's like giving the system a cheat sheet before the big test.&lt;/p&gt;
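&lt;p&gt;As a toy illustration of the retrieval step, here is a bag-of-words stand-in for the semantic index; a real pipeline uses learned dense embeddings instead of word counts, but the shape is the same: embed each chunk once, then rank chunks by similarity to the embedded query.&lt;/p&gt;

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use learned dense vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Index" the document: embed each chunk once, up front.
chunks = ["the cat sat on the mat", "llamas can be fine tuned", "rag retrieves relevant chunks"]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    """Return the k chunks most similar to the query, the cheat sheet handed to the LLM."""
    q = embed(query)
    return [c for c, v in sorted(index, key=lambda cv: cosine(q, cv[1]), reverse=True)[:k]]

retrieve("which chunks does rag retrieve")
```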

&lt;h2&gt;
  
  
  Feedback Generation
&lt;/h2&gt;

&lt;p&gt;Finally, I generate feedback for you. This isn't just a simple "yes" or "no". I provide detailed feedback with references from the PDF document. It's like having footnotes for your query.&lt;/p&gt;

&lt;h1&gt;
  
  
  Dive Deeper
&lt;/h1&gt;

&lt;p&gt;The code is open for you to explore. Feel free to fork it, and see how it fits your use case. I'll break down each section and highlight key points.&lt;/p&gt;

&lt;p&gt;If you have any suggestions or improvements, don't hesitate to share. For more technical details, continue reading along this &lt;a href="https://github.com/jkyamog/ml-experiments/blob/main/document-feedback/document-feedback.ipynb"&gt;Jupyter notebook&lt;/a&gt;.  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Reviving an old miner for machine learning</title>
      <dc:creator>Jun Yamog</dc:creator>
      <pubDate>Wed, 03 Jan 2024 04:44:03 +0000</pubDate>
      <link>https://dev.to/jkyamog/reviving-an-old-miner-for-machine-learning-1326</link>
      <guid>https://dev.to/jkyamog/reviving-an-old-miner-for-machine-learning-1326</guid>
      <description>&lt;h2&gt;
  
  
  Short/TLDR version
&lt;/h2&gt;

&lt;p&gt;Last year, I got an old miner (a PC with multiple GPUs) for free, and I have been learning some Machine Learning recently. I decided to revive the old miner and use it for ML. It was fun and rewarding, and it reminded me of my interest in PCs. However, I think renting a cloud computer with a GPU is a better option, unless you have a free PC and want to experiment. Some cloud services, like Google's &lt;a href="https://colab.research.google.com/"&gt;Colab&lt;/a&gt;, are even free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I have been learning Machine Learning (ML) from Jeremy Howard's &lt;a href="https://course.fast.ai/"&gt;FastAI course&lt;/a&gt;. The course recommends using cloud computers for learning ML, as they are more convenient and powerful. However, I wanted to run my Jupyter notebooks locally, using an old miner that I got for free. My workmate left me this miner (equipped with 4 GPUs); it was too big to bring along when he moved to another city.&lt;/p&gt;

&lt;h2&gt;
  
  
  1st challenge: No hard drive or OS.
&lt;/h2&gt;

&lt;p&gt;The old miner did not have a hard drive or an operating system (OS). My old Linux HDD that I had as a backup failed to boot, so I bought a cheap new 2TB HDD to start with. I also wanted to learn how to boot over the network using PXE boot.&lt;/p&gt;

&lt;p&gt;I chose iVentoy as my PXE boot server because it seemed simple and easy to use. I had an old Thinkpad with Ubuntu installed, which I used as the host machine. I connected the Thinkpad and the miner with a crossover cable. The general steps I followed were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download iVentoy&lt;/li&gt;
&lt;li&gt;Download ISO and copy to ISO directory&lt;/li&gt;
&lt;li&gt;Start iVentoy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guide helped me understand the basic steps:&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://www.youtube.com/watch?si=eeZ7VVCK5ZpFgcqk&amp;amp;v=jZYbFcQQ3x8&amp;amp;feature=youtu.be" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--vj7aXn2m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://i.ytimg.com/vi/jZYbFcQQ3x8/maxresdefault.jpg" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://www.youtube.com/watch?si=eeZ7VVCK5ZpFgcqk&amp;amp;v=jZYbFcQQ3x8&amp;amp;feature=youtu.be" rel="noopener noreferrer" class="c-link"&gt;
          Create a PXE Boot Server for your computer network! - YouTube
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          I show you how to use iVentoy to setup your own PXE Network Boot server rather than having to use a USB Stick to boot your PC and install your operating syst...
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--DOQSfsUf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.youtube.com/s/desktop/bcd251ee/img/favicon.ico" width="16" height="16"&gt;
        youtube.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;I encountered some minor issues, such as the miner having only 4GB RAM (I later upgraded it to 16GB). The PXE boot process would copy the ISO image to the RAM, which meant that larger ISOs like Ubuntu 22.04 would not fit. I decided to try Proxmox, which had a fairly small ISO.&lt;/p&gt;

&lt;h2&gt;
  
  
  2nd challenge: Accessing the GPUs in virtualization.
&lt;/h2&gt;

&lt;p&gt;I chose Proxmox as my OS, because it had a small ISO and it worked with PXE boot. I also wanted to have the flexibility to experiment with different virtual machines on the miner. However, this option also posed a new challenge: how to access the GPUs from the guest OS.&lt;/p&gt;

&lt;p&gt;At first, Proxmox would not install, because it said that the hardware did not support virtualization. I thought the miner was too old for that. But after checking the Proxmox requirements and the Intel ARK specs, I realized that it should work. I just had to enable some settings in the BIOS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intel VT - required for hypervisors like Proxmox&lt;/li&gt;
&lt;li&gt;VT-d - required for interrupt remapping and PCI passthrough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After Proxmox was installed, I had to make the guest OS use the four GPUs exclusively. This required a PCI passthrough, which allows the guest OS to directly access the hardware devices. The following guides were very helpful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pve.proxmox.com/wiki/PCI_Passthrough"&gt;PCI Passthrough - Proxmox VE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_pci_passthrough"&gt;Proxmox VE Administration Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
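&lt;p&gt;For reference, the host-side setup those guides walk through boils down to a few steps. This is only a sketch; the PCI address and VM ID below are hypothetical examples, so check your own with lspci:&lt;/p&gt;

```shell
# 1. Enable the IOMMU on the kernel command line (GRUB-booted Intel host):
#    in /etc/default/grub set
#    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
update-grub

# 2. Load the vfio modules at boot
printf 'vfio\nvfio_iommu_type1\nvfio_pci\n' >> /etc/modules

# 3. Find the GPU's PCI address
lspci -nn | grep -i nvidia

# 4. Pass it through to a VM (ID 100 and address 01:00.0 are examples)
qm set 100 -hostpci0 01:00.0,pcie=1
```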

&lt;p&gt;This video also helped a lot:&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://www.youtube.com/watch?si=vl2oWRa2C562iYE5&amp;amp;v=i_2ZGcm4E_4&amp;amp;feature=youtu.be" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--1tphGG-Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://i.ytimg.com/vi/i_2ZGcm4E_4/maxresdefault.jpg" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://www.youtube.com/watch?si=vl2oWRa2C562iYE5&amp;amp;v=i_2ZGcm4E_4&amp;amp;feature=youtu.be" rel="noopener noreferrer" class="c-link"&gt;
          Double GPU Passthrough in Proxmox 8!? Play Baldur's Gate 3 and Minecraft On The Same Machine? - YouTube
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Get the most out of your hardware, why not create a gaming VM, or use a GPU to accelerate tasks such as video rendering and Plex transcoding!?In this video I...
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--DOQSfsUf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.youtube.com/s/desktop/bcd251ee/img/favicon.ico" width="16" height="16"&gt;
        youtube.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;To verify that the passthrough worked, I used nvidia-smi and PyTorch:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E4N8zgAg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vuv4k2quc4eksfq8esd1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E4N8zgAg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vuv4k2quc4eksfq8esd1.png" alt="Nvidia SMI" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gfkEYnFq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dy8xduqaz43n9kz3kjsp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gfkEYnFq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dy8xduqaz43n9kz3kjsp.png" alt="Pytorch CUDA" width="212" height="47"&gt;&lt;/a&gt;&lt;/p&gt;
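&lt;p&gt;The same check can be scripted. A minimal version of what the screenshots show, assuming the NVIDIA driver and PyTorch are already installed in the guest:&lt;/p&gt;

```shell
# List the GPUs the guest can see through the NVIDIA driver
nvidia-smi -L

# Ask PyTorch whether CUDA is usable and how many devices it sees
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```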

&lt;h2&gt;
  
  
  3rd problem: Finding a place for the big machine.
&lt;/h2&gt;

&lt;p&gt;I was so absorbed in getting the miner to work that I hadn't thought about where to put it. It is a big, noisy machine that needs a lot of power and cooling. I learned how to make all four GPUs run together, and I forked this &lt;a href="https://github.com/jkyamog/mac-ml-speed-test/commit/63963b13ae2b9f7cb916673495478f1a8abf9fc9"&gt;benchmark&lt;/a&gt; to test them with PyTorch DDP. Training a model across all four GPUs was very satisfying. Comment below if you want me to write about this in detail.&lt;/p&gt;
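&lt;p&gt;For the curious, a DDP benchmark like this is typically launched with one process per GPU via torchrun; the script name here is a placeholder for the forked benchmark's entry point:&lt;/p&gt;

```shell
# Spawn 4 worker processes on a single machine, one per GPU
torchrun --standalone --nproc_per_node=4 benchmark.py
```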

&lt;p&gt;My wife noticed how happy the project made me. She said, "You always enjoy fixing and solving things." She was very supportive and suggested that we find a place for the miner. We decided to get a glass coffee table and put the miner underneath it for now. It looked nice; as my daughter put it, "It looks legit, dad!"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6ejdz929--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zhgiv0mg11yy10te4eys.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6ejdz929--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zhgiv0mg11yy10te4eys.jpg" alt="Miner under a coffee table" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>linux</category>
      <category>proxmox</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
