Key Takeaways
- Learn how to create a custom dataset and fine-tune the Qwen2VL model on it.
- Understand the importance of fine-tuning models for real-life tasks.
Why Fine-Tuning?
You may find numerous resources explaining why fine-tuning a model is beneficial. Below are some real-world challenges that motivated me to fine-tune a model:
- Privacy Concerns: Closed-source models like ChatGPT or Gemini may not be an option due to privacy policies. Fine-tuning ensures sensitive data is not leaked externally.
- Resource Constraints: Running large models with over 7B parameters can be resource-intensive. Fine-tuning a smaller model allows similar results with reduced computational demands.
- Domain-Specific Tasks: Some tasks require customization for niche domains, and without fine-tuning, the model may fail to perform effectively.
Problem Statement
Today's challenge is to read a golf scorecard like the one below. The goal is to extract the names of players and their scores for each hole and return the data in JSON format.
Challenges:
- Handwritten Characters: Difficult to interpret.
- Non-Standardized Format: Unlike identification cards or passports, the layout lacks a fixed structure.
- Variable Image Quality: Images captured by customers vary in lighting and angles.
- Output Format: Qwen2VL is not optimized for JSON output. Moreover, ensuring deterministic results (same image yields identical results) requires additional processing.
- High Accuracy: The task demands both high precision and reasonable processing time. Fine-tuning focuses the model on specific tasks, reducing hallucination and improving relevance.
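Two of these challenges, JSON output and determinism, can be partly handled at inference time: greedy decoding (`do_sample=False`) makes generation deterministic for a given image, and a small post-processing step can validate the model's JSON before it reaches downstream code. Here is a minimal sketch of such a validator; the function name `parse_scorecard` is my own, and it assumes the model may wrap its answer in optional markdown fences:

```python
import json

def parse_scorecard(raw: str) -> dict:
    """Parse a model reply into the expected {"golf": [...]} structure.

    Strips optional markdown code fences, loads the JSON, and checks that
    every player entry has a name and a list of integer scores.
    Raises ValueError if the reply does not match the schema.
    """
    text = raw.strip()
    # Models often wrap JSON in ```json ... ``` fences; remove them.
    if text.startswith("```"):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    data = json.loads(text)
    players = data.get("golf")
    if not isinstance(players, list):
        raise ValueError("missing 'golf' list")
    for p in players:
        if not isinstance(p.get("name"), str):
            raise ValueError("player entry without a name")
        if not all(isinstance(s, int) for s in p.get("scores", [])):
            raise ValueError("non-integer score")
    return data
```

Combined with greedy decoding at generation time, the same image then yields the same parsed result on every run.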
Fine-Tuning Process
Requirements:
- A GPU with at least 12 GB VRAM.
We will fine-tune the Qwen2VL 2B Instruct model to address this problem.
Steps:
1. Install Requirements

```bash
git clone https://github.com/zhangfaen/finetune-Qwen2-VL
cd finetune-Qwen2-VL
conda create --name qwen2-VL python=3.10
conda activate qwen2-VL
pip install -r requirements.txt
```
2. Prepare the Dataset
- Copy all images into the
train_data
folder. - Create a
train_data/data.json
file with the following structure:
```json
[
  {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant. You will help me to output the result of golf score card, the result should be in json format, no annotation, and remember to close any brackets."},
      {
        "role": "user",
        "content": [
          {"type": "image", "image": "path_to_img_1.png"},
          {"type": "text", "text": "Return the score of each player as this json format {\"golf\": [{\"name\": \"\", \"scores\": []}, {\"name\": \"\", \"scores\": []} ] }"}
        ]
      },
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "{\"golf\": [{\"name\": \"jason\", \"scores\": [1,2,3,4,5,6,7,8,9]}, {\"name\": \"Britney\", \"scores\": [2,4,5,7,8,9,1,2,0]} ] }"}
        ]
      }
    ]
  },
  {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant. You will help me to output the result of golf score card, the result should be in json format, no annotation, and remember to close any brackets."},
      {
        "role": "user",
        "content": [
          {"type": "image", "image": "path_to_img_2.png"},
          {"type": "text", "text": "Return the score of each player as this json format {\"golf\": [{\"name\": \"\", \"scores\": []}, {\"name\": \"\", \"scores\": []} ] }"}
        ]
      },
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "{\"golf\": [{\"name\": \"jason\", \"scores\": [1,2,3,4,5,6,7,8,9]}, {\"name\": \"Britney\", \"scores\": [2,4,5,7,8,9,1,2,0]} ] }"}
        ]
      }
    ]
  }
]
```
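Mistakes in `data.json` (a missing image file, an assistant reply that is not itself valid JSON) tend to surface only deep into training, so a quick validation pass before launching is worth it. Here is a hypothetical checker for the structure above; the function name `validate_entry` is my own, and it assumes image paths are resolved relative to the `train_data` folder:

```python
import json
import os

def validate_entry(entry: dict, root: str = "train_data") -> list:
    """Return a list of problems found in one data.json entry (empty = OK)."""
    problems = []
    messages = entry.get("messages", [])
    roles = [m.get("role") for m in messages]
    if roles[:1] != ["system"] or "user" not in roles or "assistant" not in roles:
        problems.append(f"unexpected role sequence: {roles}")
    for m in messages:
        content = m.get("content")
        if not isinstance(content, list):
            continue  # system content is a plain string
        for part in content:
            # Every referenced image must exist on disk.
            if part.get("type") == "image":
                path = os.path.join(root, part.get("image", ""))
                if not os.path.exists(path):
                    problems.append(f"missing image: {path}")
            # Assistant replies must themselves be valid JSON.
            elif m.get("role") == "assistant" and part.get("type") == "text":
                try:
                    json.loads(part.get("text", ""))
                except json.JSONDecodeError:
                    problems.append("assistant text is not valid JSON")
    return problems
```

Running `validate_entry` over every element of the loaded `train_data/data.json` list before starting `finetune.sh` catches these errors in seconds instead of hours.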
3. Adjust Parameters
- Modify relevant parameters in `finetune.py` (e.g., `batch_size`, `padding`, model type, etc.).
- Update GPU settings in `finetune.sh`. For example:

```bash
CUDA_VISIBLE_DEVICES="0"
```
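If you have more than one GPU available, you can list several device indices instead; for example, to expose the first two GPUs to the training script:

```shell
CUDA_VISIBLE_DEVICES="0,1"
```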
4. Run Fine-Tuning

```bash
./finetune.sh
```
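Once training finishes, it is worth sanity-checking the checkpoint on a held-out scorecard image. Below is a sketch of inference with the `transformers` API; the checkpoint path `./output` is a guess (use whatever directory `finetune.py` writes), and the prompts are copied from the dataset above. Note `do_sample=False` for deterministic output:

```python
def build_messages(image_path: str) -> list:
    """Build a chat request matching the training format in data.json."""
    return [
        {"role": "system", "content": "You are a helpful assistant. You will help me to output the result of golf score card, the result should be in json format, no annotation, and remember to close any brackets."},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": "Return the score of each player as this json format {\"golf\": [{\"name\": \"\", \"scores\": []}, {\"name\": \"\", \"scores\": []} ] }"},
            ],
        },
    ]

def run_inference(image_path: str, model_dir: str = "./output") -> str:
    # Heavy imports kept inside the function so build_messages stays dependency-free.
    import torch
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_dir, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_dir)

    messages = build_messages(image_path)
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, padding=True, return_tensors="pt").to(model.device)

    # Greedy decoding: the same image always yields the same output.
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    generated = out[:, inputs.input_ids.shape[1]:]
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

`process_vision_info` comes from the `qwen-vl-utils` helper package that the Qwen2VL examples use for loading images referenced in chat messages.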
Conclusion
That's it! You have now fine-tuned the Qwen2VL model for your specific task. With these steps, you can adapt the model to meet your needs while ensuring privacy and efficiency.
Happy fine-tuning!
More
If you'd like to learn more, be sure to check out my other posts and give me a like! It would mean a lot to me. Thank you.