YOLO Animal Recognition Training: A Complete Guide from 0 to 80% Accuracy
Washin Village AI Director Tech Notes #2
Goal: Teaching AI to Recognize Our Animals
At Washin Village, we have 29 different animals to identify. Here's our hands-on experience taking a YOLOv8 model from zero to roughly 80% Top-1 accuracy (79.5%), starting from a pretrained checkpoint.
Data Preparation
Step 1: Collecting Photos
| Category | Photo Count | Source |
|---|---|---|
| Cats (17 types) | 5000+ | Daily captures |
| Goats (8 types) | 2000+ | Farm records |
| Other animals | 500+ | Various records |
Step 2: Labeling Data
Using Label Studio for annotation:
Label Format: YOLO
File Structure:

    images/
        train/
        val/
    labels/
        train/
        val/

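The post does not say how the photos were divided between `train/` and `val/`. As a minimal sketch, assuming a flat folder of images with same-named `.txt` label files, a random split into the layout above could look like this. The function name `split_train_val`, the 20% validation fraction, and the fixed seed are illustrative assumptions, not the author's settings.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_train_val(src_dir, out_dir, val_fraction=0.2, seed=0):
    """Copy images (and same-named .txt labels, if present) into train/ and val/ folders."""
    src, out = Path(src_dir), Path(out_dir)
    images = sorted(p for p in src.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    random.Random(seed).shuffle(images)  # seeded so the split is reproducible

    n_val = int(len(images) * val_fraction)  # val_fraction is an assumed value
    splits = {"val": images[:n_val], "train": images[n_val:]}

    for split, files in splits.items():
        img_out = out / "images" / split
        lbl_out = out / "labels" / split
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, img_out / img.name)
            label = img.with_suffix(".txt")
            if label.exists():
                shutil.copy2(label, lbl_out / label.name)

    return {split: len(files) for split, files in splits.items()}
```

Copying rather than moving keeps the original photo folder untouched, so the split can be redone with a different seed or fraction.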
Labeling Tips:
- Box the entire animal body
- Ensure no body parts are cut off
- Multiple labels per image are okay
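In the YOLO label format, each line of a label file is `class cx cy w h`, with the box center and size normalized to the range 0 to 1. The sketch below converts one such line to pixel corners and flags boxes that reach the image border, which can hint that part of the animal is cut off. The helper names and the 1-pixel margin are assumptions chosen for illustration.

```python
def parse_yolo_line(line, img_w, img_h):
    """Convert 'class cx cy w h' (normalized) to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:])
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return class_id, x1, y1, x2, y2


def touches_border(box, img_w, img_h, margin=1.0):
    """Return True if the box comes within `margin` pixels of any image edge."""
    _, x1, y1, x2, y2 = box
    return x1 <= margin or y1 <= margin or x2 >= img_w - margin or y2 >= img_h - margin
```

A box touching the border is only a heuristic: the animal may simply be close to the edge, so flagged images still need a visual check.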
Model Selection
YOLOv8 Series Comparison
| Model | Parameters | Speed | Accuracy |
|---|---|---|---|
| Nano (n) | 3.2M | Fastest | Lower |
| Small (s) | 11.2M | Fast | Medium |
| Medium (m) | 25.9M | Medium | Higher |
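We chose Small (s) as our main model, balancing speed and accuracy. Note that the parameter counts above match the published figures for the YOLOv8 detection models; the classification (`-cls`) checkpoints used in the training code are smaller.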
Training Code
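
    from ultralytics import YOLO

    # Load the pretrained YOLOv8 Small classification checkpoint
    model = YOLO('yolov8s-cls.pt')

    # Training configuration
    model.train(
        data='washin_dataset',  # dataset root; classification expects train/ and val/ with one subfolder per class
        epochs=100,             # maximum number of epochs
        imgsz=224,              # input image size in pixels
        batch=32,               # images per batch
        patience=10,            # stop early after 10 epochs without improvement
        augment=True,
        # Data augmentation
        hsv_h=0.015,    # Hue variation
        hsv_s=0.4,      # Saturation variation
        hsv_v=0.3,      # Value (brightness) variation
        degrees=15,     # Rotation angle in degrees
        translate=0.1,  # Translation as a fraction of image size
        scale=0.3,      # Scale gain
        flipud=0.2,     # Vertical flip probability
        fliplr=0.5,     # Horizontal flip probability
    )
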
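The `patience=10` setting means training stops early once the validation score has gone 10 epochs without improving. The sketch below illustrates that idea on a plain list of scores. It is a simplified model of early stopping under that assumption, not the library's internal implementation, and `early_stop_epoch` is a hypothetical helper name.

```python
def early_stop_epoch(val_scores, patience=10):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end."""
    best = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            # New best score: remember it and reset the waiting period
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row
            return epoch
    return None
```

With a steadily improving curve like the one reported in this post, the function returns `None`, which is consistent with the run reaching all 100 epochs.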
Training Progress
Accuracy Curve
| Epoch | Top-1 | Top-5 |
|---|---|---|
| 10 | 45% | 78% |
| 30 | 62% | 89% |
| 50 | 71% | 93% |
| 80 | 78% | 95% |
| 100 | 79.5% | 96.2% |
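Top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true one; Top-5 counts it as correct when the true class is anywhere among the five highest scores. A minimal sketch of that computation follows, using made-up scores for three classes purely to show the mechanics.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Illustrative scores only, not data from the Washin Village model
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
labels = [0, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With 29 classes, a large gap between Top-1 and Top-5 suggests the model usually narrows the answer to a few lookalike animals but still picks the wrong one among them.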
Key Findings
- Fastest progress comes early: Top-1 rises from 45% at epoch 10 to 62% at epoch 30
- Gains slow after epoch 50: Top-1 improves from 71% to 79.5% over the final 50 epochs
- Data augmentation is crucial: without it, accuracy reached only 65%
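The slowdown can be made concrete by computing the average Top-1 gain per epoch between the logged checkpoints. The sketch below uses only the Top-1 values reported in this post; `gains_per_epoch` is an illustrative helper name.

```python
# (epoch, Top-1 accuracy in %) pairs taken from the accuracy curve in this post
curve = [(10, 45.0), (30, 62.0), (50, 71.0), (80, 78.0), (100, 79.5)]


def gains_per_epoch(points):
    """Average accuracy gain per epoch between consecutive checkpoints."""
    rates = []
    for (e0, a0), (e1, a1) in zip(points, points[1:]):
        rates.append((e0, e1, (a1 - a0) / (e1 - e0)))
    return rates


for start, end, rate in gains_per_epoch(curve):
    print(f"epochs {start}-{end}: {rate:.3f} points/epoch")
```

The rate drops from 0.85 points per epoch (epochs 10 to 30) to 0.075 points per epoch (epochs 80 to 100), roughly an elevenfold slowdown.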
Common Issues & Solutions
| Issue | Cause | Solution |
|---|---|---|
| Overfitting | Too little training data | Increase data augmentation |
| Low accuracy for specific class | Sample imbalance | Use weighted sampling |
| Similar class confusion | Visual features too similar | Add more samples for that class |
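The post does not show how weighted sampling was implemented. One common approach, sketched here as an assumption, gives each sample a weight inversely proportional to its class frequency so that every class is drawn about equally often. In a PyTorch pipeline, the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler`; the version below uses only the standard library, and the function names are hypothetical.

```python
import random
from collections import Counter


def sample_weights(labels):
    """Per-sample weights of 1 / (class count), so each class has equal total weight."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def draw_balanced(items, labels, n, seed=0):
    """Draw n items with replacement, favoring samples from rare classes."""
    rng = random.Random(seed)
    return rng.choices(items, weights=sample_weights(labels), k=n)
```

Because sampling is with replacement, images from rare classes repeat within an epoch, so this works best alongside data augmentation rather than as a substitute for collecting more photos.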
Final Results
| Metric | Value |
|---|---|
| Top-1 Accuracy | 79.5% |
| Top-5 Accuracy | 96.2% |
| Inference Speed | 15 ms/image |
| Model Size | 21 MB |
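A latency of 15 ms per image corresponds to about 66.7 images per second (1000 / 15). The post does not describe how the figure was measured; the sketch below shows one simple way to time any prediction callable, with a few warm-up calls excluded from the measurement. The `predict` argument is a stand-in for whatever inference function is in use, and the warm-up count is an assumption.

```python
import time


def ms_per_image(predict, images, warmup=3):
    """Average wall-clock milliseconds per call of predict(image), after warm-up calls."""
    for img in images[:warmup]:
        predict(img)  # warm-up runs are not timed
    start = time.perf_counter()
    for img in images:
        predict(img)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(images)


def images_per_second(latency_ms):
    """Convert a per-image latency in milliseconds to throughput."""
    return 1000.0 / latency_ms
```

Measured latency depends heavily on hardware and batch size, so a number like 15 ms is only comparable across runs made on the same machine and settings.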
Lessons Learned
- Data quality > Data quantity: Clear photos beat lots of blurry ones
- Class balance matters: aim for at least 100 photos per class
- Iterate continuously: the first version doesn't need to be perfect
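The 100-photos-per-class guideline is easy to check automatically. The sketch below assumes a classification-style layout with one subfolder per class inside a split directory (for example a hypothetical `washin_dataset/train/`), counts the images in each, and lists the classes that fall short.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_per_class(split_dir):
    """Map each class subfolder name to the number of image files it contains."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS
            )
    return counts


def underrepresented(counts, minimum=100):
    """Names of classes with fewer than `minimum` images, sorted alphabetically."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Running this before each training round gives a quick list of animals that need more photos, which fits the iterate-continuously approach described above.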
Washin Village, by AI Director