peckjon

Posted on Jul 16

Building an AI-Powered Image Search Engine with Daft.ai

#ai #datascience #tutorial #imagerecognition

Back when ML models mainly lived on self-hosted servers instead of smartphones, I spent a few years with Algorithmia, building some of the first and best ML (now "AI") hosting services. Many of my days were spent deep in the trenches with Python datascientists, churning through Jupyter notebooks, optimizing their algorithms to run in ephemeral serverless environments. Those were the days when data transformation pipelines required complex orchestration of multiple tools, custom scripts for every file format, and endless debugging of memory issues and race conditions.

Fast-forward to today: after years focused on DevOps and other areas of software development, I've been itching to get back into data science – and wow, the modern landscape is a revelation. Enter Daft: a distributed Python dataframe library designed to handle complex data workloads with the elegance of Pandas but the power to scale. What caught my attention wasn't just another dataframe library, but Daft's native support for multimodal data processing and SQL-based query capabilities. This felt like the perfect opportunity to build something practical while exploring what makes Daft exciting.

Why Daft is Worth Your Attention

Daft represents a significant step forward in data processing, especially for teams working with unstructured data. Unlike traditional dataframes that treat multimedia as mere file paths, Daft can natively decode, process, and manipulate images directly within its processing pipeline. This means you can resize thousands of images, extract features, or run ML inference – all using familiar dataframe operations that can scale across multiple cores or even distributed clusters.

Structured data gets an upgrade, too! Daft's built-in support for SQL queries works across nonrelational data, such as JSON... so those of us who grew up writing SQL92 feel just as comfortable querying a wide variety of formats.

The three Daft features that really shine in this project are:

🔍 Image Discovery & File Processing: Using daft.from_glob_path(), we can recursively discover image files across directory structures with built-in filtering by extension. No more writing custom directory traversal code or managing file system complexity.

⚡ Bulk Image Processing: Daft's native image operations let us chain .image.decode(), .image.resize(), and .image.encode() in a single pipeline. This means processing thousands of photos happens in parallel without having to manually manage Pillow operations, threading, or memory concerns.

📊 SQL Query over JSON: Once our image metadata is processed, Daft's SQL interface daft.sql() lets us write SQL queries directly over our JSON data structures, including complex operations that replace slow and cumbersome dataframe operations -- like array explosions for tag searching, and querying across multiple fields simultaneously.

Building the Demo: Where Theory Meets Practice

This image search tool demonstrates how these capabilities come together. The application discovers images in local folders, processes them through AI models for automatic captioning and tagging, then creates a searchable web interface. Here's where Daft eliminated entire categories of complexity:

No manual file system traversal – Daft's glob patterns handle recursive file discovery: image_processor.py#L40
No individual image resize operations – Daft's bulk image pipeline processes everything in parallel (no sequential Pillow operations!): image_processor.py#L135
No complex JSON parsing for search – SQL queries over structured data feel natural and performant: app.py#L95-L106
No manual parallelization – Daft handles efficient resource utilization automatically: image_processor.py#L132

The result? Clean, readable code that focuses on business logic rather than infrastructure concerns.

Development Notes & Caveats

Full transparency: while the initial code generation was aided by GitHub Copilot and Claude Sonnet 4 (you can see the original prompt in PRD.md – itself pair-generated with Copilot's help), the real work happened in the development iterations. AI tools are incredibly powerful accelerators, but they work best when guided by an experienced developer who understands the problem domain and can refine the generated solutions.

Important: This is a demo application only and should not be used unmodified in a production environment. It may contain security vulnerabilities and is optimized for simplicity and compatibility, not efficiency. For example, the BLIP model used for image captioning is a few years old and not state-of-the-art – I chose it for its reliability and broad compatibility rather than cutting-edge performance.

This project showcases only a tiny slice of Daft's capabilities. The framework supports everything from distributed computing across cloud infrastructure to advanced ML workloads with GPU acceleration. If you're dealing with large-scale data processing, multimedia pipelines, or looking to modernize your data infrastructure, there's a lot more to explore.

Ready to dive in?

🚀 Jump right into the code or read the detailed implementation guide below!

Features

🔄 Data Loader

Folder Processing: Select any local folder containing images
Recursive Discovery: Automatically finds images in all subfolders
AI-Powered Tagging: Uses BLIP model for automatic image captioning and tagging
Batch Processing: Efficiently processes large image collections using Daft.ai
Progress Tracking: Real-time updates on processing status

🔍 Image Library

Smart Search: Search images using natural language descriptions
Tag-Based Filtering: Find images by automatically generated tags
Visual Preview: Grid view with hover effects and click-to-expand
Detailed View: Modal with full image, caption, tags, and metadata
Responsive Design: Works on desktop and mobile devices

Quick Start

Prerequisites

Python 3.9 or higher
4GB+ RAM (for AI model)
Modern web browser

Installation

Clone or download this repository
Run the setup script:

   chmod +x setup.sh
   ./setup.sh

Start the application:

   source venv/bin/activate
   python app.py

Open your browser to: http://localhost:8000

Usage Guide

Processing Images

Go to the Data Loader page
Enter your image folder path (e.g., /Users/yourname/Pictures)
Click "Start Processing"
Wait for completion - the first run will download the AI model

Example folder paths:

macOS: /Users/yourname/Pictures/vacation
Linux: /home/yourname/photos
Windows: C:\\Users\\yourname\\Pictures

Searching Images

Go to the Image Library page
Enter search terms like:
- "dog" (finds images with dogs)
- "outdoor" (finds outdoor scenes)
- "person" (finds images with people)
- "mountain landscape" (finds mountain landscapes)
Click on images to see full size with details

Technical Architecture

Backend (Flask)

REST API for image processing and search
Job management for long-running processing tasks
File serving for processed images

Data Pipeline (Daft.ai)

Efficient file discovery using glob patterns
Parallel processing for image operations
Memory-efficient handling of large datasets

AI Processing

BLIP Model: Salesforce's BLIP for image captioning
Automatic Tagging: Extracts objects and scenes from captions
Standardized Output: Consistent 224x224 processed images

Data Storage

{
  "images": [
    {
      "id": "abc123",
      "filename": "photo.jpg",
      "original_path": "/full/path/photo.jpg",
      "processed_path": "photo_abc123.jpg",
      "file_size": 1024576,
      "created_date": "2025-07-14T10:30:00",
      "tags": ["outdoor", "landscape", "mountains"],
      "caption": "A beautiful mountain landscape",
      "processed_date": "2025-07-14T15:45:00"
    }
  ]
}

Supported Image Formats

JPEG/JPG
PNG
GIF
BMP
WEBP
TIFF

API Endpoints

Method	Endpoint	Description
POST	`/api/process`	Start image processing job
GET	`/api/jobs/{id}`	Get processing job status
POST	`/api/search`	Search images by text
GET	`/api/images`	Get all processed images

File Structure

daft-image-playground/
├── app.py                 # Flask application
├── image_processor.py     # Core processing logic
├── requirements.txt       # Python dependencies
├── setup.sh              # Setup script
├── LICENSE               # MIT License
├── templates/            # HTML templates
│   ├── data_loader.html
│   └── image_library.html
├── data/                 # Generated data files
├── processed_images/     # Resized images
└── README.md

Performance Notes

First Run: Model download may take 2-5 minutes
Processing Speed: Varies by hardware and image size
Memory Usage: ~2-4GB during processing
Storage: Processed images are ~50KB each (224x224 JPEG)
Model Size: ~1GB

Troubleshooting

Common Issues

"Model download failed"

Ensure internet connection for first run
Check disk space (model is ~1GB)

"Permission denied"

Ensure image folder is readable
Use absolute paths, not relative

"Out of memory"

Process smaller batches
Close other applications
Consider upgrading RAM

"No images found"

Check folder path is correct
Ensure folder contains supported image formats
Verify folder permissions

Logs and Debugging

Processing logs appear in the terminal
Check browser console for frontend errors
Job status API provides detailed error messages

Development

Interested in taking this further? A few suggestions:

Custom Image Models:

Replace BLIP model in image_processor.py
Modify tag generation logic

Search Improvements:

Add vector similarity search
Implement advanced filtering

UI Enhancements:

Add sorting options
Implement image collections

Dependencies

Daft.ai: Distributed data processing
Transformers: Hugging Face model library
Flask: Web framework
Bootstrap: UI framework

License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License allows you to:

✅ Use commercially
✅ Modify and distribute
✅ Place warranty
✅ Use patents

The only requirement is to include the original copyright notice.

Need Help? Check the troubleshooting section or create an issue on GitHub.

DEV Community