Jyoti Prajapati

Posted on Dec 22

Building a High-Performance Real-Time Camera Capture System in C++

#cpp #programming #performance #systemdesign

Deep dive into building a production-ready camera capture system with zero-copy V4L2, multithreading, and FFmpeg encoding.

Ever wondered how professional surveillance systems, dashcams, or robotics vision pipelines capture and process video at real-time speeds? Let me walk you through building a production-ready camera capture system in modern C++ that does exactly that.

What We're Building
A multithreaded camera capture system that:

Captures live video at 30 FPS from USB cameras
Processes frames for AI tasks (cv::Mat for ML models)
Records continuous 30-second video segments with UTC timestamps
Maintains zero-copy performance with thread-safe architecture
Handles camera disconnects and queue overflows gracefully

Final Result: 20241222_153045.mp4, 20241222_153115.mp4, 20241222_153145.mp4... automatic loop recording!

Architecture Overview
The system uses four dedicated threads, each with a single responsibility: capture, frame distribution, AI consumption, and segment writing (encoding).

Why this design? Separation of concerns. Each thread has one job, making the system easy to debug, optimize, and extend.

Key Design Decisions
1. Zero-Copy with V4L2 Memory Mapping
Instead of copying frame data from kernel space to user space, we use V4L2's memory-mapped buffers:

buffers_[i].start = mmap(NULL, buf.length,
                         PROT_READ | PROT_WRITE,
                         MAP_SHARED,
                         fd_, buf.m.offset);

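Why it matters: A 640x480 YUYV frame is 614,400 bytes (2 bytes per pixel, about 614KB), so copying 30 frames/second would cost about 18.4MB/s of unnecessary memory bandwidth.
Zero-copy eliminates this entirely: the capture path reads each frame in place from the mapped buffer.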
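
The arithmetic behind those numbers, as a small compile-time sketch. The helper names are made up for illustration; the only inputs are the resolution, YUYV's 2 bytes per pixel, and the frame rate.

```cpp
#include <cassert>
#include <cstdint>

// YUYV (packed YUV 4:2:2) stores 2 bytes per pixel.
constexpr std::uint64_t kBytesPerPixelYuyv = 2;

// Size of one uncompressed YUYV frame in bytes.
constexpr std::uint64_t FrameBytes(std::uint64_t width, std::uint64_t height) {
  return width * height * kBytesPerPixelYuyv;
}

// Bytes per second that one extra per-frame copy would move.
constexpr std::uint64_t CopyBytesPerSecond(std::uint64_t width,
                                           std::uint64_t height,
                                           std::uint64_t fps) {
  return FrameBytes(width, height) * fps;
}

static_assert(FrameBytes(640, 480) == 614'400, "about 614 KB per frame");
static_assert(CopyBytesPerSecond(640, 480, 30) == 18'432'000, "about 18.4 MB/s");
static_assert(CopyBytesPerSecond(1920, 1080, 30) == 124'416'000, "about 124 MB/s");
```

The same calculation at 1920x1080 shows why the copy matters more as resolution grows: each extra copy costs roughly 124MB/s.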

2. Non-Blocking Capture
The capture thread uses select() with a timeout so it never blocks indefinitely waiting for the camera:

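void V4L2Capturer::CaptureLoop() {
  while (running_.load()) {
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd_, &fds);

    timeval tv{};
    tv.tv_sec = 0;
    tv.tv_usec = 100000;  // 100ms timeout

    int r = select(fd_ + 1, &fds, NULL, NULL, &tv);
    if (r < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal, retry
      break;                         // Real error (e.g. camera unplugged)
    }

    if (r > 0) {
      // Frame available, dequeue and process.
      // The driver needs type and memory set before VIDIOC_DQBUF.
      v4l2_buffer buf{};
      buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      buf.memory = V4L2_MEMORY_MMAP;
      if (ioctl(fd_, VIDIOC_DQBUF, &buf) < 0) {
        if (errno == EAGAIN) continue;  // No buffer ready yet
        break;
      }

      // Update frame buffer
      UpdateFrame(buf);

      // Re-queue for next capture
      if (ioctl(fd_, VIDIOC_QBUF, &buf) < 0) break;
    }
  }
}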

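Critical point: The loop re-checks the running flag at least every 100ms, even when no frames arrive. This allows graceful shutdown, and because select() returns as soon as a frame is ready, the timeout adds no delay during normal operation.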
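
To try the select() timeout pattern in isolation, here is a minimal sketch that works on any file descriptor, so you can test it with a pipe instead of a camera. WaitReadable is a hypothetical helper name, not something from the project.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <unistd.h>

// Waits until `fd` is readable or `timeout_ms` elapses.
// Returns true if readable, false on timeout or error.
// Hypothetical helper for illustration only.
bool WaitReadable(int fd, int timeout_ms) {
  assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot watch larger fds
  for (;;) {
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);

    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int r = select(fd + 1, &fds, nullptr, nullptr, &tv);
    if (r > 0) return FD_ISSET(fd, &fds) != 0;
    if (r == 0) return false;          // Timeout, nothing to read
    if (errno != EINTR) return false;  // Real error
    // Interrupted by a signal: retry (with a fresh timeout, for simplicity).
  }
}
```

A capture loop built on this would call WaitReadable(fd, 100) and only issue VIDIOC_DQBUF when it returns true, checking the running flag between calls.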

3. Smart Pointer Frame Distribution
Frames are distributed to both AI and Recording branches using shared_ptr:

// One frame, two consumers, zero pixel copies
auto frame = std::make_shared<Frame>();
frame->data = mapped_buffer_ptr;  // Points to mmap'd memory
frame->width = 640;
frame->height = 480;

ai_queue_->Push(frame);       // AI branch gets shared_ptr
recorder_queue_->Push(frame); // Recorder gets same shared_ptr
// Pixel data is never copied!

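When both consumers are done, the shared_ptr automatically frees the Frame object. Beautiful RAII in action! One caveat: shared_ptr only manages the Frame itself, not the mmap'd buffer it points to. The buffer has to stay out of the driver's queue (no VIDIOC_QBUF) until the last consumer has finished reading, otherwise the driver may overwrite pixels that are still in use.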
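
One way to tie the buffer's lifetime to the shared_ptr is a custom deleter that hands the buffer back when the last consumer lets go. This is a sketch under assumptions, not the project's code: the Frame fields, MakeSharedFrame, and the requeue callback (standing in for the VIDIOC_QBUF call) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical frame type: a non-owning view of one mmap'd V4L2 buffer.
struct Frame {
  const std::uint8_t* data = nullptr;
  std::size_t size = 0;
  int width = 0;
  int height = 0;
  unsigned buffer_index = 0;  // Which V4L2 buffer the pixels live in
};

using FramePtr = std::shared_ptr<const Frame>;

// Wraps a frame in a shared_ptr whose deleter returns the buffer.
// `requeue(index)` runs exactly once, when the last reference is dropped.
template <typename Requeue>
FramePtr MakeSharedFrame(const Frame& frame, Requeue requeue) {
  assert(frame.data != nullptr);
  return FramePtr(new Frame(frame),
                  [requeue = std::move(requeue)](const Frame* f) mutable {
                    requeue(f->buffer_index);  // Give the buffer back
                    delete f;
                  });
}
```

The callback runs on whichever thread releases the last reference, so in a real pipeline it has to be safe to call from the AI or recorder thread.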

4. Backpressure Handling
What happens when the encoder can't keep up? Bounded queues with a drop-oldest strategy:

bool FrameQueue::Push(FramePtr frame) {
  std::lock_guard<std::mutex> lock(mutex_);

  if (queue_.size() >= max_size_) {
    queue_.pop_front();  // Drop oldest frame
    dropped_frames_.fetch_add(1);

    if (dropped_frames_.load() % 100 == 0) {
      LOG_WARN("Queue full, dropped ", dropped_frames_.load(), 
               " frames total");
    }
  }

  queue_.push_back(std::move(frame));
  cv_.notify_one();
  return true;
}

Why drop oldest, not newest? In real-time systems, recent data is usually more valuable than stale data. Live video pipelines such as video conferencing commonly make the same trade-off.

🎬 The Recording Pipeline
H.264 Encoding with FFmpeg
We use FFmpeg's libavcodec directly (no subprocess calls):

bool VideoEncoder::OpenFile(const std::string& filename) {
  // Find H.264 encoder
  const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
  if (!codec) return false;  // FFmpeg built without an H.264 encoder

  // Allocation of codec_ctx_ and format_ctx_ (including opening
  // `filename` for output) is omitted here for brevity.

  // Configure for real-time
  codec_ctx_->time_base = AVRational{1, fps_};
  codec_ctx_->framerate = AVRational{fps_, 1};
  codec_ctx_->bit_rate = 2000000;  // 2 Mbps

  // Ultra-fast preset for low latency (libx264 private options)
  av_opt_set(codec_ctx_->priv_data, "preset", "ultrafast", 0);
  av_opt_set(codec_ctx_->priv_data, "tune", "zerolatency", 0);

  // Open encoder
  if (avcodec_open2(codec_ctx_, codec, nullptr) < 0) return false;

  // Write MP4 header
  if (avformat_write_header(format_ctx_, nullptr) < 0) return false;

  return true;
}

30-Second Segment Rotation
The segment writer checks elapsed time and rotates files seamlessly:

void SegmentWriter::WriteLoop() {
  StartNewSegment();  // Open first file

  while (running_.load()) {
    auto now = std::chrono::system_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(
        now - segment_start_time_);

    if (elapsed >= std::chrono::seconds(30)) {
      encoder_->CloseFile();      // Flush and close
      StartNewSegment();          // Open next file
    }

    // Encode next frame
    auto frame_opt = queue_->Pop(std::chrono::milliseconds(100));
    if (frame_opt) {
      encoder_->EncodeFrame(*frame_opt);
    }
  }
}
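
One detail worth considering: std::chrono::system_clock can jump when the wall clock is adjusted (NTP corrections, manual changes), which would stretch or shrink a segment. A possible variation, sketched here with a hypothetical SegmentTimer class, measures segment age on std::chrono::steady_clock and keeps system_clock only for naming files.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not in the original project): tracks segment age on
// the monotonic clock so wall-clock adjustments cannot affect rotation.
class SegmentTimer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit SegmentTimer(std::chrono::seconds length) : length_(length) {
    assert(length > std::chrono::seconds::zero());
  }

  // Call when a new segment file is opened.
  void Start(Clock::time_point now) { start_ = now; }

  // True once the current segment has reached its target length.
  bool ShouldRotate(Clock::time_point now) const {
    return now - start_ >= length_;
  }

 private:
  std::chrono::seconds length_;
  Clock::time_point start_{};
};
```

In a write loop, this would replace the elapsed-time check: call Start(Clock::now()) whenever a segment opens and ShouldRotate(Clock::now()) before encoding each frame. Passing the time in also makes the rotation logic easy to unit test.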

UTC Timestamps are generated like this:

std::string FormatTimestampForFilename(
    const std::chrono::system_clock::time_point& tp) {
  auto time_t_val = std::chrono::system_clock::to_time_t(tp);
  std::tm tm_val;
  gmtime_r(&time_t_val, &tm_val);  // Thread-safe UTC

  std::ostringstream oss;
  oss << std::setfill('0')
      << std::setw(4) << (tm_val.tm_year + 1900)
      << std::setw(2) << (tm_val.tm_mon + 1)
      << std::setw(2) << tm_val.tm_mday
      << "_"
      << std::setw(2) << tm_val.tm_hour
      << std::setw(2) << tm_val.tm_min
      << std::setw(2) << tm_val.tm_sec;

  return oss.str();  // "20241222_153045"
}
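
For comparison, a shorter variant of the same formatting using std::strftime, plus a usage example that builds a full segment path. UtcStamp and MakeSegmentPath are hypothetical names chosen for this sketch, and the "/" separator assumes a POSIX system.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <string>
#include <time.h>  // gmtime_r (POSIX)

// Formats a time point as "YYYYMMDD_HHMMSS" in UTC.
std::string UtcStamp(std::chrono::system_clock::time_point tp) {
  const std::time_t t = std::chrono::system_clock::to_time_t(tp);
  std::tm tm_utc{};
  gmtime_r(&t, &tm_utc);  // Thread-safe UTC conversion

  char buf[16];  // 15 characters plus the terminating NUL
  const std::size_t n =
      std::strftime(buf, sizeof(buf), "%Y%m%d_%H%M%S", &tm_utc);
  assert(n == 15);
  return std::string(buf, n);
}

// Hypothetical helper: joins an output directory and a timestamp into a path.
std::string MakeSegmentPath(const std::string& dir,
                            std::chrono::system_clock::time_point tp) {
  return dir + "/" + UtcStamp(tp) + ".mp4";
}
```

Because the fields go from most to least significant, the resulting filenames sort chronologically in a plain directory listing.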

🧵 Thread Safety Deep Dive
Atomic Flags for Cross-Thread Communication

class V4L2Capturer {
  std::atomic<bool> running_{false};
  std::atomic<Frame*> current_frame_{nullptr};
};

// Capture thread writes
current_frame_.store(next_frame);

// Distributor thread reads
Frame* frame = current_frame_.exchange(nullptr);

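Why atomics? On mainstream platforms, std::atomic<bool> and atomic pointers are lock-free, which makes them a good fit for simple flags and single-pointer handoffs. No mutex overhead!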
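
A raw pointer handoff raises an ownership question: if the capture thread stores a new frame before the distributor has taken the previous one, who frees the old frame? Here is a minimal sketch of a single-slot mailbox that answers it by using exchange() on both sides. LatestFrameSlot and the Frame field are hypothetical, and the sketch assumes the slot owns whatever pointer it holds.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

struct Frame {
  int sequence = 0;  // Hypothetical field, just to tell frames apart
};

// Single-slot "latest frame" mailbox. The producer overwrites the slot and
// the consumer takes whatever is there. exchange() transfers ownership in
// both directions, so an unread frame is freed instead of leaked.
class LatestFrameSlot {
 public:
  ~LatestFrameSlot() { delete slot_.exchange(nullptr); }

  // Producer side: publish a new frame, discarding any unread one.
  void Publish(std::unique_ptr<Frame> frame) {
    assert(frame);
    Frame* old = slot_.exchange(frame.release(), std::memory_order_acq_rel);
    delete old;  // nullptr if the consumer already took it
  }

  // Consumer side: take the newest frame, or nullptr if none is pending.
  std::unique_ptr<Frame> Take() {
    return std::unique_ptr<Frame>(
        slot_.exchange(nullptr, std::memory_order_acq_rel));
  }

 private:
  std::atomic<Frame*> slot_{nullptr};
};
```

This works for one producer and one consumer. The unique_ptr at the interface makes the ownership transfer explicit at each call site.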

Condition Variables for Queue Blocking

std::optional<FramePtr> FrameQueue::Pop(
    std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock(mutex_);

  // Wait for data or timeout
  if (!cv_.wait_for(lock, timeout, [this] {
    return !queue_.empty() || shutdown_.load();
  })) {
    return std::nullopt;  // Timeout
  }

  if (queue_.empty()) {
    return std::nullopt;  // Shutdown
  }

  auto frame = std::move(queue_.front());
  queue_.pop_front();
  return frame;
}

This allows threads to sleep when no work is available, saving CPU.
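
Putting the drop-oldest Push and the timed Pop together, a self-contained version of such a queue might look like the sketch below. BoundedQueue is a hypothetical generic stand-in for FrameQueue; it keeps the counters under the mutex instead of using atomics, and it omits logging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Minimal bounded queue with drop-oldest overflow and timed Pop.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t max_size) : max_size_(max_size) {
    assert(max_size > 0);
  }

  void Push(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (queue_.size() >= max_size_) {
        queue_.pop_front();  // Drop the oldest item
        ++dropped_;
      }
      queue_.push_back(std::move(item));
    }
    cv_.notify_one();  // Notify after unlocking
  }

  std::optional<T> Pop(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait_for(lock, timeout,
                 [this] { return !queue_.empty() || shutdown_; });
    if (queue_.empty()) return std::nullopt;  // Timeout or shutdown
    T item = std::move(queue_.front());
    queue_.pop_front();
    return item;
  }

  // Wakes every waiting consumer so it can observe shutdown and exit.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }

  std::size_t dropped() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return dropped_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<T> queue_;
  const std::size_t max_size_;
  std::size_t dropped_ = 0;
  bool shutdown_ = false;
};
```

Pushing five items into a queue of capacity three drops the first two, and a consumer then sees the remaining three in order. After Shutdown(), Pop returns immediately once the queue is drained.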

📊 Performance Analysis
Testing on an Intel CPU with AVX2, 8GB RAM, USB webcam:

CPU Breakdown:

Capture thread: 3-5% (mostly idle in select)
Distributor: 2-3%
AI consumer: 4-6% (YUYV→BGR conversion)
Encoder: 20-30% (H.264 encoding dominates)

🚀 Getting Started
Install Dependencies (Ubuntu)

sudo apt-get install build-essential cmake \
    libavcodec-dev libavformat-dev libavutil-dev \
    libswscale-dev libopencv-dev v4l-utils

Build and Run

# Get the project
git clone https://github.com/jyotiprajapati98/Camera_Capture
cd Camera_Capture

# Build
./build.sh

# Run
cd build
./camera_capture

# Custom settings
./camera_capture /dev/video0 1920 1080 30 ./recordings
#                [device]    [width] [height] [fps] [dir]

Expected Output

[INFO] === Camera Capture System Starting ===
[INFO] Device: /dev/video0
[INFO] Resolution: 640x480
[INFO] FPS: 30
[INFO] Capture started at 30 FPS
[INFO] Frame distributor started
[INFO] AI consumer started
[INFO] Segment writer started
[INFO] === All systems operational ===
[INFO] Started segment #1: ./20241222_153045.mp4
[INFO] Status - AI: 150 frames | Recorder: 150 frames, 1 segments

🎓 What I Learned
1. Zero-Copy is King
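Memory bandwidth is often the bottleneck in video processing. By using V4L2 memory mapping and shared_ptr distribution, we avoid copying raw frames between the kernel, the distributor, and the consumers. Each avoided copy saves roughly 18MB/s at 640x480 YUYV, and far more at higher resolutions.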
2. Non-Blocking Everything
Never block the capture thread. Ever. Use select() with timeouts, bounded queues, and atomic flags. This single principle prevented countless headaches.
3. RAII Saves Lives
Every resource (file descriptors, mmap'd memory, FFmpeg contexts) uses RAII. No manual cleanup code. No memory leaks. No resource leaks.

~V4L2Capturer() {
  Stop();        // Join thread
  CloseDevice(); // Unmap buffers, close fd
}
// Everything cleaned up automatically!
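
The same idea can be pushed down to individual resources, so that even a partially constructed object cleans up after itself. As an illustration, here is a minimal move-only wrapper for a POSIX file descriptor. UniqueFd is a hypothetical name for this sketch, not a class from the project.

```cpp
#include <cassert>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner of a POSIX file descriptor.
class UniqueFd {
 public:
  UniqueFd() = default;
  explicit UniqueFd(int fd) : fd_(fd) {}
  ~UniqueFd() { Reset(); }

  UniqueFd(const UniqueFd&) = delete;
  UniqueFd& operator=(const UniqueFd&) = delete;

  UniqueFd(UniqueFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  UniqueFd& operator=(UniqueFd&& other) noexcept {
    if (this != &other) {
      Reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  int get() const { return fd_; }
  bool valid() const { return fd_ >= 0; }

  // Closes the descriptor (if any) and leaves the object empty.
  void Reset() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

 private:
  int fd_ = -1;
};
```

With a member like this, a class that opens /dev/video0 no longer needs to remember to close it on every error path; the descriptor is closed when the owning object goes out of scope.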

4. Test Edge Cases

Camera disconnect: Handled
Queue overflow: Handled
Encoder errors: Handled
Out of disk space: Handled

When to Use This Architecture
Good fit:

Security/surveillance systems
Dashcam applications
Industrial inspection
Robotics vision pipelines
Research/prototyping

Not ideal for:

Web streaming (use WebRTC instead)
Mobile apps (use platform APIs)
Cloud processing (use Kinesis/Kafka)

💭 Final Thoughts
Building a real-time video capture system taught me that performance and clean code aren't mutually exclusive. Zero-copy design, non-blocking I/O, and proper threading make the system fast. RAII, smart pointers, and clear architecture make it maintainable.
The key insight? Respect the constraints:

Cameras produce data at fixed rates
Encoders have maximum throughput
Memory bandwidth is finite
Disk I/O has latency

Design your system around these constraints, not against them.
🔗 Resources

Source Code: https://github.com/jyotiprajapati98/Camera_Capture

💬 Questions?
Drop a comment below! I'd love to hear about: