DEV Community

Hagicode

Posted on • Originally published at docs.hagicode.com

Implementing Image Upload and AI Recognition in Chat: A Complete Solution from Design to Implementation

In AI interaction systems, how can we let users upload images and have the AI recognize them directly? I struggled with this question for quite a while, and building the feature in HagiCode gave me some answers. This post walks through our image upload and recognition solution: custom protocol design, file system storage, and separate access paths for frontend preview and AI reading. Consider it a complete technical note.

Background

Now that AI chat is everywhere, visual information is an important way for users to express intent. Yet most traditional chat systems accept only plain text, so users cannot hand visual context to the AI for analysis, which is a pity.

HagiCode faced the same problems during development: users couldn't upload images when chatting or creating main opinions, the AI couldn't access users' local visual information, and there was no complete loop covering image input, storage, rendering, and delivery into the AI's context.

None of these problems is hard on its own; they just take time and patience. We designed and implemented a complete image upload and recognition flow that lets Claude and other AI models directly recognize and analyze user-uploaded screenshots. The rest of this post details the implementation.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project. HagiCode is an open-source AI code assistant project that uses OpenSpec-based workflow design and is committed to providing a smarter code writing experience.

Analysis

Technical Challenges

Before starting implementation, we first need to be clear about the main challenges. As the saying goes, sharpening the axe doesn't delay the woodcutting.

Cross-module collaboration: Image upload involves multiple modules including frontend UI, upload service, backend API, file storage, message persistence, and AI execution mapping. Each module has its own responsibilities and interfaces, requiring a coordinated overall solution design.

Storage strategy selection: Should images be stored in the database or file system? If choosing file system, how should the directory structure be designed? How to integrate with the existing OpenSpec workflow? These all need careful consideration.

Reference protocol design: A standard image reference method is needed that can be both rendered by the frontend and correctly parsed by the AI execution pipeline. Use file paths directly? HTTP URLs? Or design a dedicated protocol?

AI capability compatibility: Different AI executors have varying degrees of multimodal support. Some executors natively support image input, while others can only process text. How to design a unified adaptation layer to ensure all executors can correctly handle image information?

Design Decisions

After thorough discussion and consideration, we made the following key design decisions.

Decision 1: File System Storage

We chose to store images in the file system rather than the database. The directory structure is designed as follows:

<system-root>/images/<sessionId>/
├── <timestamp>-<uuid>.jpg
└── <timestamp>-<uuid>.png

The rationale is straightforward: it simplifies the implementation, avoids database bloat, and lets the AI read files directly. Image files are not a natural fit for a database in the first place; the file system is. It's like putting books on a bookshelf rather than stuffing them into a notebook.
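
To make the layout concrete, here is a minimal TypeScript sketch of how a storage path could be derived. The helper names are hypothetical, and the `yyyyMMdd-HHmmss` timestamp layout and short suffix are assumptions; only the `<system-root>/images/<sessionId>/<timestamp>-<uuid>.<ext>` shape comes from the structure above.

```typescript
// Hypothetical helpers, not HagiCode's actual code.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds "<timestamp>-<suffix>"; the timestamp layout (UTC, yyyyMMdd-HHmmss) is assumed.
function buildImageId(now: Date, suffix: string): string {
  const date = `${now.getUTCFullYear()}${pad2(now.getUTCMonth() + 1)}${pad2(now.getUTCDate())}`;
  const time = `${pad2(now.getUTCHours())}${pad2(now.getUTCMinutes())}${pad2(now.getUTCSeconds())}`;
  return `${date}-${time}-${suffix}`;
}

// Joins the pieces into <system-root>/images/<sessionId>/<imageId>.<ext>.
function buildStoragePath(systemRoot: string, sessionId: string, imageId: string, ext: string): string {
  const root = systemRoot.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/images/${sessionId}/${imageId}.${ext}`;
}

const exampleId = buildImageId(new Date(Date.UTC(2026, 2, 1, 14, 30, 22)), "a1b2c3d4");
// exampleId is "20260301-143022-a1b2c3d4"
const examplePath = buildStoragePath("/data", "session-abc123", exampleId, "png");
// examplePath is "/data/images/session-abc123/20260301-143022-a1b2c3d4.png"
```

Grouping files by session keeps per-session cleanup simple: deleting a session means deleting one directory.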

Decision 2: Custom Protocol hagiimag://

To avoid conflicts with HTTP URLs while making reference semantics clearer, we designed a custom image reference protocol:

hagiimag://session-abc123/20260301-143022-a1b2c3d4

The format is hagiimag://<sessionId>/<imageId>. Its semantics are clear, and it is easy to parse and route. A developer who sees it immediately knows it is an image reference rather than a regular URL. Small design details like this can be surprisingly useful.

Decision 3: Frontend Preview and AI Access Separation

During implementation, we found that the frontend and the AI need to access images differently: the frontend previews them through an HTTP API, while the AI reads local file paths directly. So we separated the two access paths:

  • Frontend uses /api/Images/{sessionId}/{imageId}/content for preview
  • AI uses local file paths parsed by the server

This gives us both security (server paths are never exposed) and usability (browsers can load previews directly). Security and usability always have to be balanced.

Decision 4: Immediate Upload Strategy

Another key decision is upload timing. We upload immediately when the user selects or pastes an image, and a message only references images that have already uploaded successfully.

The benefit is that error handling happens up front, which keeps complexity out of the message sending API and keeps its JSON contract simple. Users also know whether an upload succeeded before they send, which makes for a better experience. Preparing ahead like this pays off in many situations.
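
As a rough illustration of this strategy, the sketch below models attachment state on the client. The type and status names are hypothetical. It assumes that only in-flight uploads block sending, and that failed attachments are simply left out of the message.

```typescript
// Hypothetical attachment model; the status names are assumptions.
type UploadStatus = "uploading" | "uploaded" | "failed";

interface ImageAttachment {
  localId: string;
  status: UploadStatus;
  reference?: string; // hagiimag:// reference, set once the upload has succeeded
}

// Sending is blocked while any attachment is still uploading.
function hasBlockingAttachments(attachments: ImageAttachment[]): boolean {
  return attachments.some((a) => a.status === "uploading");
}

// Only successfully uploaded images contribute references to the outgoing message.
function collectImageReferences(attachments: ImageAttachment[]): string[] {
  const refs: string[] = [];
  for (const a of attachments) {
    if (a.status === "uploaded" && a.reference !== undefined) {
      refs.push(a.reference);
    }
  }
  return refs;
}
```

Because the message payload carries only reference strings, the send API never has to deal with file bytes or partial upload failures.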

Solution

Architecture Design

Based on the above decisions, we designed the following overall architecture:

Frontend Layer
├── ConversationInputArea  ◄─────── useImageAttachmentManager
│       │                             │
│       ├── File selection            ├── Attachment state management
│       ├── Clipboard paste           ├── Upload/retry/delete
│       └── Attachment preview        └── Image reference generation
│
Service Layer
├── ImageUploadService
│       ├── uploadImage()      ◄─────── ImagesController
│       ├── deleteImage()                 │
│       ├── parseHagiImageUrl()  ◄─────── Parse protocol links
│       └── buildPreviewUrl()              │
│
Backend Layer
├── ImagesController           ◄─────── ImagesDomainService
│       │                                  │
│       ├── POST /upload                  ├── File validation
│       ├── GET /{sessionId}/{imageId}    ├── Image saving
│       ├── DELETE                        ├── Image compression
│       └── GET /content                  └── Reference parsing
│
AI Execution Layer
├── ImageContentBlock          ◄─────── StructuredMessageDomainService
│       │                                  │
│       ├── Multimodal executor           ├── Image block parsing
│       └── Text executor fallback        └── Path hint generation

This architecture shows the complete data flow from the frontend to the AI. Each layer has clear responsibilities and talks to the others through well-defined interfaces. Good architecture works like this: each part does its own job, stays out of the others' way, and communicates cleanly.

Key Processes

Image Upload Process:

  1. User selects images through file selection or clipboard paste
  2. Frontend validates file type and size (supports JPEG/PNG/WEBP/GIF, 10MB per file)
  3. Calls upload API, image saved to /images/{sessionId}/ directory
  4. API returns hagiimag:// reference and preview URL
  5. Frontend displays preview thumbnail in attachment bar, user can preview before sending
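
Step 2 of the upload process could look roughly like the sketch below. The function and type names are hypothetical, and treating 10MB as 10 * 1024 * 1024 bytes is an assumption; the allowed formats and the per-file limit come from the list above.

```typescript
// Allowed MIME types for JPEG/PNG/WEBP/GIF.
const ALLOWED_IMAGE_TYPES = ["image/jpeg", "image/png", "image/webp", "image/gif"];
// Assumed interpretation of "10MB per file".
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

// Structural subset of the browser File type, so the check is easy to test.
interface FileLike {
  type: string;
  size: number;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateImageFile(file: FileLike): ValidationResult {
  if (ALLOWED_IMAGE_TYPES.indexOf(file.type) === -1) {
    return { ok: false, reason: `Unsupported type: ${file.type || "unknown"}` };
  }
  if (file.size > MAX_IMAGE_BYTES) {
    return { ok: false, reason: "File exceeds the 10 MB limit" };
  }
  return { ok: true };
}
```

Client-side checks like this only improve feedback speed; the backend still has to validate every upload itself.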

AI Recognition Process:

  1. User sends message containing image reference
  2. Backend parses hagiimag:// protocol link, extracts sessionId and imageId
  3. Maps image reference to ImageContentBlock
  4. Selects processing method based on executor capability:
    • Multimodal executor: passes structured image input
    • Text executor: falls back to image path hint

This completes the loop: the user uploads an image, the AI recognizes it, and the AI returns its analysis. A smooth flow like this tends to make for a better user experience.
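
The capability-based selection in step 4 can be sketched as follows. This is an illustration in TypeScript rather than the backend's actual code: the block shapes, the `supportsImages` flag, and the wording of the path hint are all assumptions.

```typescript
// Hypothetical shapes; the real ImageContentBlock may carry different fields.
interface ImageContentBlock {
  kind: "image";
  sessionId: string;
  imageId: string;
  localPath: string; // resolved on the server, never sent to the browser
}

interface TextBlock {
  kind: "text";
  text: string;
}

type ExecutorInput = ImageContentBlock | TextBlock;

// Multimodal executors receive the structured block unchanged;
// text-only executors receive a path hint so the image is never silently dropped.
function adaptImageBlock(block: ImageContentBlock, supportsImages: boolean): ExecutorInput {
  if (supportsImages) {
    return block;
  }
  return { kind: "text", text: `[Image attached: ${block.localPath}]` };
}
```

Keeping the fallback in one adapter function means new executors only need to declare their capability, not reimplement the mapping.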

Practice

Frontend Implementation

On the frontend, we provide a dedicated Hook to manage image attachment state:

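import { useRef } from 'react';
import type { ClipboardEvent } from 'react';
import { useImageAttachmentManager } from '@/hooks/useImageAttachmentManager';
// AttachmentItem is the project's attachment preview component (import omitted here).

function ChatInput({ sessionId }: { sessionId: string }) {
  // Ref to the hidden file input that the upload button opens
  const fileInputRef = useRef<HTMLInputElement>(null);

  const {
    attachments,
    uploadedImages,
    hasBlockingAttachments,
    isUploading,
    selectFiles,
    removeAttachment,
    clearAttachments,
  } = useImageAttachmentManager({
    ownerId: sessionId,
    mapUploadedImage: (response) => response,
    uploadOptions: { compress: false },
  });

  const handleFileSelect = (files: File[]) => {
    selectFiles(files);
  };

  // Pasted clipboard images go through the same path as selected files
  const handlePaste = (e: ClipboardEvent<HTMLTextAreaElement>) => {
    const files = Array.from(e.clipboardData.files)
      .filter(f => f.type.startsWith('image/'));
    if (files.length > 0) {
      handleFileSelect(files);
    }
  };

  return (
    <div>
      {/* Attachment bar */}
      {attachments.map(att => (
        <AttachmentItem
          key={att.localId}
          file={att.file}
          status={att.status}
          onRemove={() => removeAttachment(att.localId)}
        />
      ))}

      {/* Input box */}
      <textarea onPaste={handlePaste} />

      {/* Hidden file input, opened by the upload button */}
      <input
        ref={fileInputRef}
        type="file"
        accept="image/*"
        multiple
        hidden
        onChange={(e) => handleFileSelect(Array.from(e.target.files ?? []))}
      />

      {/* Upload button */}
      <button onClick={() => fileInputRef.current?.click()}>
        Upload Image
      </button>
    </div>
  );
}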

This Hook encapsulates all attachment management logic, including upload status tracking, retry on failure, and attachment deletion. Using it takes only a few method calls. Good API design is like this: simple to use, yet flexible.

Parsing Custom Protocol:

// Extract sessionId and imageId from custom protocol
const parsed = parseHagiImageUrl("hagiimag://session-abc123/20260301-143022-uuid");
// Returns: { sessionId: "session-abc123", imageId: "20260301-143022-uuid" }

// Build preview URL
const previewUrl = buildPreviewUrl(parsed.sessionId, parsed.imageId);
// Returns: "/api/Images/session-abc123/20260301-143022-uuid/content"

With these two utility functions, the frontend can convert a hagiimag:// reference into an HTTP preview URL. Keeping the conversion logic in one place makes it much easier to use.
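
For readers who want to see what sits behind these two calls, here is one possible implementation. It is a sketch, not HagiCode's actual code: returning null for malformed input and URL-encoding the path segments are my own choices.

```typescript
const HAGI_IMAGE_SCHEME = "hagiimag://";

interface HagiImageRef {
  sessionId: string;
  imageId: string;
}

// Parses hagiimag://<sessionId>/<imageId>; returns null for anything else.
function parseHagiImageUrl(url: string): HagiImageRef | null {
  if (url.indexOf(HAGI_IMAGE_SCHEME) !== 0) {
    return null;
  }
  const parts = url.slice(HAGI_IMAGE_SCHEME.length).split("/");
  const [sessionId, imageId] = parts;
  // Exactly two non-empty segments are required.
  if (parts.length !== 2 || !sessionId || !imageId) {
    return null;
  }
  return { sessionId, imageId };
}

// Builds the HTTP preview endpoint used by the frontend.
function buildPreviewUrl(sessionId: string, imageId: string): string {
  return `/api/Images/${encodeURIComponent(sessionId)}/${encodeURIComponent(imageId)}/content`;
}
```

With a nullable return type, callers are forced to handle malformed references before building a preview URL.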

Backend Implementation

The backend is implemented in ASP.NET Core, with ImagesController and ImagesDomainService at its core:

[HttpPost("upload")]
[RequestSizeLimit(50 * 1024 * 1024)]
public async Task<ActionResult<ImageUploadResponseDto>> Upload(
    [FromForm] UploadImageFormRequest input)
{
    // The File and SessionId property names on the form request are assumed here;
    // the original snippet used `file` and `sessionId` without declaring them.
    var file = input.File;
    var sessionId = input.SessionId;

    // 1. Validate request
    if (file == null || file.Length == 0)
        throw new UserFriendlyException("No file provided");

    // 2. Validate file type and size
    var (isValid, errorMessage) = _imagesDomainService.ValidateImage(
        file.FileName, file.ContentType, file.Length);
    if (!isValid)
        throw new UserFriendlyException(errorMessage);

    // 3. Save to file system
    await using var stream = file.OpenReadStream();
    var result = await _imagesDomainService.UploadImageAsync(
        stream,
        sessionId,
        file.FileName,
        file.ContentType,
        CurrentUserId,
        compress: input.Compress);

    // 4. Return result
    return Ok(result);
}

This implementation follows the typical Web API pattern: validate, process, return. Note the 50 MB request size limit, which rejects maliciously large uploads at the request level; the per-file size check happens in ValidateImage. On the open internet, it pays to be cautious.

Important Considerations

During implementation, some details need special attention:

Permission validation: Image access must verify user identity so that users can only access images from their own sessions. This is a basic security requirement that cannot be skipped.

Path security: Strictly validate sessionId and imageId to prevent path traversal attacks. For example, reject paths containing ../ to prevent users from accessing arbitrary files in the system. Handling these boundary conditions well makes the system more robust.

File cleanup: When sessions are deleted, associated images must be cleaned up synchronously to avoid orphan file accumulation. Over long operation periods, these files may occupy significant disk space. Timely cleanup is also a good habit.

Compression strategy: For screenshot-type filenames (like screenshot.png), automatically enable compression to save space. This strategy can be adjusted according to actual needs. When it comes to storage space, every bit saved helps.

Fallback handling: Executors without multimodal support must receive image path hints; they must not silently drop image information. Otherwise users will think the AI ignored their image. User experience depends on details like this.

State management: Attachments that are still uploading block message sending, while failed attachments can be retried or deleted. Clear state management keeps the experience continuous and spares users any confusion.
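
To illustrate the path security point above, here is a small allow-list check, sketched in TypeScript for consistency with the frontend examples. The permitted character set is an assumption based on the example IDs in this article; the real rules may be stricter or looser.

```typescript
// Assumed allow-list: letters, digits, hyphen, underscore.
// Anything else, including ".", "/", and "\", is rejected.
const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

function isSafeSegment(segment: string): boolean {
  return SAFE_SEGMENT.test(segment);
}

// Throws before any file system access if either part looks suspicious.
function assertSafeImageReference(sessionId: string, imageId: string): void {
  if (!isSafeSegment(sessionId) || !isSafeSegment(imageId)) {
    throw new Error("Invalid image reference");
  }
}
```

An allow-list is usually safer than searching for `../`, because it also rejects encoded or platform-specific separators that a deny-list might miss.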

Summary

Through this complete image upload and recognition solution, HagiCode achieved a full loop from user input to AI recognition. The core highlights of the entire solution include:

  • Custom hagiimag:// protocol achieves standardization of image references
  • File system storage simplifies implementation and avoids database bloat
  • Frontend preview and AI access separation balances security and usability
  • Immediate upload strategy optimizes user experience
  • Multimodal and text fallback compatibility design ensures flexibility

This solution runs stably in HagiCode with positive user feedback. If you're also implementing similar functionality, I hope these experiences are helpful to you.

With technical solutions, there's no absolute right or wrong, only what fits and what doesn't. Finding the approach that suits your project matters most.

This article was created with AI assistance and reviewed by the author before publication.
