DEV Community

Vishwajeet Sharma


Introducing ScreenAI: An Open-Source AI Assistant for Chrome with Screen OCR

I am introducing ScreenAI, a new, open-source AI assistant for Google Chrome.

The goal of this project was to build a "Bring-Your-Own-Key" (BYOK) tool that brings AI capabilities directly into the browser. Users supply their own Google AI (Gemini) and OCR.space API keys, which removes the need for a subscription and keeps them in control of their data.

Here is a look at the interface:

ScreenAI Screenshot

Core Features

ScreenAI offers three AI-driven features:

  • Screenshot OCR: A built-in snipping tool lets you capture any region of your screen. The text in that region is extracted with OCR.space and can then be passed to the AI for processing.
  • Image Analysis: Right-click any image on the web, or upload or paste an image directly into the chat modal, to have the AI analyze it.
  • Contextual Text Analysis: Activate the assistant with Ctrl+Shift+X or by right-clicking selected text to get AI-powered insights on your current selection.
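To make the OCR step concrete, here is a minimal sketch of how a captured region could be sent to OCR.space. It assumes the snip is already available as a base64 data URL, and the function names are hypothetical; the endpoint, the `apikey`, `base64Image` and `language` form fields, and the `ParsedResults` / `IsErroredOnProcessing` / `ErrorMessage` response fields come from OCR.space's public API. It is an illustration, not code taken from the ScreenAI source.

```javascript
// Hypothetical sketch of the OCR step (not ScreenAI's actual code).
// Endpoint and field names follow OCR.space's public /parse/image API.
const OCR_ENDPOINT = "https://api.ocr.space/parse/image";

// Build the multipart body for one image passed as a data URL,
// e.g. "data:image/png;base64,...".
function buildOcrForm(apiKey, dataUrl, language = "eng") {
  const form = new FormData();
  form.append("apikey", apiKey);
  form.append("base64Image", dataUrl);
  form.append("language", language);
  return form;
}

// Pull the recognized text out of an OCR.space JSON response and
// surface the service's own error message when processing fails.
function extractOcrText(json) {
  if (json.IsErroredOnProcessing) {
    // ErrorMessage may be a string or an array of strings.
    const msg = [].concat(json.ErrorMessage ?? "OCR failed").join("; ");
    throw new Error(msg);
  }
  return (json.ParsedResults ?? [])
    .map((result) => result.ParsedText ?? "")
    .join("\n")
    .trim();
}

// Send the image with the user's own key and return the extracted text.
async function ocrImage(apiKey, dataUrl) {
  const res = await fetch(OCR_ENDPOINT, {
    method: "POST",
    body: buildOcrForm(apiKey, dataUrl),
  });
  if (!res.ok) throw new Error(`OCR.space returned HTTP ${res.status}`);
  return extractOcrText(await res.json());
}
```

Keeping the form builder and response parser separate from the `fetch` call means both can be tested without a network connection or a real API key.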

Project Architecture

The extension uses an event-driven architecture made up of four components:

  • manifest.json: Defines the core permissions, registers the Ctrl+Shift+X keyboard shortcut, and establishes the background service worker.
  • background.js (Service Worker): This is the central controller. It securely handles all external API calls to Google AI and OCR.space, listens for browser events such as the keyboard shortcut and context-menu clicks, and injects the other scripts.
  • content.js: This script is injected into the active tab to render the draggable AI modal, manage the chat interface, and display results from the background script.
  • snipper.js: A lightweight, single-purpose script injected on-demand. It creates the screen capture overlay and sends the selected region's coordinates to the background service worker for processing.
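The post does not show how the background worker turns those coordinates into an image, so here is a minimal sketch under two assumptions: the worker captures the tab with `chrome.tabs.captureVisibleTab`, and snipper.js sends `window.devicePixelRatio` along with the selection. The selection is measured in CSS pixels, while the capture on a HiDPI display is typically at device-pixel resolution, so the rectangle has to be scaled before cropping. `toDeviceRect` and `cropCapture` are hypothetical names.

```javascript
// Hypothetical helper (not ScreenAI's actual code): map a selection in CSS
// pixels onto a screenshot rendered at device-pixel resolution.
function toDeviceRect(rect, dpr, imageWidth, imageHeight) {
  // Scale each edge by the device pixel ratio, rounding outward so that
  // no selected pixel is lost.
  const left = Math.floor(rect.x * dpr);
  const top = Math.floor(rect.y * dpr);
  const right = Math.ceil((rect.x + rect.width) * dpr);
  const bottom = Math.ceil((rect.y + rect.height) * dpr);

  // Clamp to the captured image so a drag past the viewport edge is safe.
  const x = Math.max(0, Math.min(left, imageWidth));
  const y = Math.max(0, Math.min(top, imageHeight));
  const x2 = Math.max(x, Math.min(right, imageWidth));
  const y2 = Math.max(y, Math.min(bottom, imageHeight));
  return { x, y, width: x2 - x, height: y2 - y };
}

// Hypothetical crop inside the service worker: decode the capture (a data
// URL), draw the selected region onto an OffscreenCanvas, return a PNG Blob.
async function cropCapture(dataUrl, rect, dpr) {
  const blob = await (await fetch(dataUrl)).blob();
  const bitmap = await createImageBitmap(blob);
  const r = toDeviceRect(rect, dpr, bitmap.width, bitmap.height);
  if (r.width === 0 || r.height === 0) throw new Error("Empty selection");

  const canvas = new OffscreenCanvas(r.width, r.height);
  canvas
    .getContext("2d")
    .drawImage(bitmap, r.x, r.y, r.width, r.height, 0, 0, r.width, r.height);
  return canvas.convertToBlob({ type: "image/png" });
}
```

For example, a 100 x 50 selection at (10, 20) on a display with a device pixel ratio of 2 maps to a 200 x 100 crop at (20, 40).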

All text-based prompts and conversations are routed to the gemini-2.5-flash model for fast, responsive interaction.
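As an illustration of that routing, here is a minimal sketch of how a conversation could be sent to gemini-2.5-flash through the Gemini API's REST `generateContent` method, authenticating with the user's own key in the `x-goog-api-key` header. The `{ from, text }` history shape and the function names are hypothetical; ScreenAI's actual request code may differ.

```javascript
// Hypothetical sketch (not ScreenAI's actual code) of one chat turn
// against the Gemini API's generateContent REST method.
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";

// Convert a simple chat history ({ from: "user" | "ai", text }) into the
// API's contents array, where the roles are "user" and "model".
function buildGeminiBody(history) {
  return {
    contents: history.map((turn) => ({
      role: turn.from === "ai" ? "model" : "user",
      parts: [{ text: turn.text }],
    })),
  };
}

// Join the text parts of the first candidate; throw if there are none
// (for example, when the response was blocked).
function extractGeminiText(json) {
  const parts = json.candidates?.[0]?.content?.parts ?? [];
  const text = parts.map((p) => p.text ?? "").join("");
  if (!text) throw new Error("Gemini returned no text");
  return text;
}

// Send the whole conversation with the user's own key and return the reply.
async function askGemini(apiKey, history) {
  const res = await fetch(GEMINI_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildGeminiBody(history)),
  });
  if (!res.ok) throw new Error(`Gemini API returned HTTP ${res.status}`);
  return extractGeminiText(await res.json());
}
```

Because the API is stateless, each call resends the full history; the reply would then be appended to the history as an `"ai"` turn before the next question.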

Links

I have released this project as open-source. Feedback and contributions are welcome.

Thank you for your time. I hope you find this tool useful.
