DEV Community: somyabhalani

ananta memory

somyabhalani — Sat, 27 Jun 2026 12:16:23 +0000

I Built a Photographic Memory for My Browser (Because Standard History is Broken)
Have you ever spent twenty minutes frantically Googling variations of a phrase, trying to find an incredible article or documentation page you read just yesterday?

Your browser history is a simple list of URLs and page titles. But human memory doesn't work like that. We remember concepts, specific phrases, and paragraphs. We remember what we read, not the URL we read it on.

I got tired of losing great resources, so I built a solution: Ananta Memory.

Ananta Memory is a Chrome extension that acts as a photographic memory for your browser. It quietly saves the actual text of everything you read into a secure, completely offline database.

Here is why I built it, and why I chose a strict "local-first" architecture to do it.

The Problem: The Cloud is Watching
When I first came up with the idea for a tool that saves everything you read, my immediate thought was to build a standard SaaS app. I'd spin up a PostgreSQL database, use a cloud provider, and sync user data via an API.

But then I thought about the privacy implications.

A tool that reads and saves every article, blog post, and forum thread you visit is essentially a keylogger for your reading habits. Sending that data to a cloud server—even an encrypted one—felt like a massive violation of user trust.

I didn't want to build a surveillance tool. I wanted to build a personal memory assistant.

The Solution: Local-First Architecture
To solve this, I completely reversed the architecture. Instead of sending data to the cloud, Ananta Memory brings the cloud to your browser.

Ananta Memory is built with a strict local-first philosophy. When you browse the web, the extension extracts the readable text from the page and saves it directly to a database located physically on your hard drive using Chrome's local storage APIs.

What this means for you:
Zero Network Latency: Because everything is stored locally, searching your memory involves no network requests, no loading spinners, and no API rate limits.
True Privacy: Your data never leaves your device. It is never sent to Ananta Labs, it is never sold to advertisers, and there is no cloud copy to breach.
Offline Access: Even if you lose your internet connection, your entire reading history is fully searchable and accessible.

The Privacy Blacklist
Even with local storage, some things should simply never be recorded.

I engineered Ananta Memory with a built-in privacy blacklist. The extension automatically detects and disables itself on:

Banking and financial websites
Payment gateways (Stripe, PayPal)
Authentication portals and login screens

Your private data remains completely untouched.
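
The post does not publish the actual blacklist rules, so the following is only a minimal sketch of how such a check could work. It is written in Python to show the matching logic; a real Chrome extension would implement it in JavaScript or TypeScript. The domain set, the path keywords, and the function name is_blacklisted are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical rules for illustration only; the extension's real list
# is not described in the post.
BLOCKED_DOMAINS = {"stripe.com", "paypal.com"}
BLOCKED_PATH_SEGMENTS = {"login", "signin", "auth", "checkout"}


def is_blacklisted(url: str) -> bool:
    """Return True if the page at `url` should never be recorded."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Match the domain itself and any subdomain (e.g. dashboard.stripe.com),
    # but not look-alikes such as notstripe.com.
    for domain in BLOCKED_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return True

    # Compare whole path segments so "/authors" does not trigger "auth".
    segments = {s for s in parsed.path.lower().split("/") if s}
    return not segments.isdisjoint(BLOCKED_PATH_SEGMENTS)
```

Matching on whole path segments rather than substrings keeps ordinary pages such as /authors recordable, at the cost of missing login pages with unusual paths.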

The Result
The result is a tool that feels like magic. When I'm coding and I vaguely remember a specific paragraph from the React documentation, I don't go to Google. I open Ananta Memory, type the phrase I remember, and the extension instantly pulls up the exact paragraph I was reading days ago.

It's completely changed the way I research, read, and work on the web.

Try It Out
If you're tired of losing track of your research, you can download the extension package for free right now. Because it's a powerful developer tool, it requires a quick 30-second manual installation via Chrome's Developer Mode.

Check it out here: https://ananta-extension.vercel.app

I'd love to hear your feedback, feature requests, or thoughts on local-first web architecture in the comments below!

Built by Somya Bhalani @ Ananta Labs AI

Ananta Memory | Your Browser's Photographic Memory by Ananta Labs AI

A stunning extension that quietly saves everything you read into a secure, completely offline database. Search your past instantly.

ananta-extension.vercel.app

Tile Extractor

somyabhalani — Sat, 23 May 2026 15:03:13 +0000

Parsing the Unparsable: Building a Layout-Aware Computer Vision Pipeline for 50,000+ Stone SKUs

Executive Summary

The stone and marble industry operates on visual catalogs. Manufacturers publish hundreds of pages of PDF catalogs showing marble slabs, tile patterns, texture variations, and dimension tables. For digital inventory platforms and wholesalers, extracting these products to populate databases is a massive bottleneck.

Standard OCR (Optical Character Recognition) tools fail immediately because these catalogs are highly visual, containing complex grid structures where product images are loosely aligned with text descriptions, dimensions, and SKU codes.

Ananta Labs was hired to design a layout-aware computer vision and text parsing pipeline that could ingest multi-page catalogs, segment individual product tiles, extract their corresponding text details, and output clean, database-ready JSON arrays. The target was 95%+ accuracy over a database of 50,000+ unique marble and stone SKUs.

The Architecture: Segmentation-First Parsing

Traditional text extraction tools parse documents top-to-bottom, left-to-right. In a product catalog, this approach merges the text of Slab A with the dimensions of Slab B.

To prevent data mismatch, we implemented a segmentation-first approach. Instead of reading the document as text, we treat each catalog page as an image canvas, locate the individual physical grid cells (tiles), isolate them, and then run OCR within the boundaries of each isolated cell.

Project Metrics & Impact

Throughput: Processing a standard 100-page catalog (containing roughly 1,200 product variations) took less than 180 seconds.
Accuracy: Out of 50,000+ processed stone tiles, our layout segmentation maintained an extraction accuracy of 96.4%.
Human Verification: Reduced manual data entry time by 94%, shifting the operator's role from manual transcription to reviewing results on a visual validation screen in the admin UI.

Step 1: Document Rasterization and Pre-processing

We use PyMuPDF to rasterize each incoming PDF page into a high-resolution PNG image (300 DPI) so that fine print stays legible. Pages are converted one at a time, with the zoom factor raised so that small characters are large enough for reliable OCR.
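
As a rough sketch of this step, assuming PyMuPDF is installed: PDF coordinates are defined at 72 units per inch, so a 300 DPI render corresponds to a zoom factor of 300/72. The function names, the output file naming, and the directory layout below are illustrative assumptions, not the pipeline's actual code.

```python
from __future__ import annotations

from pathlib import Path

PDF_UNITS_PER_INCH = 72  # PDF user space is defined at 72 units per inch


def zoom_for_dpi(dpi: int) -> float:
    """Zoom factor that renders a PDF page at the requested DPI."""
    return dpi / PDF_UNITS_PER_INCH


def rasterize_pdf(pdf_path: str, out_dir: str, dpi: int = 300) -> list[Path]:
    """Render every page of a PDF to a PNG file and return the paths."""
    import fitz  # PyMuPDF; imported lazily so zoom_for_dpi works without it

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # same scale on both axes
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    written = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pixmap = page.get_pixmap(matrix=matrix)
            target = target_dir / f"page_{index + 1:04d}.png"  # hypothetical naming
            pixmap.save(str(target))
            written.append(target)
    return written
```

At 300 DPI an A4 page (595 x 842 PDF units) comes out at roughly 2479 x 3508 pixels, which is why pages are processed one at a time rather than held in memory together.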

Step 2: Contour Detection & Grid Cell Isolation

Catalog pages usually group slab images and SKU data inside visual grid cells or boxes. We use computer vision (OpenCV) to detect these bounding boxes:

Binarization: Convert the page image to grayscale and apply adaptive thresholding to isolate boundaries.
Morphological Operations: Apply vertical and horizontal kernels to detect solid horizontal and vertical grid lines, creating a clean binary mask of the catalog layout.
Contour Extraction: Find contours on the grid mask and filter out shapes that are too small (noise) or too large (page borders).
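
The three steps above could be sketched with OpenCV roughly as follows. The threshold block size, the kernel lengths (one thirtieth of the page dimension), and the area fractions used for filtering are guesses chosen for illustration; the post does not state the real values. The input is assumed to be an 8-bit grayscale page image.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def filter_cells(boxes: List[Box], page_w: int, page_h: int,
                 min_frac: float = 0.005, max_frac: float = 0.5) -> List[Box]:
    """Drop boxes that are too small (noise) or too large (page borders)."""
    page_area = page_w * page_h
    kept = [b for b in boxes if min_frac <= (b[2] * b[3]) / page_area <= max_frac]
    # Order cells by top edge, then left edge.
    return sorted(kept, key=lambda b: (b[1], b[0]))


def detect_grid_cells(gray_page) -> List[Box]:
    """Find grid cells on an 8-bit grayscale page image (NumPy array)."""
    import cv2  # imported lazily so filter_cells works without OpenCV

    page_h, page_w = gray_page.shape[:2]

    # Binarization: dark lines become white (255) on a black background.
    binary = cv2.adaptiveThreshold(gray_page, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 15, 10)

    # Morphological opening with long, thin kernels keeps only long lines.
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(page_w // 30, 1), 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(page_h // 30, 1)))
    h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    grid_mask = cv2.bitwise_or(h_lines, v_lines)

    # [-2] selects the contour list in both OpenCV 3.x and 4.x.
    contours = cv2.findContours(grid_mask, cv2.RETR_LIST,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [tuple(cv2.boundingRect(c)) for c in contours]
    return filter_cells(boxes, page_w, page_h)
```

Keeping the size filter as a separate pure function makes the thresholds easy to tune per catalog without touching the image-processing code.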

Step 3: Isolated OCR and Data Normalization

Once we have the coordinates (x, y, w, h) of each tile cell, we crop the image of the stone slab from the top half of the cell, crop the text area from the bottom half, and run OCR exclusively on the cropped text area.

By running OCR on a tiny, isolated box rather than the whole page, we guarantee that the extracted SKU, finish (polished/honed), and size parameters belong only to the stone slab image cropped from the same box.
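
A minimal sketch of the cell split, assuming the fixed half-and-half layout described above. The image_ratio parameter and both function names are illustrative additions. The OCR engine is not named in the post, so the sketch stops at producing the two crops.

```python
def split_cell(x, y, w, h, image_ratio=0.5):
    """Split a tile cell into an image box (top) and a text box (bottom).

    Boxes are (x, y, w, h). image_ratio is the fraction of the cell
    height given to the slab image; 0.5 matches the top-half and
    bottom-half split described in the text.
    """
    split = int(h * image_ratio)
    image_box = (x, y, w, split)
    text_box = (x, y + split, w, h - split)
    return image_box, text_box


def crop(page_rows, box):
    """Crop a box out of a page stored as a list of pixel rows.

    With a NumPy image the equivalent is page[y:y + h, x:x + w].
    """
    x, y, w, h = box
    return [row[x:x + w] for row in page_rows[y:y + h]]


# Usage: only the text crop would be handed to the OCR engine, so the
# recognized text can only come from the same cell as the slab image.
image_box, text_box = split_cell(100, 200, 300, 400)
```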

Key Engineering Challenges Solved

1. The Borderless Grid Problem

Some catalogs do not have visible grid lines; they display product images floating on a white page with text underneath. When morphological grid detection returns zero cells, the pipeline switches to a clustering-based layout analyzer. We use projection profiles (scanning rows and columns for white-space gaps) to programmatically compute virtual grid lanes, establishing bounding coordinate zones dynamically.
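
The projection-profile idea can be illustrated on a tiny binary page, where 1 marks ink and 0 marks white space. This is a simplified sketch under that assumption: the min_gap threshold and the function names are hypothetical, and the real analyzer presumably handles noise and uneven spacing more carefully.

```python
def content_bands(profile, min_gap=2):
    """Return (start, end) ranges of non-empty entries in a profile.

    A run of at least `min_gap` zeros ends a band; shorter gaps are
    absorbed into the surrounding band. `end` is exclusive.
    """
    bands = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                bands.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        bands.append((start, len(profile) - gap))
    return bands


def virtual_grid(ink, min_gap=2):
    """Compute virtual cells (x, y, w, h) from a 2D list of 0/1 ink flags."""
    cells = []
    row_profile = [sum(row) for row in ink]  # ink per row
    for top, bottom in content_bands(row_profile, min_gap):
        band = ink[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # ink per column
        for left, right in content_bands(col_profile, min_gap):
            cells.append((left, top, right - left, bottom - top))
    return cells
```

Setting min_gap larger than the white space between a product image and its caption keeps the two in one cell, while the wider gutters between products still separate them.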

2. Text-to-Data Normalization

OCR outputs raw string data like "Volacas Wt (Pol) 60x120cm - SKU9087". We run the OCR output through a regex parser and a light local dictionary matching layer. The parser strips punctuation, standardizes measurements (inputs such as 600x1200mm or 60x120 are converted to standard metric floats), and categorizes stone colors and finishes into database-ready enumerations (Material: Marble, Color: White, Finish: Polished).
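
A small sketch of such a parser, using the sample string above. The abbreviation dictionaries, the output field names, and the choice to treat unitless sizes as centimetres are all assumptions for illustration; the production dictionary and schema are not shown in the post.

```python
import re

# Hypothetical abbreviation dictionaries; the real ones are not published.
FINISHES = {"pol": "Polished", "hon": "Honed"}
COLORS = {"wt": "White", "bk": "Black"}
TO_MM = {"mm": 1.0, "cm": 10.0}

SIZE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*(mm|cm)?", re.IGNORECASE)
SKU_RE = re.compile(r"\bSKU\s*-?\s*(\d+)\b", re.IGNORECASE)


def normalize(raw, default_unit="cm"):
    """Turn one raw OCR line into a flat record with metric floats.

    default_unit is applied when the size has no unit (an assumption).
    """
    record = {"sku": None, "width_mm": None, "height_mm": None,
              "finish": None, "color": None}

    sku = SKU_RE.search(raw)
    if sku:
        record["sku"] = "SKU" + sku.group(1)

    size = SIZE_RE.search(raw)
    if size:
        factor = TO_MM[(size.group(3) or default_unit).lower()]
        record["width_mm"] = float(size.group(1)) * factor
        record["height_mm"] = float(size.group(2)) * factor

    # Dictionary matching on alphabetic tokens; punctuation is ignored.
    for token in re.findall(r"[a-z]+", raw.lower()):
        if token in FINISHES:
            record["finish"] = FINISHES[token]
        if token in COLORS:
            record["color"] = COLORS[token]
    return record


parsed = normalize("Volacas Wt (Pol) 60x120cm - SKU9087")
```

Fields that cannot be matched stay None, which gives the human reviewer an obvious signal that a tile needs attention.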

Conclusion

Parsing highly visual document layouts requires moving beyond raw character recognition. By combining traditional computer vision techniques (adaptive thresholding, morphological operations, contour detection) with targeted, localized OCR, Tile Extractor turned chaotic catalogs into clean, standardized, database-ready records. Building systems that bridge the gap between unstructured visual media and structured databases is at the core of what we do at Ananta Labs.

How We Automated Catalog Image Extraction using Computer Vision & FastAPI

somyabhalani — Tue, 19 May 2026 19:24:39 +0000

For businesses in the stone, marble, and interior design industries, managing digital catalog assets is a massive headache.

When a new product catalog arrives as a 100-page PDF, design teams spend hours manually cropping out individual tile samples to upload to their websites or inventory sheets.

To automate this, we built Tile Extractor—a high-performance, automated parsing engine designed specifically to isolate tile samples from raw catalog documents.

How it Works (Under the Hood)

PDF Ingestion: The system uses a FastAPI backend to ingest multi-page PDFs. We process the pages using PyMuPDF to extract raw page vectors and high-res layout structures.
Object Detection & Border Cleaning: Instead of relying on slow, expensive cloud Vision APIs, we use local Pillow and OpenCV-based spatial algorithms. The engine analyzes:
- Edge density to isolate individual tile boundaries.
- Aspect ratios to filter out page noise (like page numbers or logos).
- Color distributions using RGB histograms.
Lossless Cropping: Once a tile is identified and classified, the engine performs a lossless crop directly from the PDF's high-resolution asset stream, ensuring no pixel resolution is lost.
Batch ZIP Packaging: The isolated tile PNGs are packaged into a single ZIP file and returned to the user instantly.
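
Two of the steps above, the aspect-ratio filter and the ZIP packaging, can be sketched with the standard library alone. The ratio bounds, the minimum side length, the archive entry names, and both function names are illustrative assumptions rather than the engine's real settings.

```python
import io
import zipfile


def plausible_tile(w, h, min_ratio=0.4, max_ratio=2.5, min_side=50):
    """Reject boxes that look like page numbers, logos, or thin rules."""
    if min(w, h) < min_side:
        return False
    return min_ratio <= w / h <= max_ratio


def package_tiles(png_tiles):
    """Bundle PNG-encoded tiles (bytes) into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    # PNG data is already compressed, so entries are stored as-is.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for index, png in enumerate(png_tiles, start=1):
            archive.writestr(f"tile_{index:03d}.png", png)
    return buffer.getvalue()
```

In a FastAPI handler, the returned bytes could be sent back in a Response with media_type="application/zip", so no temporary files are needed on the server.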

Why it Matters for B2B Automation

What used to take a human designer 2 hours now takes our engine 5 seconds. By running localized computer vision algorithms instead of cloud APIs, we eliminate usage fees and keep client data fully private.

If your business manages product catalogs, you can try the tool for free here:

👉 Try Tile Extractor: https://tile-extractor.onrender.com
👉 Explore our work: https://anantalabs.app/

How We Built a Contactless Digital Signature App inside the Browser (No Servers, 100% Private)

somyabhalani — Tue, 19 May 2026 19:21:46 +0000

Traditional digital signature platforms have two major issues: privacy and cost.

To sign a document, you have to upload sensitive agreements to a third-party server. And as a developer, running server-side document rendering and signatures can lead to heavy API bills and database management overhead.

At Ananta Labs, we wanted to see if we could build a completely secure, contactless alternative that runs entirely on the client side using browser-native AI.

Here is how we built AirSign.

The Architecture: 100% Client-Side

Instead of hosting heavy machine learning models on a GPU server, we compiled our hand-tracking models to run locally inside the user's browser.

Gesture Capture: We used MediaPipe's hand-landmarker models compiled to WebAssembly. This lets the browser track 21 3D hand landmarks in real time at 30 FPS using a standard webcam.
Contactless Canvas: Using WebGL, we map the index finger coordinate onto an HTML5 canvas. We implemented a custom interpolation algorithm to smooth out hand jitter and render a fluid, realistic signature line.
Local PDF Stamp: Once the user finishes drawing their signature in the air, we generate the final document. The signature coordinate vector is parsed and stamped onto the PDF using a client-side library.
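
The custom interpolation algorithm is not described in the post. As a stand-in, the sketch below uses a plain exponential moving average, a common way to damp jitter in tracked points. It is written in Python to show the arithmetic only; the app itself runs in the browser. The function names and the alpha value are assumptions. MediaPipe reports landmark x and y as values normalized to the range 0 to 1, so they are first scaled to canvas pixels.

```python
def to_canvas(norm_x, norm_y, canvas_w, canvas_h):
    """Scale a normalized landmark (0..1) to canvas pixel coordinates."""
    return norm_x * canvas_w, norm_y * canvas_h


def smooth_path(points, alpha=0.35):
    """Exponential moving average over a sequence of (x, y) points.

    alpha close to 0 gives heavy smoothing (more lag); alpha = 1
    returns the raw points unchanged.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x, y in points:
        if not smoothed:
            smoothed.append((x, y))  # first point has nothing to blend with
            continue
        prev_x, prev_y = smoothed[-1]
        smoothed.append((prev_x + alpha * (x - prev_x),
                         prev_y + alpha * (y - prev_y)))
    return smoothed
```

A lower alpha gives a steadier line but makes the stroke trail behind the finger, so the value is a trade-off between smoothness and responsiveness.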

Why this is the Future of AI Integration

By moving the computation from the server to the client:

Absolute Privacy: No video frames, coordinate points, or document bytes are transmitted to any server or database.
Zero Server Overhead: The server-side compute cost for this app is $0, since all processing runs on the user's CPU/GPU.
No Network Latency: Signature interpolation involves no network round trip, so the line is drawn without waiting on a server.

Try it Yourself

AirSign is completely open and free to test. We’d love to hear your feedback on the hand-tracking latency and mobile performance:

👉 Try AirSign: https://airsign-red.vercel.app/
👉 Explore our work: https://anantalabs.app/