DEV Community: Mikhail Borodin

I almost leaked a customer's data while screen-sharing ChatGPT — here's what I built to stop it

Mikhail Borodin — Thu, 04 Jun 2026 18:38:17 +0000

A few weeks ago I was on a call sharing my screen, walking a teammate through a prompt I'd been iterating on in ChatGPT. Mid-sentence I scrolled up — and there, three messages back, was a chunk of a customer's data I'd pasted in earlier to debug something. Real email, real account info, sitting right there on a shared screen.

Nobody said anything. Maybe nobody noticed. But I noticed, and I spent the rest of the call only half-present, trying to remember everything else still in that thread.

If you live in ChatGPT all day, you already know the problem. The thread is your scratchpad. You paste logs, keys, customer rows, half-finished internal docs — things you'd never put in a doc you planned to share. And then someone says "can you share your screen real quick" and suddenly your scratchpad is a presentation.

Why the usual advice doesn't work

The standard answers are all some version of "be careful":

Open a clean tab before sharing.
Scroll to the top.
Use a separate "demo" account.

These fail for the same reason all manual checklists fail under pressure: the moment you actually need them is the moment you're distracted, talking, and not thinking about hygiene. You remember after. The fix has to happen before the screen goes live, and it has to require zero discipline in the moment.

What I wanted instead

I wanted something that just sat there and blurred sensitive parts of a page automatically, so that even if I forgot, the leak couldn't happen. A few requirements:

Local only. Whatever it does, it never sends page content anywhere. A privacy tool that phones home is a contradiction.
Before, not after. It blurs while the page renders, not after I've already exposed it.
Per-element, not whole-screen. A full black box is useless for a demo. I still need to show the working parts.

The interesting technical bit

The naive approach is to listen for some "I'm sharing now" signal and react. That's too late — there's a visible frame where the data is exposed before the blur kicks in. You're racing the screen capture.

The approach that actually works is to apply the blur as a CSS layer that's already present on matched elements, and only reveal on explicit interaction (hover-to-peek, or a toggle). Roughly:

.privacy-blur {
  filter: blur(8px);
  transition: filter 0.12s ease;
  user-select: none;
}

.privacy-blur:hover {
  filter: blur(0);
}

The hard part isn't the blur — it's deciding what to blur on an arbitrary page you don't control. You can't hardcode selectors for every site. So you end up with a mix of:

Heuristic matchers (things that look like emails, keys, long tokens, currency near billing-ish labels).
A MutationObserver to catch content injected after load — critical for chat apps like ChatGPT, where messages stream in dynamically and a one-time pass on load misses everything.
User-defined rules per domain for the stuff heuristics can't catch.
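To make the heuristic-matcher idea concrete, here is a minimal sketch of the matching logic. It is in Python only to keep it short and testable; in an extension the same logic would live in a content script. The patterns, the `PATTERNS` table and the `find_sensitive_spans` name are illustrative assumptions, not the extension's actual rules, and the currency pattern ignores the "near billing-ish labels" part entirely.

```python
import re

# Illustrative patterns only (assumptions): real rules need tuning per site.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Long unbroken runs of token-like characters, e.g. API keys.
    "token": re.compile(r"\b[A-Za-z0-9_-]{32,}\b"),
    # A currency amount such as $1,234.56.
    "currency": re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d{2})?"),
}


def find_sensitive_spans(text: str) -> list[tuple[str, int, int]]:
    """Return (kind, start, end) for every match, sorted by position."""
    spans = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((kind, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])


sample = (
    "Contact jane@example.com, key "
    "sk_live_abcdefghijklmnopqrstuvwxyz012345, total $1,234.56"
)
for kind, start, end in find_sensitive_spans(sample):
    print(kind, sample[start:end])
```

Each returned span marks a range that would get wrapped in an element carrying the blur class. Heuristics like these will always produce false positives and misses, which is why the per-domain user rules still matter.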

The MutationObserver part was the one I underestimated. In a normal page you blur once on load and you're done. In a streaming chat UI, content arrives continuously, so the observer has to re-run matching on every batch of new nodes — while staying cheap enough not to lag the page. Debouncing the observer callback and only scanning added nodes (not re-scanning the whole DOM) was what made it usable.
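The debounce-and-batch idea can be sketched without a browser. The snippet below is a simplified Python model, not the extension's code: `DebouncedScanner`, `on_mutation` and `tick` are hypothetical names, nodes are plain strings, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Optional


class DebouncedScanner:
    """Queues newly added nodes and scans them in one batch after a quiet period."""

    def __init__(self, scan: Callable[[list], None], delay: float = 0.1):
        self.scan = scan          # called once per batch of added nodes
        self.delay = delay        # quiet period in seconds (assumed value)
        self.pending: list = []
        self.deadline: Optional[float] = None

    def on_mutation(self, added_nodes: list, now: float) -> None:
        # Queue only what was added; never re-walk the whole document.
        self.pending.extend(added_nodes)
        # Every new batch pushes the deadline back: that is the debounce.
        self.deadline = now + self.delay

    def tick(self, now: float) -> None:
        # Flush once nothing new has arrived for `delay` seconds.
        if self.deadline is not None and now >= self.deadline:
            batch, self.pending = self.pending, []
            self.deadline = None
            self.scan(batch)


scanned = []
scanner = DebouncedScanner(scan=scanned.append, delay=0.1)
scanner.on_mutation(["<p>hi</p>"], now=0.00)
scanner.on_mutation(["<p>jane@example.com</p>"], now=0.05)
scanner.tick(now=0.10)  # still inside the quiet period, nothing is scanned
scanner.tick(now=0.20)  # quiet for long enough, both nodes go out in one batch
print(scanned)
```

One caveat with a pure debounce: while a long reply is still streaming, the quiet period keeps resetting, so a real implementation would probably also cap the maximum wait before forcing a scan.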

I packaged it up

I ended up turning it into a small Chrome extension so I'd stop relying on my own memory. It's local-only, blurs matched content before it's visible, and lets you peek per-element. If the same problem bites you, it's Screen Privacy Blur on the Chrome Web Store.

But honestly, the extension is secondary to the point I actually want to land:

The takeaway

Treat your ChatGPT thread like a shared screen by default, not like a private notebook. The moment you paste anything real into it, assume it could end up in front of someone. Build the habit — or build the guardrail — before the "can you share your screen" moment, because that moment never comes with enough warning.

How do you handle this? Separate accounts, a scrub-before-share ritual, something else? Genuinely curious what's worked for people who demo in ChatGPT a lot.

Running DeepSeek-R1 locally for RAG

Mikhail Borodin — Fri, 21 Feb 2025 09:45:49 +0000

Introduction

For the last two weeks the internet has been abuzz about DeepSeek's new generative model. The biggest surprise was that, with answer quality comparable to ChatGPT's, it reportedly cost an order of magnitude less to train. This has already affected Nvidia's stock price, which lost roughly 17% of its value in a single day. The model can be used for free through the web or a mobile app, but today I would like to highlight another option: running it locally. Let's look at this through a small project. Imagine we have some documentation and need to search it or analyze it. For this purpose we can use RAG.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that improves the accuracy and reliability of language models by integrating the retrieval of external information into the response generation process. Unlike generative models that rely solely on what they learned during training, RAG searches for relevant information before generating a response, which reduces hallucinations and improves factual accuracy.

RAG's workflow consists of three key steps. First, it retrieves relevant documents or data from a knowledge base, which may include structured databases, vector stores, or even real-time APIs. The retrieved information is then combined with the model's internal knowledge to ensure that the answers are based on relevant and reliable sources. Finally, the model generates a reasoned answer using both its trained language capabilities and the newly acquired data.
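The three steps can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions: `retrieve` ranks passages by shared words instead of embeddings, and `fake_llm` is a hypothetical stand-in for a real model, so only the shape of the pipeline carries over to a real system.

```python
def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Step 1: rank passages by how many query words they share (toy retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, passages: list[str]) -> str:
    """Step 2: place the retrieved passages into the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str, llm) -> str:
    """Step 3: hand the augmented prompt to a language model."""
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real setup would call an actual model here.
    return f"(model received a prompt of {len(prompt)} characters)"


knowledge_base = [
    "ACID stands for atomicity, consistency, isolation, durability.",
    "An index speeds up lookups.",
    "Lunch is served at noon.",
]
query = "what does acid mean"
passages = retrieve(query, knowledge_base, k=1)
prompt = augment(query, passages)
print(prompt)
print(generate(prompt, fake_llm))
```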

This approach offers significant advantages. By basing answers on external sources, RAG minimizes inaccuracies and ensures that information is up-to-date, making it particularly useful in fast-changing fields such as finance, healthcare, and law. It is widely used in chatbots, AI systems with advanced search, enterprise knowledge discovery, and AI-driven research assistants where accuracy and factual validity are critical.

What will the architecture of the solution look like?

A knowledge base is a collection of relevant and up-to-date information that serves as the foundation for RAG. In our case, it is a set of documents stored in a directory.

Before you start implementing this architecture, the following libraries must be installed (tested on Python 3.11):

llama-index   
transformers   
torch   
sentence-transformers   
llama-index-llms-ollama

Here's how to load your documents into LlamaIndex as Document objects:

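from llama_index.core import SimpleDirectoryReader

# Folder with the source documents (example path; point it at your own data)
input_dir_path = "./docs"

loader = SimpleDirectoryReader(
    input_dir=input_dir_path,
    required_exts=[".pdf"],  # only load PDF files
    recursive=True,          # include subdirectories
)
docs = loader.load_data()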

The next step is to build a vector store index. Vector stores are a key component of retrieval-augmented generation (RAG), so you will use them in almost every LlamaIndex application, either directly or indirectly.

A vector store takes a list of Node objects and builds an index from them.

VectorStoreIndex in RAG

In Retrieval-Augmented Generation (RAG), VectorStoreIndex is used to store and retrieve vector embeddings of documents. This allows the system to find relevant information based on semantic similarity rather than exact keyword matches.

The process starts with document embedding. Textual data is converted into vector embeddings using an embedding model. These embeddings represent the meaning of the text in numerical form, making it easier to compare and find similar content. Once created, these embeddings are stored in a vector database such as FAISS, Pinecone, Weaviate, Qdrant or Chroma.

When a user submits a query, it is also converted into an embedding using the same model. The system then searches for similar embeddings by comparing the query with the stored vectors using similarity metrics such as cosine similarity or Euclidean distance. The most relevant documents are retrieved based on their similarity scores.
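As a rough illustration of that similarity search, here is cosine similarity and a top-k lookup in plain Python. The three-dimensional vectors and file names are made up for the example; real embeddings have hundreds of dimensions and come from the embedding model, and `top_k` is a hypothetical helper rather than a LlamaIndex function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy "embeddings", invented for illustration.
doc_vecs = {
    "acid.pdf": [0.9, 0.1, 0.0],
    "indexes.pdf": [0.6, 0.6, 0.1],
    "holiday-policy.pdf": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
results = top_k(query_vec, doc_vecs, k=2)
print(results)
```

The `similarity_top_k` setting on a LlamaIndex query engine controls the same thing as `k` here: how many of the best-scoring chunks are passed to the LLM as context.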

The retrieved documents are passed as context to a large language model (LLM), which uses them to generate a response. This process ensures that the LLM has access to relevant information, improving accuracy and reducing hallucinations.

The use of VectorStoreIndex improves RAG systems by providing semantic search, improving scalability and supporting real-time search. This ensures that answers are contextually accurate and based on the most relevant information available.

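Building a vector index is easy:

from llama_index.core import VectorStoreIndex

# Note: embeddings are computed with Settings.embed_model. LlamaIndex's default
# is an OpenAI model that needs an API key; for a fully local setup, set
# Settings.embed_model to a local embedding model before this call.
index = VectorStoreIndex.from_documents(docs)

# Save the index to disk
index.storage_context.persist(persist_dir="./index")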

The next step is to download and run the DeepSeek-R1 model with Ollama. Install Ollama from https://ollama.com/download, then pull the model listed at https://ollama.com/library/deepseek-r1, for example by running ollama pull deepseek-r1:1.5b, the tag used in the code below.

After that, the following code is enough to get RAG working:

from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core import Settings, PromptTemplate
from llama_index.llms.ollama import Ollama

# Rebuild the storage context from the persisted index
storage_context = StorageContext.from_defaults(persist_dir="./index")

# Load the index
index = load_index_from_storage(storage_context)

# Create a prompt template
qa_prompt_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above I want you\n"
    "to think step by step to answer the query in a crisp\n"
    "manner. In case you don't know the answer, say\n"
    "'I don't know!'\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)

# Use the local DeepSeek-R1 model served by Ollama
llm = Ollama(model="deepseek-r1:1.5b", request_timeout=360.0)
Settings.llm = llm  # the LLM used by the query engine

# Set up a query engine on the index created earlier
query_engine = index.as_query_engine(
    similarity_top_k=10
)
query_engine.update_prompts({"response_synthesizer:text_qa_template": qa_prompt_tmpl})
response = query_engine.query("What is ACID?")
print(response)

Final step. Interface

We can wrap this solution in a user interface built with Streamlit, so that users can interact with our RAG application via chat. Alternatively, you can build a Telegram bot, which makes the interface even easier to create.