<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Rodriguez</title>
    <description>The latest articles on DEV Community by David Rodriguez (@davidrodriguez).</description>
    <link>https://dev.to/davidrodriguez</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791690%2F6cc17fc7-cd4e-4da2-bff9-b33062be20f3.jpg</url>
      <title>DEV Community: David Rodriguez</title>
      <link>https://dev.to/davidrodriguez</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/davidrodriguez"/>
    <language>en</language>
    <item>
      <title>How to Analyze Sensitive Data Without Uploading It Anywhere</title>
      <dc:creator>David Rodriguez</dc:creator>
      <pubDate>Wed, 25 Feb 2026 12:12:29 +0000</pubDate>
      <link>https://dev.to/davidrodriguez/how-to-analyze-sensitive-data-without-uploading-it-anywhere-18l2</link>
      <guid>https://dev.to/davidrodriguez/how-to-analyze-sensitive-data-without-uploading-it-anywhere-18l2</guid>
<description>&lt;h2&gt;The problem nobody talks about&lt;/h2&gt;

&lt;p&gt;You export a CSV from your CRM. It has customer emails, revenue numbers, maybe even payment info. You need to answer a quick question: &lt;em&gt;"Which accounts churned last quarter?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So you do what everyone does. You upload it to Google Sheets. Or spin up a Jupyter notebook. Or paste it into some AI chatbot.&lt;/p&gt;

&lt;p&gt;And just like that, your sensitive data is sitting on someone else's server.&lt;/p&gt;

&lt;p&gt;For most analysts, this is the default workflow — not because they don't care about privacy, but because the alternatives are painful. Run a local Postgres instance? Write a Python script for a one-off question? That's a 30-minute detour for a 30-second answer.&lt;/p&gt;

&lt;p&gt;There's a better way.&lt;/p&gt;

&lt;h2&gt;DuckDB in the browser changes everything&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://duckdb.org" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt; is an embeddable SQL database built for analytics. It's fast, handles CSVs natively, and — crucially — it compiles to WebAssembly, which means it runs &lt;strong&gt;entirely inside your browser tab&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No server. No upload. No Docker container. Just drag a file in and run SQL against it.&lt;/p&gt;

&lt;p&gt;This matters because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your data never touches a network.&lt;/strong&gt; The file goes from your disk into browser memory. That's it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It handles real-world files.&lt;/strong&gt; Multi-hundred-MB CSVs, messy Excel sheets, Parquet from data warehouses — all supported.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL is SQL.&lt;/strong&gt; If you know &lt;code&gt;SELECT&lt;/code&gt;, you're already productive. No new API to learn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The limitation has always been the tooling around it. Running raw DuckDB-WASM means writing JavaScript boilerplate, manually parsing schemas, and staring at JSON output in the console.&lt;/p&gt;

&lt;h2&gt;What a local-first data analysis workflow looks like&lt;/h2&gt;

&lt;p&gt;Here's the workflow I actually use when someone hands me a sensitive dataset:&lt;/p&gt;

&lt;h3&gt;Step 1: Load the file&lt;/h3&gt;

&lt;p&gt;Drag a CSV (or Excel, JSON, Parquet) into the browser. DuckDB-WASM parses it, infers types, and creates a table — usually in under a second for files up to a few hundred MB.&lt;/p&gt;

&lt;p&gt;No upload step. No progress bar hitting a remote API. The file loads from disk into your browser's memory via the File API.&lt;/p&gt;
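&lt;p&gt;DuckDB's CSV sniffer handles type inference for you, but the idea is easy to sketch in plain Python. This is a toy illustration of the concept, not DuckDB's actual algorithm; the sample columns are borrowed from the churn example below:&lt;/p&gt;

```python
import csv
import io

def infer_type(values):
    """Toy type inference: try INTEGER, then DOUBLE, else fall back to VARCHAR."""
    non_null = [v for v in values if v != ""]
    for cast, name in ((int, "INTEGER"), (float, "DOUBLE")):
        try:
            for v in non_null:
                cast(v)
            return name
        except ValueError:
            continue
    return "VARCHAR"

def infer_schema(csv_text, sample_rows=100):
    """Infer a column -> type mapping from the first few rows of a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_rows), reader)]
    return {col: infer_type([r[col] for r in rows]) for col in reader.fieldnames}

sample = "account_name,mrr,churn_date\nAcme,1200,2025-10-15\nGlobex,980.5,2025-11-02\n"
print(infer_schema(sample))
# {'account_name': 'VARCHAR', 'mrr': 'DOUBLE', 'churn_date': 'VARCHAR'}
```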

&lt;h3&gt;Step 2: Get oriented&lt;/h3&gt;

&lt;p&gt;Before writing a single query, you want to know what you're working with. A good local tool should auto-profile the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Row count and column types&lt;/li&gt;
&lt;li&gt;Null rates and cardinality per column&lt;/li&gt;
&lt;li&gt;Distribution summaries (min/max/median for numbers, top values for strings)&lt;/li&gt;
&lt;li&gt;Data quality flags (mixed types, suspicious outliers, encoding issues)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the step most people skip when they jump straight into SQL or pandas. It's also where you catch problems early — like a "revenue" column that's actually stored as text, or a date field with three different formats.&lt;/p&gt;
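&lt;p&gt;Under the hood, a profile like this is just a handful of aggregates per column. A hedged sketch of the idea in plain Python (the &lt;code&gt;mrr&lt;/code&gt; sample values are invented for illustration):&lt;/p&gt;

```python
import statistics

def profile_column(values):
    """Summarize one column: null rate, cardinality, and a distribution summary."""
    nulls = sum(1 for v in values if v is None or v == "")
    present = [v for v in values if v is not None and v != ""]
    summary = {
        "null_rate": nulls / len(values),
        "cardinality": len(set(present)),
    }
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            break
    if numeric and len(numeric) == len(present):
        # Numeric column: min/max/median
        summary["min"], summary["max"] = min(numeric), max(numeric)
        summary["median"] = statistics.median(numeric)
    else:
        # String-ish column: most frequent value
        summary["top"] = max(set(present), key=present.count)
    return summary

mrr = ["1200", "980.5", "", "1200"]
print(profile_column(mrr))
# {'null_rate': 0.25, 'cardinality': 2, 'min': 980.5, 'max': 1200.0, 'median': 1200.0}
```

A column of numbers stored as text still profiles as numeric here, which is exactly the kind of mismatch this step surfaces before it bites you in a query.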

&lt;h3&gt;Step 3: Ask questions&lt;/h3&gt;

&lt;p&gt;Now you're ready to explore. You can write SQL directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;account_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mrr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;churn_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;churn_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2025-10-01'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;mrr&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, if the tool supports it, ask in plain English and let AI translate to SQL. The key privacy distinction: &lt;strong&gt;the AI only needs your schema (column names and types) to generate queries, not your actual data rows.&lt;/strong&gt; The query runs locally in DuckDB. The AI never sees a single row of your dataset.&lt;/p&gt;

&lt;p&gt;This is fundamentally different from tools like ChatGPT Advanced Data Analysis, where you upload the file to OpenAI's servers and the code runs on their infrastructure.&lt;/p&gt;
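&lt;p&gt;What a schema-only prompt might look like on the wire. This is a sketch, and real tools will differ in shape; the point is that only metadata and your question get serialized, never rows:&lt;/p&gt;

```python
import json

def build_ai_payload(question, schema):
    """Serialize only metadata for the AI: column names/types plus the question."""
    return json.dumps({
        "question": question,
        "tables": [{
            "name": "customers",
            "columns": [{"name": c, "type": t} for c, t in schema.items()],
        }],
    })

schema = {"account_name": "VARCHAR", "mrr": "DOUBLE", "churn_date": "DATE"}
payload = build_ai_payload("Which accounts churned last quarter?", schema)

# No data value ever appears in the payload -- only names, types, and the question.
assert "Acme" not in payload
print(payload)
```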

&lt;h3&gt;Step 4: Drill down&lt;/h3&gt;

&lt;p&gt;One query rarely answers the real question. "Which accounts churned?" leads to "Was it concentrated in one segment?" which leads to "Did we see this pattern before?"&lt;/p&gt;

&lt;p&gt;Good analysis is iterative. A local-first tool should support this by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Letting you click on results to drill deeper&lt;/li&gt;
&lt;li&gt;Suggesting follow-up questions automatically&lt;/li&gt;
&lt;li&gt;Tracking your exploration path so you can retrace steps&lt;/li&gt;
&lt;/ul&gt;
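&lt;p&gt;The last point is the simplest to sketch. A minimal, hypothetical log that records each question/query pair so you can retrace an analysis:&lt;/p&gt;

```python
class ExplorationLog:
    """Minimal trail of (question, sql) steps for retracing an analysis."""

    def __init__(self):
        self.steps = []

    def record(self, question, sql):
        self.steps.append({"question": question, "sql": sql})

    def retrace(self):
        """Yield the path taken so far, oldest first."""
        for i, step in enumerate(self.steps, start=1):
            yield f"{i}. {step['question']} -> {step['sql']}"

log = ExplorationLog()
log.record("Which accounts churned?",
           "SELECT * FROM customers WHERE churn_date IS NOT NULL")
log.record("Concentrated in one segment?",
           "SELECT segment, COUNT(*) FROM churned GROUP BY segment")
print("\n".join(log.retrace()))
```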

&lt;h3&gt;Step 5: Share the insight (not the data)&lt;/h3&gt;

&lt;p&gt;Once you've found the answer, you share a chart, a summary, or a report — not the underlying dataset. The raw data stays on your machine. The insight travels.&lt;/p&gt;

&lt;h2&gt;When you need AI without sacrificing privacy&lt;/h2&gt;

&lt;p&gt;The "no AI" version of this workflow works, but it's slow. Writing SQL for every exploratory question is fine if you're a data engineer. If you're an ops lead or a product manager, it's a blocker.&lt;/p&gt;

&lt;p&gt;The trick is separating &lt;strong&gt;what the AI sees&lt;/strong&gt; from &lt;strong&gt;what it processes&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Where it runs&lt;/th&gt;
&lt;th&gt;What it sees&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data storage &amp;amp; queries&lt;/td&gt;
&lt;td&gt;Your browser (DuckDB-WASM)&lt;/td&gt;
&lt;td&gt;All your data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI query generation&lt;/td&gt;
&lt;td&gt;Cloud API&lt;/td&gt;
&lt;td&gt;Schema + your question only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Results &amp;amp; charts&lt;/td&gt;
&lt;td&gt;Your browser&lt;/td&gt;
&lt;td&gt;Query output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This architecture means you can use GPT, Claude, Gemini, or any other model to help you analyze data — without the model ever accessing your actual data. The AI writes the SQL. DuckDB runs it locally. You see the results.&lt;/p&gt;
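&lt;p&gt;A compressed sketch of that three-layer split. Two loud substitutions: Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; stands in for DuckDB-WASM as the local engine, and the "AI" is a stub returning canned SQL; both exist only to make the boundary between layers concrete:&lt;/p&gt;

```python
import sqlite3

def fake_ai_to_sql(question, schema):
    """Stand-in for the cloud call: sees only the schema and question, returns SQL."""
    return ("SELECT account_name, mrr FROM customers "
            "WHERE churn_date IS NOT NULL ORDER BY mrr DESC")

# Layer 1: data lives only in the local engine (sqlite3 here, DuckDB-WASM in the browser).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (account_name TEXT, mrr REAL, churn_date TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [("Acme", 1200, "2025-10-15"), ("Globex", 980.5, None)])

# Layer 2: only schema + question cross the network boundary.
schema = {"account_name": "TEXT", "mrr": "REAL", "churn_date": "TEXT"}
sql = fake_ai_to_sql("Which accounts churned last quarter?", schema)

# Layer 3: the query runs locally; only its output is rendered.
rows = db.execute(sql).fetchall()
print(rows)  # [('Acme', 1200.0)]
```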

&lt;h2&gt;Tools that support this workflow&lt;/h2&gt;

&lt;p&gt;A few tools take this local-first approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DuckDB CLI/Python&lt;/strong&gt; — Maximum control, but requires a local dev environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evidence&lt;/strong&gt; — Great for building BI dashboards from local DuckDB, but focused on SQL-literate users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QueryVeil&lt;/strong&gt; — Full browser-based analyst (disclosure: we built this). Drag in files, ask questions in English or SQL, get auto-profiling, and the AI agent runs multi-step investigations. DuckDB-WASM under the hood, free tier with local AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rill&lt;/strong&gt; — Excellent for fast dashboarding on Parquet/DuckDB, more BI-oriented.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right choice depends on whether you want a CLI, a BI tool, or an AI-assisted analyst.&lt;/p&gt;

&lt;h2&gt;Checklist: Is your data tool actually private?&lt;/h2&gt;

&lt;p&gt;Before trusting any tool with sensitive data, check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Does the file leave your machine?&lt;/strong&gt; If there's an upload step, your data is on their server.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Where does the query engine run?&lt;/strong&gt; "Cloud-hosted DuckDB" still means your data is on their infrastructure.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;What does the AI see?&lt;/strong&gt; Schema-only is good. Full data context is a red flag.&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Can you use it offline?&lt;/strong&gt; True local-first tools work without an internet connection (at least for the non-AI parts).&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Is it WebAssembly or a server process?&lt;/strong&gt; WASM runs in your browser sandbox. A server process — even on localhost — is a different trust model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The bottom line&lt;/h2&gt;

&lt;p&gt;You shouldn't have to choose between powerful analysis and data privacy. DuckDB-WASM made it technically possible to run real SQL analytics in the browser. The tooling is finally catching up to make it practical.&lt;/p&gt;

&lt;p&gt;Next time someone hands you a CSV with sensitive data, don't upload it. Analyze it locally.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>data</category>
      <category>analytics</category>
      <category>privacy</category>
    </item>
  </channel>
</rss>
