<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Godwill Christopher</title>
    <description>The latest articles on DEV Community by Godwill Christopher (@keenchris).</description>
    <link>https://dev.to/keenchris</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3890557%2F2c42c080-0ed8-4931-bfc5-db99c28bd57e.jpeg</url>
      <title>DEV Community: Godwill Christopher</title>
      <link>https://dev.to/keenchris</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/keenchris"/>
    <language>en</language>
    <item>
      <title>What's new in Data Preprocessor 1.5.x — R codegen, Robust Scaler, and a deadlock post-mortem</title>
      <dc:creator>Godwill Christopher</dc:creator>
      <pubDate>Mon, 25 May 2026 14:42:14 +0000</pubDate>
      <link>https://dev.to/keenchris/whats-new-in-data-preprocessor-15x-r-codegen-robust-scaler-and-a-deadlock-post-mortem-kb9</link>
      <guid>https://dev.to/keenchris/whats-new-in-data-preprocessor-15x-r-codegen-robust-scaler-and-a-deadlock-post-mortem-kb9</guid>
      <description>&lt;p&gt;It's been a few months since I last wrote about Data Preprocessor, the IntelliJ plugin I built to stop rewriting the same pandas preprocessing scripts for every project. The 1.5.x series has landed a real R codegen path, a more honest outlier-resistant normalizer, and one genuinely embarrassing deadlock that I want to talk about openly because the lesson is useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  tl;dr on what the plugin does
&lt;/h2&gt;

&lt;p&gt;You load a CSV, Excel, or JSON file inside your JetBrains IDE. The plugin profiles every column (type, null count, mean/median/std, mode, unique count). You build a pipeline visually (drop nulls, fill with mean, deduplicate, remove IQR outliers, normalize with min-max, z-score, or robust scaling, label-encode, one-hot encode, train/test split, sort, filter, type-cast), and then one click emits a complete, ready-to-run Python (pandas) or R (base + a few small libs) script.&lt;/p&gt;

&lt;p&gt;All processing is local. The plugin collects no telemetry. The generated code is normal pandas or normal R: no runtime library, no plugin import, nothing magic. Read it, edit it, commit it alongside your dataset, run it long after you've uninstalled the plugin.&lt;/p&gt;

&lt;p&gt;Here's roughly what a 5-step pipeline turns into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Generated by Data Preprocessor 1.5.6
# Source: sample-data/employees.csv

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("sample-data/employees.csv")

# Step 1: drop rows where 'department' is null
df = df.dropna(subset=["department"])

# Step 2: fill 'performance_score' null with median
df["performance_score"] = df["performance_score"].fillna(
    df["performance_score"].median()
)

# Step 3: remove duplicates
df = df.drop_duplicates()

# Step 4: Robust Scaler on 'salary' (median/IQR, IQR=0 guard)
_med = df["salary"].median()
_q1  = df["salary"].quantile(0.25)
_q3  = df["salary"].quantile(0.75)
_iqr = _q3 - _q1
if _iqr != 0:
    df["salary"] = (df["salary"] - _med) / _iqr

# Step 5: train/test split (ratio 0.8)
train, test = train_test_split(df, train_size=0.8, random_state=42)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;
The R output is structurally the same, with readxl / jsonlite / fastDummies imported only when the pipeline actually uses them.&lt;/p&gt;

&lt;h2&gt;
  
  
  1.5.0: R code generation, for real
&lt;/h2&gt;

&lt;p&gt;The biggest change since I last posted is that the codegen is no longer Python-only. The full 16-operation pipeline now has an R equivalent. Label-encode is 0-based to match &lt;code&gt;pandas.factorize&lt;/code&gt; (the integer codes behind R's native &lt;code&gt;factor()&lt;/code&gt; are 1-based by default; that was a fun footgun to find and fix in 1.5.5).&lt;/p&gt;

&lt;p&gt;This was a deliberate choice rather than a feature request: when you preprocess for an analytics team, half of them are in Python and half are in R, and forcing the cleanup to be language-specific defeats the point of having a reproducible artifact. The visual pipeline is the spec; Python and R are just two render targets.&lt;/p&gt;

&lt;h2&gt;
  
  
  1.3.0 → 1.5.5: Robust Scaler with honest edge cases
&lt;/h2&gt;

&lt;p&gt;Min-max and z-score break in interesting ways when your column has outliers. A single row at 10⁹ collapses the rest of the column into a narrow band near zero. So 1.3.0 added the Robust Scaler, &lt;code&gt;(x - median) / IQR&lt;/code&gt;, which gives you a normalization that doesn't get yanked around by the long tail.&lt;/p&gt;

&lt;p&gt;The catch: when IQR = 0 (column is constant, or near-constant), the naïve formula divides by zero. Neither pandas nor R raises an error for that; both silently fill the column with NaN (where the value equals the median) or ±Inf (everywhere else). The Java preview already guarded against this (returned the column unchanged), but the generated scripts didn't. 1.5.5 added explicit &lt;code&gt;if _iqr != 0:&lt;/code&gt; / &lt;code&gt;if (.iqr != 0)&lt;/code&gt; guards in both generated outputs to match the preview's behaviour exactly.&lt;/p&gt;

&lt;p&gt;Boring fix, but the kind of thing where the absence of an error is worse than a noisy crash. A NaN that propagates through three more steps is much harder to debug than a ZeroDivisionError at the source.&lt;br&gt;
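&lt;/p&gt;

&lt;p&gt;To make the guard concrete, here is a minimal standalone sketch of the same median/IQR logic. It is not the plugin's generated output: the function name &lt;code&gt;robust_scale&lt;/code&gt; is hypothetical, and it uses the standard library's &lt;code&gt;statistics&lt;/code&gt; module instead of pandas so it runs without dependencies.&lt;/p&gt;

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR; return them unchanged when IQR is 0."""
    median = statistics.median(values)
    # n=4 yields the three quartile cut points. method="inclusive"
    # interpolates linearly between data points; treat any exact match
    # with a particular pandas version as an assumption to verify.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Constant or near-constant column: skip scaling instead of
        # dividing by zero and filling the result with NaN or Inf.
        return list(values)
    return [(x - median) / iqr for x in values]
```

&lt;p&gt;With &lt;code&gt;[1, 2, 3, 4, 1000]&lt;/code&gt; the first four values scale to -1.0, -0.5, 0.0 and 0.5 regardless of the outlier, while a constant column such as &lt;code&gt;[7, 7, 7, 7]&lt;/code&gt; comes back unchanged.&lt;/p&gt;

&lt;p&gt;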
&lt;/p&gt;

&lt;h2&gt;
  
  
  1.5.3: the deadlock post-mortem
&lt;/h2&gt;

&lt;p&gt;This is the one I want to talk about. The IntelliJ Platform 2024.2 changed how &lt;code&gt;FileChooser.chooseFiles&lt;/code&gt; interacts with the EDT (Event Dispatch Thread). The Browse button started failing intermittently on newer IDEs, so 1.5.3 wrapped the call in &lt;code&gt;ApplicationManager.getApplication().invokeLater(...)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That was wrong, and not in a "minor regression" way; it was wrong in a "the entire IDE freezes for every user who installs the plugin" way.&lt;/p&gt;

&lt;p&gt;Here's the trap: &lt;code&gt;FileChooser.chooseFiles&lt;/code&gt; is already asynchronous on its own. Wrapping it in &lt;code&gt;invokeLater&lt;/code&gt; queues a runnable behind the EDT pump, but the runnable itself opens a modal-style dispatcher that blocks the EDT pump waiting for itself to dispatch. Neither side makes progress. The cursor hangs, the dock icon stops responding, and the JVM has to be killed from Activity Monitor.&lt;/p&gt;

&lt;p&gt;I caught it within about an hour because users on Marketplace were immediate and direct about it (sincere gratitude for that; angry early users are the most valuable kind). I retracted 1.5.3, shipped 1.5.4 as a straight revert, and added a permanent comment to the source so I don't repeat the mistake:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// FileChooser.chooseFiles is already asynchronous and must be called
// directly from the EDT: no wrapper is needed or safe.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;1.5.5 then fixed the original Browse problem the right way: switched to the built-in single-file chooser, kept directories visible in the filter so users can navigate normally, and anchored the dialog to the tool window component rather than letting it float free.&lt;/p&gt;

&lt;p&gt;Two lessons I'm carrying forward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When the platform changes async semantics, read the source instead of guessing. The 2024.2 release notes mentioned the dispatcher change, but I didn't connect it back to &lt;code&gt;FileChooser&lt;/code&gt; because the API surface hadn't moved.&lt;/li&gt;
&lt;li&gt;Modal-dialog-on-EDT bugs don't show up in CI. They show up the moment a real user clicks the button. Manual smoke-testing on a sandbox IDE before every publish is now non-negotiable for me.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1.5.6: SDK alignment
&lt;/h2&gt;

&lt;p&gt;Just shipped today. &lt;code&gt;pluginSinceBuild&lt;/code&gt; bumped from 233 to 243, matching the 2024.3 SDK I actually compile against. JetBrains' Plugin Verifier reports Compatible against IC-243, IC-251, IC-252, and IU-253: zero deprecated-API usages against 2024.3 itself, and three soft deprecations in 2025.x that I'll address in the next minor.&lt;/p&gt;

&lt;p&gt;I also disabled the Gradle IntelliJ Plugin's GitHub self-update check, which had a habit of failing the entire build whenever GitHub's API was rate-limited or my network was offline. That one ate two hours of my Monday before I tracked down the fix:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;# gradle.properties
systemProp.org.jetbrains.intellij.buildFeature.selfUpdateCheck=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you build any IntelliJ plugin and you've ever stared at &lt;code&gt;Cannot resolve the latest Gradle IntelliJ Plugin version&lt;/code&gt; and wondered why a build with no actual problems is failing, that line is the fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The most-requested features right now, in order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Categorical binning&lt;/strong&gt;: equal-width and quantile-based bucketing of numeric columns into categorical bins. Pandas has &lt;code&gt;pd.cut&lt;/code&gt; and &lt;code&gt;pd.qcut&lt;/code&gt;; R has a few options. Codegen for both is straightforward; the UI work is figuring out how to preview the bins without making the tool window huge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline import/export as JSON&lt;/strong&gt; so teams can share pipeline definitions in the repo and re-apply them via CLI in CI. This is the change that turns the plugin from a "speed up the first cleanup" tool into a "version-control your data cleanups" tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DuckDB read path&lt;/strong&gt; for files too large to fit in memory. The current LoaderArchitecture is single-pass and row-oriented; DuckDB would let the plugin profile and clean files up to ~10 GB on a laptop without rewriting the engine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've used the plugin and have opinions on which of these to prioritize, or a totally different request, please drop it as an issue or just reply here. The most useful feedback is "I tried to do X and the generated code does Y instead" because those are the highest-leverage fixes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Marketplace: &lt;a href="https://plugins.jetbrains.com/plugin/31226-data-preprocessor" rel="noopener noreferrer"&gt;https://plugins.jetbrains.com/plugin/31226-data-preprocessor&lt;/a&gt;&lt;br&gt;
Source (MIT): &lt;a href="https://github.com/codaBlurd/data-preprocessor-plugin" rel="noopener noreferrer"&gt;https://github.com/codaBlurd/data-preprocessor-plugin&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bug reports, feature requests, and PRs are all welcome. Reviews on the Marketplace are how the plugin gets discovered by new users; if it's saved you time, two minutes there is the highest-leverage thing you can do for it.&lt;/p&gt;

&lt;p&gt;Thanks for reading. Build something good this week.&lt;/p&gt;

</description>
      <category>intellij</category>
      <category>datascience</category>
      <category>python</category>
      <category>jetbrains</category>
    </item>
    <item>
      <title>How I Used the Observer Pattern to Watch Directories in Java (And Prevent Race Conditions)</title>
      <dc:creator>Godwill Christopher</dc:creator>
      <pubDate>Mon, 27 Apr 2026 14:22:46 +0000</pubDate>
      <link>https://dev.to/keenchris/how-i-used-the-observer-pattern-to-watch-directories-in-java-and-prevent-race-conditions-4j85</link>
      <guid>https://dev.to/keenchris/how-i-used-the-observer-pattern-to-watch-directories-in-java-and-prevent-race-conditions-4j85</guid>
      <description>&lt;p&gt;If you've ever needed to react to file system changes in Java — a config file updating, an upload folder receiving a new file, a hot-reload mechanism — you've probably reached for Java's &lt;code&gt;WatchService&lt;/code&gt;. It's clean. It's built-in. And it hides a subtle concurrency trap that will burn you in production if you're not paying attention.&lt;/p&gt;

&lt;p&gt;In this post I'll walk through building a directory watcher using the Observer pattern, and then show exactly where race conditions creep in and how to shut them down with a &lt;code&gt;ReentrantLock&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Observer Pattern in 10 Seconds
&lt;/h2&gt;

&lt;p&gt;Observer is a behavioural pattern where one object (the subject) maintains a list of dependents (observers) and notifies them automatically when its state changes.&lt;/p&gt;

&lt;p&gt;In our case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subject&lt;/strong&gt; — the directory watcher, watching for file events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observers&lt;/strong&gt; — any number of handlers that react when a file changes
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;FileChangeObserver&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onFileChanged&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DirectoryWatcher&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;FileChangeObserver&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;observers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;addObserver&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FileChangeObserver&lt;/span&gt; &lt;span class="n"&gt;observer&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;observers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observer&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;notifyObservers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FileChangeObserver&lt;/span&gt; &lt;span class="n"&gt;observer&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;observers&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;observer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;onFileChanged&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean and extensible — add as many handlers as you need without touching the watcher itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up WatchService
&lt;/h2&gt;

&lt;p&gt;Java NIO gives us &lt;code&gt;WatchService&lt;/code&gt; — a low-level file system event API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;watch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;IOException&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;WatchService&lt;/span&gt; &lt;span class="n"&gt;watchService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FileSystems&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getDefault&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;newWatchService&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;watchService&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;StandardWatchEventKinds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENTRY_CREATE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;StandardWatchEventKinds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENTRY_MODIFY&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;StandardWatchEventKinds&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ENTRY_DELETE&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;WatchKey&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;watchService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;take&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// blocks until an event arrives&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WatchEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;?&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;pollEvents&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;changed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;resolve&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
            &lt;span class="n"&gt;notifyObservers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changed&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

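        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reset&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// key is no longer valid, e.g. the directory was deleted&lt;/span&gt;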
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works perfectly — until you run it in a multi-threaded environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Race Condition Hides
&lt;/h2&gt;

&lt;p&gt;Say you spin up a thread pool to process file change events faster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ExecutorService&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Executors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newFixedThreadPool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WatchEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;?&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;pollEvents&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;changed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;resolve&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;submit&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;notifyObservers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changed&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now four threads can be notifying observers simultaneously. If two events arrive for the same file at nearly the same time — say a file is written and then immediately modified — two threads can call &lt;code&gt;onFileChanged&lt;/code&gt; on the same path concurrently.&lt;/p&gt;

&lt;p&gt;Depending on what your observer does (write to a database, process the file, update a cache), you now have a race condition. Two threads reading and transforming the same file simultaneously. Silent data corruption. The worst kind of bug.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fixing It With ReentrantLock
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;ReentrantLock&lt;/code&gt; lets only one thread process a given file path at a time while other threads wait their turn.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DirectoryWatcher&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;FileChangeObserver&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;observers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fileLocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt; &lt;span class="nf"&gt;getLockForPath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fileLocks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;computeIfAbsent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;notifyObservers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getLockForPath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FileChangeObserver&lt;/span&gt; &lt;span class="n"&gt;observer&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;observers&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;observer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;onFileChanged&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unlock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// always release in finally — never skip this&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentHashMap&lt;/code&gt; gives you one lock &lt;strong&gt;per file path&lt;/strong&gt; — threads processing different files don't block each other, only threads processing the &lt;strong&gt;same&lt;/strong&gt; file do&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;computeIfAbsent&lt;/code&gt; is atomic — no two threads will create two locks for the same path&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;finally&lt;/code&gt; block guarantees the lock is released even if an observer throws an exception&lt;/li&gt;
&lt;/ul&gt;
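
&lt;p&gt;The snippets in this post are Java; as a language-neutral cross-check, the sketch below mirrors the per-path locking idea in Python. Everything in it is illustrative: &lt;code&gt;PerKeyLocks&lt;/code&gt; and &lt;code&gt;run_events&lt;/code&gt; are hypothetical names, &lt;code&gt;threading.RLock&lt;/code&gt; stands in for &lt;code&gt;ReentrantLock&lt;/code&gt;, and a guarded dict stands in for &lt;code&gt;ConcurrentHashMap.computeIfAbsent&lt;/code&gt;.&lt;/p&gt;

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PerKeyLocks:
    """Hands out one reentrant lock per key, creating it on first use."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dict itself

    def lock_for(self, key):
        # Create-or-fetch under a single guard so two threads can never
        # end up holding different locks for the same key.
        with self._guard:
            if key not in self._locks:
                self._locks[key] = threading.RLock()
            return self._locks[key]


def run_events(paths, workers=4):
    """Handle one event per entry in paths; return peak concurrency per path."""
    locks = PerKeyLocks()
    active = {}  # path: handlers currently inside the critical section
    peak = {}    # path: highest value `active` ever reached
    stats_guard = threading.Lock()

    def handle(path):
        with locks.lock_for(path):  # serialises work on the same path only
            with stats_guard:
                active[path] = active.get(path, 0) + 1
                peak[path] = max(peak.get(path, 0), active[path])
            time.sleep(0.001)  # stand-in for real file processing
            with stats_guard:
                active[path] -= 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, paths))  # list() surfaces worker exceptions
    return peak
```

&lt;p&gt;&lt;code&gt;run_events&lt;/code&gt; reports the highest number of handlers that were ever inside the critical section for each path. With the per-key lock in place that peak stays at 1 for every path, even with four workers, while events for different paths are still free to overlap.&lt;/p&gt;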




&lt;h2&gt;
  
  
  Why Not Just Use synchronized?
&lt;/h2&gt;

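&lt;p&gt;You could use a &lt;code&gt;synchronized&lt;/code&gt; block, but not on the path object itself: each event resolves to a fresh &lt;code&gt;Path&lt;/code&gt; instance, and &lt;code&gt;synchronized&lt;/code&gt; locks on object identity, so you would still need a shared per-path monitor from a map. &lt;code&gt;ReentrantLock&lt;/code&gt; also gives you more control: you can use &lt;code&gt;tryLock()&lt;/code&gt; to skip processing if the lock is already held (useful if you want to drop duplicate events rather than queue them), and it's more explicit about what you're protecting.&lt;br&gt;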
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Skip instead of queue — useful for high-frequency file events&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tryLock&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;notifyObservers&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unlock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Skipping duplicate event for: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
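
&lt;p&gt;For comparison, the same skip-instead-of-queue behaviour can be sketched in Python, where &lt;code&gt;acquire(blocking=False)&lt;/code&gt; plays the role of &lt;code&gt;tryLock()&lt;/code&gt;. The helper name &lt;code&gt;notify_or_skip&lt;/code&gt; and its &lt;code&gt;handler&lt;/code&gt; parameter are hypothetical, chosen only for this illustration.&lt;/p&gt;

```python
import threading


def notify_or_skip(lock, path, handler):
    """Run handler(path) only if the lock is free; return False when skipped."""
    if not lock.acquire(blocking=False):  # never waits, like tryLock()
        return False  # another thread is already processing this path
    try:
        handler(path)
        return True
    finally:
        lock.release()  # always release, even if the handler raises
```

&lt;p&gt;One trade-off worth keeping in mind: a skipped event is dropped for good, so this only fits handlers that re-read the whole file and can tolerate missing an intermediate change.&lt;/p&gt;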






&lt;h2&gt;
  
  
  The Full Picture
&lt;/h2&gt;

&lt;p&gt;File system event&lt;br&gt;
↓&lt;br&gt;
WatchService&lt;br&gt;
↓&lt;br&gt;
Thread pool (4 threads)&lt;br&gt;
↓&lt;br&gt;
notifyObservers(path)&lt;br&gt;
↓&lt;br&gt;
ReentrantLock (per path)&lt;br&gt;
↓&lt;br&gt;
Observers notified safely&lt;/p&gt;

&lt;p&gt;The Observer pattern keeps your handlers decoupled and easy to extend. The per-path locking ensures concurrent events on the same file are serialised without bottlenecking events on different files.&lt;/p&gt;

&lt;p&gt;This pattern came up directly in production work — building file ingestion pipelines where multiple events on the same file within milliseconds of each other would otherwise cause partial reads and corrupt downstream processing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write about backend Java engineering, Spring Boot, and systems design. Follow for more.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>multithreading</category>
      <category>designpatterns</category>
      <category>backend</category>
    </item>
    <item>
      <title>What I learned after 80+ installs of my first JetBrains plugin</title>
      <dc:creator>Godwill Christopher</dc:creator>
      <pubDate>Mon, 27 Apr 2026 12:31:15 +0000</pubDate>
      <link>https://dev.to/keenchris/what-i-learned-after-80-installs-of-my-first-jetbrains-plugin-3j5</link>
      <guid>https://dev.to/keenchris/what-i-learned-after-80-installs-of-my-first-jetbrains-plugin-3j5</guid>
      <description>&lt;p&gt;After publishing my first JetBrains plugin, I expected the main challenge to be getting users.&lt;/p&gt;

&lt;p&gt;It wasn’t.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem Wasn’t Features
&lt;/h2&gt;

&lt;p&gt;Within the first few days, the plugin got ~80 installs.&lt;/p&gt;

&lt;p&gt;But something felt off.&lt;/p&gt;

&lt;p&gt;No feedback.&lt;br&gt;
No reviews.&lt;br&gt;
No real engagement.&lt;/p&gt;

&lt;p&gt;Turns out, the issue wasn’t missing features.&lt;/p&gt;

&lt;p&gt;It was something much simpler:&lt;/p&gt;

&lt;p&gt;👉 The tool window wasn’t showing after installation.&lt;/p&gt;

&lt;p&gt;Which meant users installed the plugin…&lt;br&gt;
…and couldn’t actually use it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fixing One Small Issue Changed Everything
&lt;/h2&gt;

&lt;p&gt;After fixing that (v1.0.3), things immediately improved.&lt;/p&gt;

&lt;p&gt;People could finally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open the tool&lt;/li&gt;
&lt;li&gt;load datasets&lt;/li&gt;
&lt;li&gt;actually try the features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was a reminder that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The first-use experience matters more than feature depth.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What I Built (Quick Context)
&lt;/h2&gt;

&lt;p&gt;The plugin is a small tool for preprocessing data directly inside JetBrains IDEs.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;jumping to Excel or Jupyter&lt;/li&gt;
&lt;li&gt;writing quick pandas scripts&lt;/li&gt;
&lt;li&gt;switching back to the IDE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;load CSV, Excel, or JSON files&lt;/li&gt;
&lt;li&gt;inspect and clean data visually&lt;/li&gt;
&lt;li&gt;generate pandas code from your steps&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Early Feedback
&lt;/h2&gt;

&lt;p&gt;Once people could actually use it, I started getting real feedback.&lt;/p&gt;

&lt;p&gt;One suggestion stood out:&lt;/p&gt;

&lt;p&gt;👉 “Add Excel-style sorting and filtering to the preview”&lt;/p&gt;

&lt;p&gt;It makes sense.&lt;/p&gt;

&lt;p&gt;A lot of the workflow is just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exploring the dataset&lt;/li&gt;
&lt;li&gt;spotting issues&lt;/li&gt;
&lt;li&gt;understanding structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Improving that part would make the tool much more useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed Next
&lt;/h2&gt;

&lt;p&gt;Based on feedback, I’ve already:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fixed the onboarding issue (tool window visibility)&lt;/li&gt;
&lt;li&gt;added Excel (.xlsx) and JSON support (v1.1.0)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And next up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;improving the data preview experience&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;A few takeaways from this:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Installs don’t mean usage
&lt;/h3&gt;

&lt;p&gt;People can install your tool and never actually use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Small UX issues can block everything
&lt;/h3&gt;

&lt;p&gt;A single missing UI element, the tool window, completely stopped users from getting value.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Feedback is everything early on
&lt;/h3&gt;

&lt;p&gt;One engaged user can give you the direction for your next version.&lt;/p&gt;




&lt;h2&gt;
  
  
  Still Early
&lt;/h2&gt;

&lt;p&gt;This is still an early version, and I’m actively improving it.&lt;/p&gt;

&lt;p&gt;If you work with data preprocessing, ETL pipelines, or just deal with messy datasets regularly, I’d really appreciate your thoughts.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://plugins.jetbrains.com/plugin/31226-data-preprocessor/" rel="noopener noreferrer"&gt;https://plugins.jetbrains.com/plugin/31226-data-preprocessor/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even small feedback helps at this stage.&lt;/p&gt;

</description>
      <category>devjournal</category>
      <category>sideprojects</category>
      <category>tooling</category>
      <category>ux</category>
    </item>
    <item>
      <title>I got tired of rewriting the same pandas preprocessing code — so I built a plugin</title>
      <dc:creator>Godwill Christopher</dc:creator>
      <pubDate>Tue, 21 Apr 2026 21:56:26 +0000</pubDate>
      <link>https://dev.to/keenchris/i-got-tired-of-rewriting-the-same-pandas-preprocessing-code-so-i-built-a-plugin-l48</link>
      <guid>https://dev.to/keenchris/i-got-tired-of-rewriting-the-same-pandas-preprocessing-code-so-i-built-a-plugin-l48</guid>
      <description>&lt;p&gt;If you work with CSV data, you’ve probably written this code more times than you’d like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dropna()&lt;/li&gt;
&lt;li&gt;fillna()&lt;/li&gt;
&lt;li&gt;removing duplicates&lt;/li&gt;
&lt;li&gt;basic outlier filtering&lt;/li&gt;
&lt;li&gt;normalizing columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of it is particularly difficult.&lt;/p&gt;

&lt;p&gt;But it’s repetitive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;As a backend engineer working with data pipelines, I kept running into the same pattern.&lt;/p&gt;

&lt;p&gt;Before doing anything meaningful with a dataset, I’d spend time writing the same preprocessing logic just to get the data into a usable state.&lt;/p&gt;

&lt;p&gt;It wasn’t the hardest part of the job—but it was always there.&lt;/p&gt;

&lt;p&gt;And it always slowed things down.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Noticed
&lt;/h2&gt;

&lt;p&gt;The issue isn’t complexity.&lt;/p&gt;

&lt;p&gt;It’s repetition.&lt;/p&gt;

&lt;p&gt;You already know what needs to be done:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clean missing values&lt;/li&gt;
&lt;li&gt;remove duplicates&lt;/li&gt;
&lt;li&gt;normalize data&lt;/li&gt;
&lt;li&gt;filter outliers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But you still have to write it. Every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Usual Workflow
&lt;/h2&gt;

&lt;p&gt;Most of the time, I’d:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;copy snippets from previous projects
&lt;/li&gt;
&lt;li&gt;reuse old notebooks
&lt;/li&gt;
&lt;li&gt;write quick pandas scripts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works—but it’s not efficient.&lt;/p&gt;

&lt;p&gt;Especially when you just want to:&lt;br&gt;
👉 quickly inspect a dataset&lt;br&gt;
👉 apply basic transformations&lt;br&gt;
👉 move on to actual analysis or pipeline logic&lt;/p&gt;




&lt;h2&gt;
  
  
  So I Tried Something Different
&lt;/h2&gt;

&lt;p&gt;Instead of writing the same code over and over, I started experimenting with doing preprocessing directly inside the IDE.&lt;/p&gt;

&lt;p&gt;That led me to build a small JetBrains plugin.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load a CSV file inside the IDE
&lt;/li&gt;
&lt;li&gt;Apply common preprocessing steps visually
&lt;/li&gt;
&lt;li&gt;Generate ready-to-run pandas code from those actions
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What It Looks Like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3tuvjoc2orhzfr4lv0x.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3tuvjoc2orhzfr4lv0x.gif" alt="Demo gif" width="720" height="448"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Handles
&lt;/h2&gt;

&lt;p&gt;Right now, it supports things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Column profiling (types, null counts, stats)
&lt;/li&gt;
&lt;li&gt;Handling missing values (drop, fill with mean/median/mode/custom)
&lt;/li&gt;
&lt;li&gt;Removing duplicates
&lt;/li&gt;
&lt;li&gt;Outlier detection (IQR-based)
&lt;/li&gt;
&lt;li&gt;Normalization (Min-Max, Z-score)
&lt;/li&gt;
&lt;li&gt;Type casting
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the part I find most useful:&lt;/p&gt;

&lt;p&gt;👉 it generates clean pandas code based on the steps you apply&lt;/p&gt;

&lt;p&gt;So you still end up with code you can use in scripts, pipelines, or notebooks.&lt;/p&gt;
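
&lt;p&gt;To make that concrete, here is a minimal sketch of the kind of pandas code these steps correspond to. It is not the plugin’s actual output: the function name &lt;code&gt;preprocess&lt;/code&gt;, the column name &lt;code&gt;x&lt;/code&gt;, the fill-with-mean choice, and the 1.5 × IQR threshold are all assumptions picked for illustration.&lt;/p&gt;

```python
import pandas as pd


def preprocess(df, column):
    """Hypothetical helper (not plugin output): clean one numeric column."""
    out = df.copy()

    # Missing values: fill with the column mean (one of several fill options)
    out[column] = out[column].fillna(out[column].mean())

    # Duplicates: drop rows that are exact copies of an earlier row
    out = out.drop_duplicates()

    # Outliers: keep rows within 1.5 * IQR of the quartiles (assumed threshold)
    q1 = out[column].quantile(0.25)
    q3 = out[column].quantile(0.75)
    iqr = q3 - q1
    out = out[out[column].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Normalization: min-max scale to [0, 1], guarding against a constant column
    lo, hi = out[column].min(), out[column].max()
    span = hi - lo
    out[column] = (out[column] - lo) / span if span != 0 else 0.0

    return out.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny made-up dataset: one null, one duplicate row, one obvious outlier
    demo = pd.DataFrame({"x": [1.0, 2.0, None, 2.0, 3.0, 100.0]})
    print(preprocess(demo, "x"))
```

&lt;p&gt;The order of the steps matters in a sketch like this: filling nulls before filtering outliers means the outlier itself skews the fill value, so in a real pipeline you may prefer the median or a different step order.&lt;/p&gt;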




&lt;h2&gt;
  
  
  Why This Helped Me
&lt;/h2&gt;

&lt;p&gt;For me, this made it much faster to go from:&lt;/p&gt;

&lt;p&gt;raw data → cleaned dataset → usable code&lt;/p&gt;

&lt;p&gt;without constantly switching context or rewriting boilerplate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Still Early
&lt;/h2&gt;

&lt;p&gt;This is still an early version, and I’m actively improving it based on feedback.&lt;/p&gt;

&lt;p&gt;If you work with data preprocessing, ETL pipelines, or just deal with CSVs often, I’d really appreciate your thoughts.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://plugins.jetbrains.com/plugin/31226-data-preprocessor/" rel="noopener noreferrer"&gt;https://plugins.jetbrains.com/plugin/31226-data-preprocessor/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even small feedback like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what feels clunky
&lt;/li&gt;
&lt;li&gt;what’s missing
&lt;/li&gt;
&lt;li&gt;what you’d expect
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;would be really helpful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Curious About Your Workflow
&lt;/h2&gt;

&lt;p&gt;How do you currently handle preprocessing?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you just write pandas scripts each time?&lt;/li&gt;
&lt;li&gt;Use templates?&lt;/li&gt;
&lt;li&gt;Have your own utilities?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would be interesting to hear how others approach this.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
