Building an AI-Powered Content Scanner for Windows: Performance, Multithreading and GPU Acceleration in .NET
Building software always looks straightforward from the outside.
You load a machine learning model, point it at some images, and display the results.
At least that's what I thought when I started building DetectNix Vision, a Windows desktop application that performs local AI-powered image analysis without uploading user data to the cloud.
In reality, the project became a deep dive into performance optimization, memory management, multithreading, GPU acceleration, and user experience.
Written from the perspective of a senior developer, this article covers the engineering challenges I encountered and the architectural decisions I made while building the software.
The Original Goal
The initial goal was simple:
- Scan images stored on a Windows PC
- Detect potentially explicit or sensitive content
- Keep all processing local
- Support both CPU and GPU execution
- Process large image collections efficiently
- Remain responsive while scanning
Privacy was a major requirement.
I didn't want users uploading personal files to third-party services. Everything needed to run locally on the user's machine.
That decision immediately influenced every technical choice that followed.
Challenge #1: Model Loading Performance
One of the first mistakes I made was loading the AI model too frequently.
A modern computer vision model can be hundreds of megabytes in size. Loading it repeatedly pays that initialization cost over and over and quickly destroys performance.
My initial implementation worked perfectly during testing because I was only processing a handful of images.
Once I started testing larger image collections, the bottleneck became obvious.
The Solution
I moved to a singleton-style architecture where the model is loaded once during application startup and remains resident in memory.
public class VisionEngine
{
    // Created once in the constructor and reused for every image.
    private readonly InferenceSession _session;

    public VisionEngine()
    {
        _session = CreateSession();
    }
}
This reduced initialization costs dramatically and ensured every image could reuse the same loaded model.
The lesson here is simple:
AI models should usually be treated like databases or connection pools, not disposable objects.
Load them once. Reuse them often.
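As a minimal sketch of the load-once idea, the snippet below uses `Lazy<T>` so the expensive constructor runs a single time no matter how many threads ask for the model. `FakeModel` and `ModelHost` are hypothetical stand-ins, not DetectNix Vision code; in a real application the factory would build the actual inference session instead.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an expensive-to-load model.
public sealed class FakeModel
{
    private static int _loadCount;
    public static int LoadCount => _loadCount;

    public FakeModel()
    {
        // Real code would read the model file here; we only count the loads.
        Interlocked.Increment(ref _loadCount);
    }

    // Placeholder scoring logic so the type has something to call.
    public float Score(byte[] pixels) => pixels.Length == 0 ? 0f : 0.5f;
}

public static class ModelHost
{
    // ExecutionAndPublication guarantees the factory runs once,
    // even if several threads request the model at the same time.
    private static readonly Lazy<FakeModel> Model =
        new Lazy<FakeModel>(() => new FakeModel(), LazyThreadSafetyMode.ExecutionAndPublication);

    public static FakeModel Instance => Model.Value;
}
```

`Lazy<T>` defers the load until first use. Touching `ModelHost.Instance` during application startup gives the load-at-startup behavior described above.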
Challenge #2: CPU Usage Was Out of Control
The next issue appeared when processing thousands of images.
The obvious approach is:
foreach (var image in images)
{
    Analyze(image);
}
Unfortunately, this wastes modern hardware.
A single image analysis might only use a fraction of the available CPU resources.
My first instinct was to parallelize everything.
Parallel.ForEach(images, image =>
{
    Analyze(image);
});
This certainly increased throughput.
It also created new problems.
Challenge #3: Too Much Parallelism
Many developers assume that more threads automatically mean more performance.
With machine learning workloads, that's often not true.
I discovered that excessive parallelism caused:
- Increased memory consumption
- Context switching overhead
- Reduced GPU efficiency
- System responsiveness issues
The application became faster in benchmarks but slower in real-world usage.
The operating system was spending too much time managing threads instead of performing useful work.
The Solution
I implemented a controlled worker model.
Instead of allowing unlimited concurrency, I created a configurable processing pool.
// At most one worker per logical core, capped at 4.
var maxSessions = Math.Min(Environment.ProcessorCount, 4);
This allowed me to tune throughput while keeping resource usage predictable.
In practice, a carefully controlled number of workers consistently outperformed unrestricted parallel execution.
This was one of the most valuable lessons from the project:
The fastest architecture is rarely the one with the most threads.
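One way to enforce a worker cap like this is `ParallelOptions.MaxDegreeOfParallelism`. The sketch below is an illustration under that assumption, not the application's actual pool; `BoundedScanner` and its peak-tracking logic are hypothetical and exist only to show that the limit holds.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedScanner
{
    // Runs analyze over every item with at most maxWorkers in flight.
    // Returns the highest concurrency actually observed.
    public static int Run<T>(IEnumerable<T> items, Action<T> analyze, int maxWorkers)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        int active = 0;
        int peak = 0;

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref active);

            // Raise the recorded peak with a compare-and-swap loop.
            int seen;
            do { seen = Volatile.Read(ref peak); }
            while (now > seen && Interlocked.CompareExchange(ref peak, now, seen) != seen);

            try { analyze(item); }
            finally { Interlocked.Decrement(ref active); }
        });

        return peak;
    }
}
```

Passing `Math.Min(Environment.ProcessorCount, 4)` as `maxWorkers` reproduces the cap shown earlier while leaving the value easy to tune.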
Challenge #4: GPU Acceleration Isn't Automatic
Many users assume that installing a graphics card means software automatically becomes faster.
Unfortunately, that's not how machine learning inference works.
Supporting GPU acceleration introduced several challenges:
- Detecting available hardware
- Selecting execution providers
- Handling driver differences
- Supporting systems without compatible GPUs
- Providing reliable CPU fallback
A failed GPU initialization could not be allowed to crash the application.
The Solution
The startup sequence attempts GPU initialization first.
If that fails, the application transparently falls back to CPU execution.
try
{
    EnableGpuProvider();
}
catch
{
    // GPU initialization failed (no compatible device, driver problem): use the CPU instead.
    EnableCpuProvider();
}
This approach ensured the software would run on virtually any Windows machine.
Performance varies significantly between systems, but functionality remains consistent.
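The fallback can be generalized to an ordered list of candidates. The following is a sketch under assumptions: `ProviderSelector` is a hypothetical helper, and each `Create` delegate stands in for whatever code builds a session with a given execution provider (the article does not say which GPU provider is used). Failures are collected rather than swallowed so they can be logged.

```csharp
using System;
using System.Collections.Generic;

public static class ProviderSelector
{
    // Tries each candidate in order and returns the first session that initializes.
    // Each failure is recorded in the supplied list instead of being dropped.
    public static (string Name, T Session) CreateFirstAvailable<T>(
        IEnumerable<(string Name, Func<T> Create)> candidates,
        List<string> failures)
    {
        foreach (var (name, create) in candidates)
        {
            try
            {
                return (name, create());
            }
            catch (Exception ex)
            {
                failures.Add($"{name}: {ex.Message}");
            }
        }

        // Reached only if every candidate, including the CPU one, failed.
        throw new InvalidOperationException("No execution provider could be initialized.");
    }
}
```

Listing the GPU candidate first and the CPU candidate last keeps the same behavior as the try/catch above, and the returned name tells the UI which path is actually in use.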
Challenge #5: Memory Pressure During Large Scans
Scanning a directory containing 50 images is easy.
Scanning a directory containing 100,000 images is a different problem entirely.
Early versions accumulated too much data in memory.
This resulted in:
- Increased garbage collection activity
- Higher memory usage
- Reduced throughput
- Longer scan times
The Solution
I switched to a streaming pipeline.
Instead of loading large batches of files, images are processed incrementally.
foreach (var file in Directory.EnumerateFiles(path))
{
    Process(file);
}
This dramatically reduced memory consumption and allowed scans of extremely large collections without exhausting system resources.
Sometimes the simplest optimization is processing less data at once.
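The key property is that `Directory.EnumerateFiles` yields paths lazily, whereas `Directory.GetFiles` builds the full array first. Here is a small sketch of a recursive, filtered enumeration; `ImageEnumerator` and the extension list are assumptions for illustration, and `EnumerationOptions` requires .NET Core 2.1 or later rather than .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ImageEnumerator
{
    // Assumed set of extensions; adjust to whatever the decoder supports.
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".jpg", ".jpeg", ".png", ".bmp", ".gif" };

    // Lazily yields image paths one at a time, so memory use stays flat
    // regardless of how many files the directory tree contains.
    public static IEnumerable<string> EnumerateImages(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true // skip folders the user cannot read
        };

        foreach (var file in Directory.EnumerateFiles(root, "*", options))
        {
            if (Extensions.Contains(Path.GetExtension(file)))
                yield return file;
        }
    }
}
```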
Challenge #6: Keeping the UI Responsive
Desktop users have very little tolerance for frozen applications.
A scan that takes several minutes is acceptable.
An application that stops responding for several minutes is not.
Initially, image analysis was competing with the user interface thread.
The result was predictable.
Windows marked the application as "Not Responding."
The Solution
I completely separated the scanning pipeline from the UI layer.
The scanner runs on background workers while the UI receives progress updates.
await Task.Run(() =>
{
    StartScan();
});
This allowed users to keep working even during intensive processing:
- Browse results
- Pause scans
- View progress
- Continue interacting with the application
The difference in perceived quality was enormous.
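A common way to wire progress and cancellation into a background scan is `IProgress<T>` plus a `CancellationToken`. The sketch below assumes that pattern; `BackgroundScan` and its parameters are hypothetical names, and the real scanner may report richer progress than a simple count.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundScan
{
    // Runs the scan on a thread-pool thread and reports how many files are done.
    // The token is checked between files so a stop request takes effect quickly.
    public static Task<int> RunAsync(
        IReadOnlyList<string> files,
        Action<string> analyze,
        IProgress<int> progress,
        CancellationToken token)
    {
        return Task.Run(() =>
        {
            int done = 0;
            foreach (var file in files)
            {
                token.ThrowIfCancellationRequested();
                analyze(file);
                done++;
                progress?.Report(done);
            }
            return done;
        }, token);
    }
}
```

In a WPF or WinForms application, a `Progress<int>` created on the UI thread delivers its callbacks back on that thread, so the handler can update controls directly.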
Challenge #7: Handling Real-World Image Collections
Developers often test with ideal data.
Users never provide ideal data.
Real-world collections contain:
- Corrupted files
- Unsupported formats
- Zero-byte images
- Huge images
- Tiny images
- Invalid metadata
The software needed to continue scanning even when individual files failed.
The Solution
Every image is treated as potentially invalid.
Failures are isolated and logged.
try
{
    Analyze(image);
}
catch (Exception ex)
{
    LogError(ex);
}
A single bad file should never stop an entire scan.
This significantly improved reliability.
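To make the isolation concrete, here is a sketch of a scan loop that records each failure next to the file that caused it. `ScanReport` and `ResilientScanner` are hypothetical types for illustration. One assumption worth flagging: cancellation is rethrown rather than logged, so a user-requested stop is not mistaken for a bad file.

```csharp
using System;
using System.Collections.Generic;

public sealed class ScanReport
{
    public int Succeeded { get; set; }
    public List<(string File, string Error)> Failures { get; } =
        new List<(string File, string Error)>();
}

public static class ResilientScanner
{
    // Analyzes every file; a failure on one file never stops the others.
    public static ScanReport ScanAll(IEnumerable<string> files, Action<string> analyze)
    {
        var report = new ScanReport();
        foreach (var file in files)
        {
            try
            {
                analyze(file);
                report.Succeeded++;
            }
            catch (OperationCanceledException)
            {
                throw; // a stop request should end the scan, not count as a bad file
            }
            catch (Exception ex)
            {
                report.Failures.Add((file, ex.Message));
            }
        }
        return report;
    }
}
```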
Challenge #8: Finding the Right Balance Between Accuracy and Speed
One of the biggest engineering trade-offs involved balancing:
- Scan speed
- Detection accuracy
- Hardware requirements
- User expectations
Larger models generally improve accuracy.
They also increase:
- Memory usage
- Startup times
- Processing times
Smaller models improve responsiveness but may sacrifice precision.
There is no universally correct answer.
The optimal balance depends on the target audience and intended use case.
For DetectNix Vision, I prioritized a solution that delivered strong accuracy while remaining practical on average consumer hardware.
Additional Engineering Decisions Worth Mentioning
Why I Chose ONNX Runtime
After evaluating several options, I standardized on ONNX Runtime.
Benefits included:
- Excellent .NET support
- GPU acceleration support
- Cross-hardware compatibility
- Consistent inference performance
- Easy deployment with Windows applications
Most importantly, it allowed me to focus on building the product instead of maintaining machine learning infrastructure.
Why I Built a Prediction Engine Pool
Creating inference sessions can be expensive.
Rather than constantly creating and disposing resources, I implemented a reusable engine pool.
Benefits included:
- Reduced allocations
- Lower startup overhead
- Better throughput
- More predictable memory usage
This became particularly important when users scanned tens of thousands of files.
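A pool like this can be sketched with `BlockingCollection<T>`: engines are created up front, borrowed for one unit of work, and always returned. `EnginePool<T>` below is a hypothetical illustration, not the application's implementation; `T` stands in for whatever wraps an inference session.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class EnginePool<T> : IDisposable where T : class
{
    private readonly BlockingCollection<T> _idle = new BlockingCollection<T>();

    // Creates all engines up front so no scan pays the creation cost later.
    public EnginePool(int size, Func<T> factory)
    {
        for (int i = 0; i < size; i++)
            _idle.Add(factory());
    }

    // Borrows an engine, blocking until one is free, and always returns it.
    public TResult Use<TResult>(Func<T, TResult> work)
    {
        T engine = _idle.Take();
        try { return work(engine); }
        finally { _idle.Add(engine); }
    }

    // Disposes idle engines; callers should finish outstanding work first.
    public void Dispose()
    {
        while (_idle.TryTake(out T engine))
            (engine as IDisposable)?.Dispose();
        _idle.Dispose();
    }
}
```

Because `Take` blocks when the pool is empty, the pool size also acts as a concurrency limit, which fits naturally with a capped worker count.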
Why Privacy Became a Competitive Advantage
Initially, local processing was simply a technical requirement.
Over time it became one of the product's strongest differentiators.
Many competing solutions upload content to cloud services for analysis.
DetectNix Vision performs all scanning locally.
Benefits include:
- No image uploads
- Faster processing
- Improved privacy
- No internet dependency
- Better suitability for business environments
Sometimes architectural decisions become product features.
This was one of those cases.
What I'd Do Differently
Looking back, there are several things I would prototype earlier:
- GPU support
- Threading architecture
- Memory profiling
- Large-scale stress testing
Many performance issues don't appear until the software processes thousands of files under real-world conditions.
Building those tests earlier would have saved significant development time.
Key Takeaways
Building AI-powered desktop software taught me that machine learning is only a small part of the problem.
The real challenges often involve traditional software engineering:
- Resource management
- Concurrency
- Reliability
- User experience
- Performance optimization
The AI model itself might take months to train.
But creating a fast, stable, user-friendly application around that model can easily take longer.
For me, the most important lesson was this:
Success comes from engineering the entire system, not just the AI.
Users don't care how sophisticated your model is if the application is slow, unstable, or difficult to use.
They care that it works.
And that's where software engineering still matters most.