PowerShell, for all its flexibility, has a glaring gap: there is no clear, reliable way to hash complex objects deterministically.
If you’ve ever tried to:
✅ Compare structured data
✅ Track changes over time
✅ Verify data integrity
You’ve probably hit the same wall:
- Most hashing functions only work for files.
- Object hashing functions rely on non-deterministic serialization.
- There is no robust, structured way to hash deeply nested objects.
Even worse, many common object hashing techniques in PowerShell produce false positives and false negatives without users realizing it.
🚨 The False Positives & False Negatives We’ve Been Living With
Have you ever written something like this?
```powershell
$json1 = $object1 | ConvertTo-Json -Depth 10
$json2 = $object2 | ConvertTo-Json -Depth 10
$hash1 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json1))) -Algorithm SHA256
$hash2 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json2))) -Algorithm SHA256
```
…and confidently said, “These two objects are identical!”
You’ve been playing yourself.
Why?
Because PowerShell doesn’t preserve object structure deterministically.
- Property order of unordered, dictionary-like collections can differ between two logically identical objects, including `PSCustomObject`s declared with their properties in different orders.
- Hashtables have no guaranteed ordering unless explicitly marked `[Ordered]`.
- `ConvertTo-Json` doesn’t respect data types perfectly (e.g., `[char]'A'` and the string `"A"` serialize identically).
- Collections inheriting from `IList` often get coerced into simple `@()` arrays when serialized, losing behavioral semantics defined by a strict type chosen by the developer.
- Different PowerShell versions serialize objects differently.
This means:
- False negatives: Two logically identical objects produce different hashes.
- False positives: Two structurally different objects produce the same hash.
PowerShell’s JSON-based hashing is fundamentally flawed for complex objects.
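The false-negative case takes two lines to demonstrate: the same logical data, declared in a different property order, produces different JSON, and therefore a different hash:

```powershell
$a = [pscustomobject]@{ Name = 'Alice'; Age = 30 }
$b = [pscustomobject]@{ Age = 30; Name = 'Alice' }   # same data, different property order

# PSCustomObject preserves declaration order, so the JSON (and any hash of it) differs
($a | ConvertTo-Json) -eq ($b | ConvertTo-Json)   # False
```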
The Search for a Solution: Where Is It?
When I first started looking, I assumed there’d be at least one solid, reusable object hashing function. So, I did the obvious thing:
- Searched PowerShell Gallery for modules related to hashing.
- Looked for Get-Hash, ConvertTo-Hash, or Compute-Hash functions.
- Googled “PowerShell object hashing”, expecting an easy solution.
I found nothing useful.
It wasn’t until I started Google dorking—crafting targeted searches—that I uncovered the sad truth:
- Most hashing functions only work for files.
- Many claim object hashing but rely on non-deterministic serialization.
- None provide a robust, structured way to hash deeply nested objects.
The Google Dorking Journey
Here’s a small taste of the search queries I ran to find any existing object hashing solutions in the ecosystem:
- General hash functions
site:powershellgallery.com "hash" "ConvertTo-SecureString" OR "Get-Hash" OR "SHA256" OR "MD5"
- Searching for file-independent hashing utilities
site:powershellgallery.com "function" "hash" "MemoryStream" OR "Get-FileHash"
- Looking for anything using .NET’s `ComputeHash` method
site:powershellgallery.com "New-Object" "System.Security.Cryptography" "ComputeHash"
- Searching for JSON-based hashing attempts
site:powershellgallery.com inurl:package "hash" "ConvertTo-Json" OR "Serialize"
- Trying to find hashing functions using legacy .NET cryptography providers
site:powershellgallery.com "PowerShell" "SHA1CryptoServiceProvider" OR "SHA256Managed" OR "MD5CryptoServiceProvider"
- Hunting for functions that explicitly compute hashes
site:powershellgallery.com inurl:functions "Get-Hash" OR "ConvertTo-Hash" OR "Compute-Hash"
- Seeing if anyone had implemented hashing for byte arrays directly
site:powershellgallery.com "hash" "byte[]" "ComputeHash" "MemoryStream"
- Looking for mentions of hashing objects in-memory instead of serializing them
site:powershellgallery.com "Get-Hash" "in-memory object" "SHA256" OR "MD5"
- Checking if `ConvertTo-SecureString` had been misused for hashing
site:powershellgallery.com "ConvertTo-SecureString" "ComputeHash" "memory"
- Digging through functions that mention hashing in the context of streaming
site:powershellgallery.com "hash function" "MemoryStream" "ConvertTo-HexString"
Why Existing “Solutions” Fail
After analyzing what was out there, the gaps became painfully obvious:
🚨 1. Hashing Is Not Deterministic
- PowerShell’s `ConvertTo-Json` changes behavior between versions.
- Unordered collections don’t serialize predictably.
- Collections inheriting from `IList` often get coerced into simple lists, leading to structural inconsistencies.
- Some implementations accidentally shuffle object properties, making the same object return different hashes.
🚨 2. No Standard Approach Exists
- Some scripts use `ConvertTo-Json`, others use `.ToString()`, and a few serialize objects to XML first.
- None provide a consistent, structured way to process objects.
- There is no official guidance on how to hash structured data reliably.
🚨 3. Performance Issues Everywhere
- Most implementations serialize the entire object into a string before hashing it.
- This means hashing a large object creates massive memory overhead.
- No existing function streams data efficiently into the hasher.
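.NET’s crypto primitives do support incremental hashing. Here’s a minimal sketch of feeding chunks into a hasher without building one giant string first, assuming some `$chunks` source of already-serialized fragments (producing those fragments incrementally is the hard part a real implementation has to solve):

```powershell
$sha = [System.Security.Cryptography.SHA256]::Create()
foreach ($chunk in $chunks) {   # $chunks: incremental string fragments (assumed)
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($chunk)
    [void]$sha.TransformBlock($bytes, 0, $bytes.Length, $null, 0)
}
[void]$sha.TransformFinalBlock([byte[]]::new(0), 0, 0)   # finalize the running hash
$hex = [System.BitConverter]::ToString($sha.Hash) -replace '-', ''
$sha.Dispose()
```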
🚨 4. The JSON Hashing Trap
To be fair, `ConvertTo-Json` isn’t always wrong. For simple ordered lists and `[ordered]@{}` hashtables, JSON-based hashing can work:
```powershell
$orderedHashTable = [ordered]@{
    Name = 'John Doe'
    Age  = 30
    City = 'New York'
}
$json = $orderedHashTable | ConvertTo-Json
```
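Hashing that stable JSON text then works as expected (`Get-FileHash` needs a stream, so the string is wrapped in a `MemoryStream`):

```powershell
$bytes = [System.Text.Encoding]::UTF8.GetBytes($json)
$hash  = Get-FileHash -InputStream ([System.IO.MemoryStream]::new($bytes)) -Algorithm SHA256
$hash.Hash   # stable across runs, because [ordered]@{} preserves key order
```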
However, the moment you introduce:
- Regular hashtables (`@{}`)
- HashSets, Dictionaries, PSCustomObjects, or custom object types
Your hash becomes non-deterministic because:
- PowerShell’s internal object retrieval doesn’t guarantee a consistent order.
- The serializer makes no guarantees about property traversal order.
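You can see this in miniature with a plain hashtable: the key order in the serialized output is an implementation detail of the underlying `Hashtable`, not the order you declared:

```powershell
$h = @{ Zebra = 1; Apple = 2; Mango = 3 }
$h | ConvertTo-Json   # the key order here need not match Zebra, Apple, Mango
```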
🚨 5. The Type Coercion Problem
Even if JSON serialization were perfectly ordered, it still fails to differentiate data types properly:
```powershell
$object1 = [ordered]@{ Name = "Alice"; Age = [int]30;  Grade = [char]'A' }
$object2 = [ordered]@{ Name = "Alice"; Age = [long]30; Grade = "A" }   # different .NET types

# JSON has no char or long: both objects produce identical JSON text
$json1 = $object1 | ConvertTo-Json
$json2 = $object2 | ConvertTo-Json
$hash1 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json1))) -Algorithm SHA256
$hash2 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json2))) -Algorithm SHA256

$hash1.Hash -eq $hash2.Hash   # True, but these objects aren't actually the same
```
JSON’s type system is narrower than .NET’s, so distinct types like `[int]` vs. `[long]`, or `[char]` vs. `[string]`, collapse into the same representation, leading to false positives.
Why This Matters: The Missing PowerShell Standard
We need a real, deterministic, efficient way to hash complex objects; without one, silent comparison failures can be devastating, and likely already have been.
A proper PowerShell object hashing function should:
✅ Preserve object structure – Ordered collections should stay ordered; unordered collections should be sorted before hashing (see the sketch after this list).
✅ Avoid full serialization overhead – It should process objects incrementally instead of dumping them into a giant string first.
✅ Handle nested and recursive structures – Objects can reference other objects; we need a way to track and hash them properly.
✅ Respect complex collection types – Lists inheriting from `IList` shouldn’t get silently coerced into generic arrays.
✅ Provide developer control – Users should be able to exclude properties or use custom mappers.
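As a rough illustration of the first requirement, here is a minimal normalization sketch (my own toy version, not the module’s implementation): recursively rewrite unordered dictionaries as key-sorted `[ordered]@{}` dictionaries, so the same logical data always serializes the same way.

```powershell
function ConvertTo-CanonicalForm {
    param($InputObject)
    if ($InputObject -is [System.Collections.Specialized.OrderedDictionary]) {
        # Already ordered: keep the order, just normalize the values
        $out = [ordered]@{}
        foreach ($key in $InputObject.Keys) { $out[$key] = ConvertTo-CanonicalForm $InputObject[$key] }
        return $out
    }
    if ($InputObject -is [System.Collections.IDictionary]) {
        # Unordered: sort the keys so enumeration order is deterministic
        $out = [ordered]@{}
        foreach ($key in ($InputObject.Keys | Sort-Object)) { $out[$key] = ConvertTo-CanonicalForm $InputObject[$key] }
        return $out
    }
    if ($InputObject -is [System.Collections.IEnumerable] -and $InputObject -isnot [string]) {
        # Sequences keep their order; normalize each element (leading comma preserves array shape)
        return , @($InputObject | ForEach-Object { ConvertTo-CanonicalForm $_ })
    }
    return $InputObject   # scalars pass through unchanged
}
```

This toy version ignores `PSCustomObject`s, circular references, float normalization, and type fidelity; that bookkeeping is exactly what a real implementation has to get right.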
Where We Stand at v0.9: Filling the Gap
Toward a Better PowerShell Object Hash Generator
A truly robust object hashing system for PowerShell should have the following features:
- Deterministic Hashing Across Object Structures
  - The hash should remain consistent across multiple invocations, ensuring that the same input object always produces the same output hash.
  - Object serialization should be order-preserving for ordered collections (e.g., `[ordered]@{}`, `List`, `Queue`, `Stack`), while unordered collections (e.g., `Hashtable`, `Dictionary`) should be sorted to maintain hash stability.
- Support for Complex and Nested Data Structures
  - The system should correctly process deeply nested objects, including dictionaries, lists, and custom objects.
  - Handling of `PSCustomObject`, `Hashtable`, `[ordered]@{}`, and .NET collections should be seamless.
  - Circular references should be detected and safely handled to prevent infinite recursion.
- Configurable Hashing Algorithms
  - Users should have the ability to select from multiple hash algorithms, such as MD5, SHA1, SHA256, SHA384, and SHA512, to meet security and performance needs.
  - The default should be a secure option like SHA256 for strong cryptographic integrity.
- Selective Field Exclusion
  - Certain object properties or dictionary keys should be excludable from the hashing process, allowing users to ignore transient or irrelevant data fields (see the sketch after this list).
  - A `HashSet[string]`-based implementation ensures efficient lookups for ignored fields.
- Binary and Streaming Hash Computation
  - Instead of converting objects to strings, a robust hashing system should use BSON serialization to create a stable, compact binary representation.
  - Streaming hash computation should allow large objects to be processed efficiently in chunks, minimizing memory overhead.
- Canonical Object Normalization
  - Before hashing, objects should be normalized into a standard structure to ensure hash stability across different representations of the same logical data.
  - Floating-point numbers should be normalized to avoid precision-related inconsistencies.
- Equality Comparisons and Operator Overloading
  - Hash comparisons should be simplified through overloaded equality operators (`-eq`, `-ne`), allowing intuitive comparisons between hashes and objects.
  - The class should provide a clean `ToString()` method to retrieve the computed hash easily.
- Error Handling and Fault Tolerance
  - The hashing system should gracefully handle unexpected object types and serialization errors.
  - Meaningful error messages should be provided when an unsupported object type is encountered.
- Pluggable Serializers for Custom Types
  - The hashing system should support custom serialization strategies for user-defined objects and complex .NET types.
  - A modular design should allow developers to extend or override serialization behavior using pluggable components.
  - This ensures compatibility with non-serializable objects, specialized data structures, or domain-specific representations.
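To make the field-exclusion point concrete, here is a small sketch of the `HashSet[string]` approach (the variable names are mine, not the module’s API): ignored keys are dropped while building the canonical form, with a case-insensitive set giving fast lookups.

```powershell
$ignore = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]('Timestamp', 'SessionId'),           # transient fields to exclude
    [System.StringComparer]::OrdinalIgnoreCase)

$source   = @{ Name = 'Alice'; SessionId = 'abc123'; Timestamp = Get-Date }
$filtered = [ordered]@{}
foreach ($key in ($source.Keys | Sort-Object)) {
    if (-not $ignore.Contains($key)) { $filtered[$key] = $source[$key] }
}
# $filtered now holds only Name = 'Alice' and is safe to hash deterministically
```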
Toward a Better PowerShell Object Hash Generator: Introducing the DataHash Class
To fix these issues, I built v0.9 of `DataHash`, a structured and deterministic object hashing solution that actually works.
🔹 Ensures object structure is preserved
🔹 Sorts unordered collections before hashing
🔹 Provides fine-tuned control over ephemeral data that would pollute the object's identity
🔹 Uses BSON serialization instead of unreliable JSON
🔹 Maintains the identity of self-referencing objects without causing infinite loops
By integrating these features, the `DataHash` class provides a powerful, flexible, and reliable solution for object hashing in PowerShell, making it ideal for scenarios such as data deduplication, integrity verification, and caching mechanisms.
At this stage, v0.9 is one of the first practical PowerShell object hashing solutions that actually works the way we need it to.
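As a quick illustration, usage looks roughly like the sketch below; the constructor shape and import line are assumptions on my part (check the repo’s README for the real surface), while `ToString()` and the `-eq` overload are the features described above.

```powershell
# Illustrative sketch only: the exact constructor/API may differ; see the repo.
using module Get-DataHash   # PowerShell classes are surfaced via 'using module' (assumed)

$record1 = [ordered]@{ Name = 'John Doe'; Age = 30; City = 'New York' }
$record2 = [ordered]@{ Name = 'John Doe'; Age = 30; City = 'New York' }

$hash1 = [DataHash]::new($record1)   # assumed constructor shape
$hash2 = [DataHash]::new($record2)

$hash1 -eq $hash2    # True, via the overloaded equality operator
$hash1.ToString()    # the computed hash string
```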
**v1, we're not there yet?**
No.
With v1.0 nearly complete, Get-DataHash now provides a fluent and reliable class that meets the exact demands of the project it was built for and gives greater confidence than any existing solution.
The next step—finalizing v1 with a cmdlet wrapper—is largely an aesthetic one, making the tool more idiomatic to PowerShell and easier to use in scripts.
However, functionally, the class itself is complete, save for exposing the custom serializer API, which will no doubt be made available within the scope of the tasks at hand; with that, my focus will shift back to my current project work.
Looking Ahead to v2: A C# Rewrite with Expanded Capabilities
I’m writing this down now because I actually have to use this class, and I don’t want to forget the roadmap as I’ve conceived it.
With the broader scope of paradigms available in C#, and after thinking through how to achieve parallelism while maintaining hash stability, v2 will eliminate the LiteDB dependency entirely. The Custom Type serialization features will still be supported, but through alternative mappers, rather than requiring `LiteDB.dll` as part of the project.
Hybrid BFS/DFS Parallelization: Branch-Slicing for Deterministic Hashing
Along with the base performance increase granted by C#, the v2 implementation will introduce a straightforward batch-processing API that returns an ordered collection of hashes, as well as a structured, hybrid parallelization model:
- Use BFS to slice complex objects into major branches that can be hashed in parallel.
- Within each branch, apply postorder DFS, streaming each stringified element into the hash to ensure strict value-based determinism.
- Merkle-tree the branch hashes as a final aggregation step to maintain structural integrity.
- Use structured bookkeeping (`branchId`, `hash`) pairs to maintain execution-order stability (see the sketch below).
This approach ensures:
- Branches can be processed in parallel, reducing bottlenecks.
- Each branch maintains deterministic order using DFS.
- The final hash respects both structure and execution order, guaranteeing consistency across runs.
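Expressed as PowerShell-flavored pseudocode (the v2 code doesn’t exist yet), the bookkeeping looks roughly like this; `Get-BranchHash` is a hypothetical stand-in for the per-branch postorder-DFS hasher:

```powershell
# Hypothetical sketch: Get-BranchHash stands in for the per-branch DFS hasher,
# and $branches for the BFS-sliced top-level branches.
$branchHashes = $branches | ForEach-Object {
    [pscustomobject]@{ BranchId = $_.Id; Hash = Get-BranchHash $_ }
} | Sort-Object BranchId    # stable order, no matter which branch finished first

# Flat Merkle-style aggregation over the ordered (branchId, hash) pairs.
# A full Merkle tree would combine pairwise; the ordering discipline is the point here.
$combined = ($branchHashes | ForEach-Object { '{0}:{1}' -f $_.BranchId, $_.Hash }) -join "`n"
$bytes    = [System.Text.Encoding]::UTF8.GetBytes($combined)
$final    = (Get-FileHash -InputStream ([System.IO.MemoryStream]::new($bytes)) -Algorithm SHA256).Hash
```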
Parallelization Without Breaking Determinism
PowerShell’s threading model makes true parallelization complex, so v1 remains single-threaded for strict control. The v2 C# rewrite, however, will introduce:
📌 Thread-safe object hashing → Parallel execution without data races.
📌 Efficient memory management → Stream large objects without full serialization.
📌 Optimized Merkle aggregation → Ensuring final hashes remain structurally stable.
By leveraging branch-based parallelization, we can make hashing faster without breaking deterministic guarantees.
Open Questions for v2
🔹 Define branch slicing heuristics – When to BFS-slice vs. process serially. Right now, I haven't decided how to handle the cascade that arises when a branch being walked has sub-branches of its own. I could allow only first-order parallelization on the tree, but for a larger tree, efficiency gains might be made by spinning up a new context... that would also move our Merkle aggregation down into the tree some `n` levels, requiring more bookkeeping to assemble a final stable hash. Similarly:
🔹 Determine whether intermediate DFS results for simple types (scalars, etc.) can be pre-hashed for better memory efficiency – again, more bookkeeping to ensure hash stability.
🔹 Establish a simple parallel execution model – thread pool, task queue, or async processing (either bookkeep explicitly, making it easy for contributors, or use structured wizardry).
I’ll accept pull requests with tests if anyone wants to contribute, but major development on v2 will be driven by real project needs.
📌 Repo: GitHub – dmidlo/Get-DataHash
📌 Package: PowerShell Gallery – Get-DataHash v0.9.2
🚀 v0.9 is in use; v1 is coming down the pike within months (I chose to write this instead); v2 is mapped out, and this is the reference for when it's time to build. 🔥