PowerShell, for all its flexibility, has a glaring gap: there is no clear, reliable way to hash complex objects deterministically.
If you’ve ever tried to:
✅ Compare structured data
✅ Track changes over time
✅ Verify data integrity
You’ve probably hit the same wall:
- Most hashing functions only work for files.
- Object hashing functions rely on non-deterministic serialization.
- There is no robust, structured way to hash deeply nested objects.
Even worse, many common object hashing techniques in PowerShell produce false positives and false negatives without users realizing it.
🚨 The False Positives & False Negatives We’ve Been Living With
Have you ever written something like this?
```powershell
$json1 = $object1 | ConvertTo-Json -Depth 10
$json2 = $object2 | ConvertTo-Json -Depth 10
$hash1 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json1))) -Algorithm SHA256
$hash2 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json2))) -Algorithm SHA256
```
…and confidently said, “These two objects are identical!”
You’ve been playing yourself.
Why?
Because PowerShell doesn’t preserve object structure deterministically.
- Property order of unordered, dictionary-like collections can differ between two logically identical objects, including `PSCustomObject`s declared with their properties in different orders.
- Hashtables have no guaranteed ordering unless explicitly marked `[Ordered]`.
- `ConvertTo-Json` doesn’t respect data types perfectly (e.g., `[char]'A'` and the string `"A"` serialize identically).
- Collections inheriting from `IList` often get coerced into simple `@()` arrays when serialized, losing behavioral semantics defined by a strict type chosen by the developer.
- Different PowerShell versions serialize objects differently.
This means:
- False negatives: Two logically identical objects produce different hashes.
- False positives: Two structurally different objects produce the same hash.
PowerShell’s JSON-based hashing is fundamentally flawed for complex objects.
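The false-negative case takes two lines to demonstrate: the same logical data, declared in a different property order, produces different JSON, and therefore a different hash:

```powershell
$a = [pscustomobject]@{ Name = 'Alice'; Age = 30 }
$b = [pscustomobject]@{ Age = 30; Name = 'Alice' }   # same data, different property order

# PSCustomObject preserves declaration order, so the JSON (and any hash of it) differs
($a | ConvertTo-Json) -eq ($b | ConvertTo-Json)   # False
```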
The Search for a Solution: Where Is It?
When I first started looking, I assumed there’d be at least one solid, reusable object hashing function. So, I did the obvious thing:
- Searched PowerShell Gallery for modules related to hashing.
- Looked for Get-Hash, ConvertTo-Hash, or Compute-Hash functions.
- Googled “PowerShell object hashing”, expecting an easy solution.
I found nothing useful.
It wasn’t until I started Google dorking—crafting targeted searches—that I uncovered the sad truth:
- Most hashing functions only work for files.
- Many claim object hashing but rely on non-deterministic serialization.
- None provide a robust, structured way to hash deeply nested objects.
The Google Dorking Journey
Here’s a small taste of the search queries I ran to find any existing object hashing solutions in the ecosystem:
- General hash functions
site:powershellgallery.com "hash" "ConvertTo-SecureString" OR "Get-Hash" OR "SHA256" OR "MD5"
- Searching for file-independent hashing utilities
site:powershellgallery.com "function" "hash" "MemoryStream" OR "Get-FileHash"
- Looking for anything using .NET’s `ComputeHash` method
site:powershellgallery.com "New-Object" "System.Security.Cryptography" "ComputeHash"
- Searching for JSON-based hashing attempts
site:powershellgallery.com inurl:package "hash" "ConvertTo-Json" OR "Serialize"
- Trying to find hashing functions using legacy .NET cryptography providers
site:powershellgallery.com "PowerShell" "SHA1CryptoServiceProvider" OR "SHA256Managed" OR "MD5CryptoServiceProvider"
- Hunting for functions that explicitly compute hashes
site:powershellgallery.com inurl:functions "Get-Hash" OR "ConvertTo-Hash" OR "Compute-Hash"
- Seeing if anyone had implemented hashing for byte arrays directly
site:powershellgallery.com "hash" "byte[]" "ComputeHash" "MemoryStream"
- Looking for mentions of hashing objects in-memory instead of serializing them
site:powershellgallery.com "Get-Hash" "in-memory object" "SHA256" OR "MD5"
- Checking if `ConvertTo-SecureString` had been misused for hashing
site:powershellgallery.com "ConvertTo-SecureString" "ComputeHash" "memory"
- Digging through functions that mention hashing in the context of streaming
site:powershellgallery.com "hash function" "MemoryStream" "ConvertTo-HexString"
Why Existing “Solutions” Fail
After analyzing what was out there, the gaps became painfully obvious:
🚨 1. Hashing Is Not Deterministic
- PowerShell’s `ConvertTo-Json` changes behavior between versions.
- Unordered collections don’t serialize predictably.
- Collections inheriting from `IList` often get coerced into simple lists, leading to structural inconsistencies.
- Some implementations accidentally shuffle object properties, making the same object return different hashes.
🚨 2. No Standard Approach Exists
- Some scripts use `ConvertTo-Json`, others use `.ToString()`, and a few serialize objects to XML first.
- None provide a consistent, structured way to process objects.
- There is no official guidance on how to hash structured data reliably.
🚨 3. Performance Issues Everywhere
- Most implementations serialize the entire object into a string before hashing it.
- This means hashing a large object creates massive memory overhead.
- No existing function streams data efficiently into the hasher.
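.NET’s crypto primitives do support incremental hashing. Here’s a minimal sketch of feeding chunks into a hasher without building one giant string first, assuming some `$chunks` source of already-serialized fragments (producing those fragments incrementally is the hard part a real implementation has to solve):

```powershell
$sha = [System.Security.Cryptography.SHA256]::Create()
foreach ($chunk in $chunks) {   # $chunks: incremental string fragments (assumed)
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($chunk)
    [void]$sha.TransformBlock($bytes, 0, $bytes.Length, $null, 0)
}
[void]$sha.TransformFinalBlock([byte[]]::new(0), 0, 0)   # finalize the running hash
$hex = [System.BitConverter]::ToString($sha.Hash) -replace '-', ''
$sha.Dispose()
```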
🚨 4. The JSON Hashing Trap
To be fair, `ConvertTo-Json` isn’t always wrong. For simple ordered lists and `[ordered]@{}` hashtables, JSON-based hashing can work:
```powershell
$orderedHashTable = [ordered]@{
    Name = 'John Doe'
    Age  = 30
    City = 'New York'
}
$json = $orderedHashTable | ConvertTo-Json
```
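Hashing that stable JSON text then works as expected (`Get-FileHash` needs a stream, so the string is wrapped in a `MemoryStream`):

```powershell
$bytes = [System.Text.Encoding]::UTF8.GetBytes($json)
$hash  = Get-FileHash -InputStream ([System.IO.MemoryStream]::new($bytes)) -Algorithm SHA256
$hash.Hash   # stable across runs, because [ordered]@{} preserves key order
```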
However, the moment you introduce:
- Regular hashtables (`@{}`)
- HashSets, Dictionaries, PSCustomObjects, or custom object types
Your hash becomes non-deterministic because:
- PowerShell’s internal object retrieval doesn’t guarantee a consistent order.
- The serializer makes no guarantees about property traversal order.
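You can see this in miniature with a plain hashtable: the key order in the serialized output is an implementation detail of the underlying `Hashtable`, not the order you declared:

```powershell
$h = @{ Zebra = 1; Apple = 2; Mango = 3 }
$h | ConvertTo-Json   # the key order here need not match Zebra, Apple, Mango
```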
🚨 5. The Type Coercion Problem
Even if JSON serialization were perfectly ordered, it still fails to differentiate data types properly:
```powershell
$object1 = [ordered]@{ Name = "Alice"; Age = [int]30;  Grade = [char]'A' }
$object2 = [ordered]@{ Name = "Alice"; Age = [long]30; Grade = "A" }   # different .NET types

# JSON has no char or long: both objects produce identical JSON text
$json1 = $object1 | ConvertTo-Json
$json2 = $object2 | ConvertTo-Json
$hash1 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json1))) -Algorithm SHA256
$hash2 = Get-FileHash -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($json2))) -Algorithm SHA256

$hash1.Hash -eq $hash2.Hash   # True, but these objects aren't actually the same
```
JSON’s type system is narrower than .NET’s, so distinct types like `[int]` vs. `[long]`, or `[char]` vs. `[string]`, collapse into the same representation, leading to false positives.
Why This Matters: The Missing PowerShell Standard
We need a real, deterministic, efficient way to hash complex objects; without one, silent comparison failures can be devastating, and likely already have been.
A proper PowerShell object hashing function should:
✅ Preserve object structure – Ordered collections should stay ordered; unordered collections should be sorted before hashing (see the sketch after this list).
✅ Avoid full serialization overhead – It should process objects incrementally instead of dumping them into a giant string first.
✅ Handle nested and recursive structures – Objects can reference other objects; we need a way to track and hash them properly.
✅ Respect complex collection types – Lists inheriting from `IList` shouldn’t get silently coerced into generic arrays.
✅ Provide developer control – Users should be able to exclude properties or use custom mappers.
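As a rough illustration of the first requirement, here is a minimal normalization sketch (my own toy version, not the module’s implementation): recursively rewrite unordered dictionaries as key-sorted `[ordered]@{}` dictionaries, so the same logical data always serializes the same way.

```powershell
function ConvertTo-CanonicalForm {
    param($InputObject)
    if ($InputObject -is [System.Collections.Specialized.OrderedDictionary]) {
        # Already ordered: keep the order, just normalize the values
        $out = [ordered]@{}
        foreach ($key in $InputObject.Keys) { $out[$key] = ConvertTo-CanonicalForm $InputObject[$key] }
        return $out
    }
    if ($InputObject -is [System.Collections.IDictionary]) {
        # Unordered: sort the keys so enumeration order is deterministic
        $out = [ordered]@{}
        foreach ($key in ($InputObject.Keys | Sort-Object)) { $out[$key] = ConvertTo-CanonicalForm $InputObject[$key] }
        return $out
    }
    if ($InputObject -is [System.Collections.IEnumerable] -and $InputObject -isnot [string]) {
        # Sequences keep their order; normalize each element (leading comma preserves array shape)
        return , @($InputObject | ForEach-Object { ConvertTo-CanonicalForm $_ })
    }
    return $InputObject   # scalars pass through unchanged
}
```

This toy version ignores `PSCustomObject`s, circular references, float normalization, and type fidelity; that bookkeeping is exactly what a real implementation has to get right.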
Where We Stand at v0.9: Filling the Gap
Toward a Better PowerShell Object Hash Generator
A truly robust object hashing system for PowerShell should have the following features:
- Deterministic Hashing Across Object Structures
  - The hash should remain consistent across multiple invocations, ensuring that the same input object always produces the same output hash.
  - Object serialization should be order-preserving for ordered collections (e.g., `[ordered]@{}`, `List`, `Queue`, `Stack`), while unordered collections (e.g., `Hashtable`, `Dictionary`) should be sorted to maintain hash stability.
- Support for Complex and Nested Data Structures
  - The system should correctly process deeply nested objects, including dictionaries, lists, and custom objects.
  - Handling of `PSCustomObject`, `Hashtable`, `[ordered]@{}`, and .NET collections should be seamless.
  - Circular references should be detected and safely handled to prevent infinite recursion.
- Configurable Hashing Algorithms
  - Users should have the ability to select from multiple hash algorithms, such as MD5, SHA1, SHA256, SHA384, and SHA512, to meet security and performance needs.
  - The default should be a secure option like SHA256 for strong cryptographic integrity.
- Selective Field Exclusion
  - Certain object properties or dictionary keys should be excludable from the hashing process, allowing users to ignore transient or irrelevant data fields (see the sketch after this list).
  - A `HashSet[string]`-based implementation ensures efficient lookups for ignored fields.
- Binary and Streaming Hash Computation
  - Instead of converting objects to strings, a robust hashing system should use BSON serialization to create a stable, compact binary representation.
  - Streaming hash computation should allow large objects to be processed efficiently in chunks, minimizing memory overhead.
- Canonical Object Normalization
  - Before hashing, objects should be normalized into a standard structure to ensure hash stability across different representations of the same logical data.
  - Floating-point numbers should be normalized to avoid precision-related inconsistencies.
- Equality Comparisons and Operator Overloading
  - Hash comparisons should be simplified through overloaded equality operators (`-eq`, `-ne`), allowing intuitive comparisons between hashes and objects.
  - The class should provide a clean `ToString()` method to retrieve the computed hash easily.
- Error Handling and Fault Tolerance
  - The hashing system should gracefully handle unexpected object types and serialization errors.
  - Meaningful error messages should be provided when an unsupported object type is encountered.
- Pluggable Serializers for Custom Types
  - The hashing system should support custom serialization strategies for user-defined objects and complex .NET types.
  - A modular design should allow developers to extend or override serialization behavior using pluggable components.
  - This ensures compatibility with non-serializable objects, specialized data structures, or domain-specific representations.
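To make the field-exclusion point concrete, here is a small sketch of the `HashSet[string]` approach (the variable names are mine, not the module’s API): ignored keys are dropped while building the canonical form, with a case-insensitive set giving fast lookups.

```powershell
$ignore = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]('Timestamp', 'SessionId'),           # transient fields to exclude
    [System.StringComparer]::OrdinalIgnoreCase)

$source   = @{ Name = 'Alice'; SessionId = 'abc123'; Timestamp = Get-Date }
$filtered = [ordered]@{}
foreach ($key in ($source.Keys | Sort-Object)) {
    if (-not $ignore.Contains($key)) { $filtered[$key] = $source[$key] }
}
# $filtered now holds only Name = 'Alice' and is safe to hash deterministically
```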
Toward a Better PowerShell Object Hash Generator: Introducing the DataHash Class
To fix these issues, I built v0.9 of `DataHash`, a structured and deterministic object hashing solution that actually works.
🔹 Ensures object structure is preserved
🔹 Sorts unordered collections before hashing
🔹 Provides fine-tuned control over ephemeral data that would pollute the object's identity
🔹 Uses BSON serialization instead of unreliable JSON
🔹 Maintains the identity of self-referencing objects without causing infinite loops
By integrating these features, the `DataHash` class provides a powerful, flexible, and reliable solution for object hashing in PowerShell, making it ideal for scenarios such as data deduplication, integrity verification, and caching mechanisms.
At this stage, v0.9 is one of the first practical PowerShell object hashing solutions that actually works the way we need it to.
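As a quick illustration, usage looks roughly like the sketch below; the constructor shape and import line are assumptions on my part (check the repo’s README for the real surface), while `ToString()` and the `-eq` overload are the features described above.

```powershell
# Illustrative sketch only: the exact constructor/API may differ; see the repo.
using module Get-DataHash   # PowerShell classes are surfaced via 'using module' (assumed)

$record1 = [ordered]@{ Name = 'John Doe'; Age = 30; City = 'New York' }
$record2 = [ordered]@{ Name = 'John Doe'; Age = 30; City = 'New York' }

$hash1 = [DataHash]::new($record1)   # assumed constructor shape
$hash2 = [DataHash]::new($record2)

$hash1 -eq $hash2    # True, via the overloaded equality operator
$hash1.ToString()    # the computed hash string
```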
**v1, we're not there yet?**
No.
With v1.0 nearly complete, Get-DataHash now provides a fluent and reliable class that meets the exact demands of the project it was built for and gives greater confidence than any existing solution.
The next step—finalizing v1 with a cmdlet wrapper—is largely an aesthetic one, making the tool more idiomatic to PowerShell and easier to use in scripts.
However, functionally, the class itself is complete, save for exposing the custom serializer API, which will no doubt be made available within the scope of the tasks at hand; with that, my focus will shift back to my current project work.
Looking Ahead to v2: A C# Rewrite with Expanded Capabilities
I’m writing this down now because I actually have to use this class, and I don’t want to forget the roadmap as I’ve conceived it.
With the broader scope of paradigms available in C#, and after thinking through how to achieve parallelism while maintaining hash stability, v2 will eliminate the LiteDB dependency entirely. The Custom Type serialization features will still be supported, but through alternative mappers, rather than requiring `LiteDB.dll` as part of the project.
Hybrid BFS/DFS Parallelization: Branch-Slicing for Deterministic Hashing
Along with the base performance increase granted by C#, the v2 implementation will introduce a straightforward batch-processing API that returns an ordered collection of hashes, as well as a structured, hybrid parallelization model:
- Use BFS to slice complex objects into major branches that can be hashed in parallel.
- Within each branch, apply postorder DFS, streaming each stringified element into the hash to ensure strict value-based determinism.
- Merkle-tree the branch hashes as a final aggregation step to maintain structural integrity.
- Use structured bookkeeping (`branchId`, `hash`) pairs to maintain execution-order stability (see the sketch below).
This approach ensures:
- Branches can be processed in parallel, reducing bottlenecks.
- Each branch maintains deterministic order using DFS.
- The final hash respects both structure and execution order, guaranteeing consistency across runs.
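Expressed as PowerShell-flavored pseudocode (the v2 code doesn’t exist yet), the bookkeeping looks roughly like this; `Get-BranchHash` is a hypothetical stand-in for the per-branch postorder-DFS hasher:

```powershell
# Hypothetical sketch: Get-BranchHash stands in for the per-branch DFS hasher,
# and $branches for the BFS-sliced top-level branches.
$branchHashes = $branches | ForEach-Object {
    [pscustomobject]@{ BranchId = $_.Id; Hash = Get-BranchHash $_ }
} | Sort-Object BranchId    # stable order, no matter which branch finished first

# Flat Merkle-style aggregation over the ordered (branchId, hash) pairs.
# A full Merkle tree would combine pairwise; the ordering discipline is the point here.
$combined = ($branchHashes | ForEach-Object { '{0}:{1}' -f $_.BranchId, $_.Hash }) -join "`n"
$bytes    = [System.Text.Encoding]::UTF8.GetBytes($combined)
$final    = (Get-FileHash -InputStream ([System.IO.MemoryStream]::new($bytes)) -Algorithm SHA256).Hash
```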
Parallelization Without Breaking Determinism
PowerShell’s threading model makes true parallelization complex, so v1 remains single-threaded for strict control. The v2 C# rewrite, however, will introduce:
📌 Thread-safe object hashing → Parallel execution without data races.
📌 Efficient memory management → Stream large objects without full serialization.
📌 Optimized Merkle aggregation → Ensuring final hashes remain structurally stable.
By leveraging branch-based parallelization, we can make hashing faster without breaking deterministic guarantees.
Open Questions for v2
🔹 Define branch slicing heuristics – When to BFS-slice vs. process serially. Right now, I haven't decided how to handle the cascade that arises when a branch being walked has sub-branches of its own. I could allow only first-order parallelization on the tree, but for a larger tree, efficiency gains might be made by spinning up a new context... that would also move our Merkle aggregation down into the tree some `n` levels, requiring more bookkeeping to assemble a final stable hash. Similarly:
🔹 Determine whether intermediate DFS results for simple types (scalars, etc.) can be pre-hashed for better memory efficiency – again, more bookkeeping to ensure hash stability.
🔹 Establish a simple parallel execution model – thread pool, task queue, or async processing (either bookkeep explicitly, making it easy for contributors, or use structured wizardry).
I’ll accept pull requests with tests if anyone wants to contribute, but major development on v2 will be driven by real project needs.
📌 Repo: GitHub – dmidlo/Get-DataHash
📌 Package: PowerShell Gallery – Get-DataHash v0.9.2
🚀 v0.9 is in use; v1 is coming down the pike within months (I chose to write this instead); v2 is mapped out, and this is the reference for when it's time to build. 🔥