Release Date: December 2025
Author: @hejhdiss (Muhammed Shafin p)
Status: Alpha Release
Overview
GXD Compression Utility has received significant updates in version 0.0.0a2, introducing intelligent compression, enhanced metadata tracking, and better user experience. This release transforms GXD from a simple block-based compressor into a smart, adaptive archival tool.
Note on Versioning: While these are substantial feature additions, the version number remains 0.0.0a2 (same as the earlier version) because this is still an alpha release. Version numbers may not increment with each update during the alpha phase. The "alpha" designation indicates the software is still in early development and APIs/formats may change.
Major New Features
1. Intelligent Auto Algorithm Selection
The most significant addition to GXD is the auto mode (--algo auto), which brings intelligence to compression by automatically selecting the best algorithm for each individual block.
How It Works:
- Analyzes each block's Shannon Entropy (0.0 to 8.0 scale)
- Calculates zero-byte density and unique byte ratios
- Applies decision logic to choose optimal algorithm per block
- Stores per-block algorithm choice in metadata
Decision Logic:
If entropy > 7.9                 → use 'none'   (already compressed/encrypted)
If zeros > 40% or entropy < 3.0  → use 'lz4'    (sparse data, maximize speed)
If entropy < 6.8                 → use 'zstd'   (compressible, balanced)
Otherwise                        → use 'brotli' (high redundancy, max compression)
Why This Matters:
Mixed-content files (documents with embedded images, databases with varied data types) can now benefit from optimal compression on each section. Previously, you had to choose one algorithm for the entire file, potentially expanding already-compressed sections.
Usage:
python gxd.py compress mixed_data.bin output.gxd --algo auto
2. Enhanced Block Metadata Tracking
Every compressed block now includes rich metadata for analysis and debugging:
New Metadata Fields:
- entropy: Shannon entropy value (0.0-8.0) indicating data randomness
- time: Compression duration in seconds for performance analysis
- timestamp: Unix timestamp when the block was compressed
- algo: Actual algorithm used (critical for auto mode)
Benefits:
- Understand why auto mode chose specific algorithms
- Identify compression bottlenecks
- Analyze data characteristics across your files
- Debug compression issues with detailed per-block information
Example Metadata:
{
  "id": 0,
  "start": 6,
  "size": 12345,
  "orig_size": 1048576,
  "hash": "abc123...",
  "algo": "zstd",
  "entropy": 5.8234,
  "time": 0.023456,
  "timestamp": 1703347200.123
}
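Once the per-block metadata is available as a list of dicts with the fields shown above, it is straightforward to derive archive-level statistics. The sketch below is illustrative only: the helper name `summarize_blocks` is not part of GXD, and how you obtain the block list (e.g. from the archive footer) is left to the tool's internals.

```python
def summarize_blocks(blocks):
    """Aggregate per-block GXD metadata into simple archive-level stats."""
    total_orig = sum(b["orig_size"] for b in blocks)
    total_comp = sum(b["size"] for b in blocks)
    by_algo = {}
    for b in blocks:
        by_algo[b["algo"]] = by_algo.get(b["algo"], 0) + 1
    return {
        "ratio": total_comp / total_orig,                       # overall compression ratio
        "algos": by_algo,                                       # algorithm distribution
        "avg_entropy": sum(b["entropy"] for b in blocks) / len(blocks),
        "slowest": max(blocks, key=lambda b: b["time"])["id"],  # bottleneck block
    }

# Two sample blocks using the metadata fields documented above
blocks = [
    {"id": 0, "size": 12345, "orig_size": 1048576, "algo": "zstd",
     "entropy": 5.8234, "time": 0.023456},
    {"id": 1, "size": 1048576, "orig_size": 1048576, "algo": "none",
     "entropy": 7.99, "time": 0.001},
]
stats = summarize_blocks(blocks)
```

A varied `algos` distribution is a quick signal that auto mode found genuinely mixed content.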
3. File Attribute Preservation
GXD now preserves and restores original file system attributes, maintaining file integrity beyond just content.
Preserved Attributes:
- Permissions (mode): Read/write/execute permissions
- Modification time (mtime): When file was last modified
- Access time (atime): When file was last accessed
- User ID (uid): Owner user ID (Unix/Linux)
- Group ID (gid): Owner group ID (Unix/Linux)
How It Works:
- Attributes captured during compression
- Stored in archive metadata under file_attr
- Automatically restored during decompression
- Falls back gracefully if restoration fails (with warning)
Usage:
# Compress (attributes automatically captured)
python gxd.py compress important.bin archive.gxd
# Decompress (attributes automatically restored)
python gxd.py decompress archive.gxd -o important.bin
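The capture/restore cycle described above can be sketched with the standard os module. The helper names here are illustrative, not GXD's actual internals, but they show the same mechanism: stat before compressing, chmod/utime/chown after restoring, with a graceful fallback when ownership cannot be changed.

```python
import os
import tempfile

def capture_attrs(path: str) -> dict:
    """Snapshot the attributes GXD preserves (mode, times, owner)."""
    st = os.stat(path)
    return {"mode": st.st_mode, "mtime": st.st_mtime, "atime": st.st_atime,
            "uid": st.st_uid, "gid": st.st_gid}

def restore_attrs(path: str, attr: dict) -> None:
    os.chmod(path, attr["mode"])                     # permissions
    os.utime(path, (attr["atime"], attr["mtime"]))   # access/modify times
    try:
        os.chown(path, attr["uid"], attr["gid"])     # Unix only; may need privileges
    except (PermissionError, AttributeError):
        print("warning: could not restore uid/gid")  # graceful fallback

# Demo: capture, clobber, then restore the timestamps of a scratch file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"demo")
    path = f.name
attr = capture_attrs(path)
os.utime(path, (0, 0))            # simulate attribute loss
restore_attrs(path, attr)
restored_mtime = os.stat(path).st_mtime
os.remove(path)
```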
4. Archive Information Command
New info command provides comprehensive archive inspection without decompression.
Features:
- View global archive metadata (version, algorithm, block count)
- Display preserved file attributes with human-readable timestamps
- List block overview showing algorithm choices and sizes
- Inspect detailed metadata for specific blocks
Usage:
# View general archive information
python gxd.py info data.gxd
# Inspect specific block (1-based index)
python gxd.py info data.gxd --block 5
Sample Output:
==================================================
GXD ARCHIVE INFORMATION
==================================================
File Name : data.gxd
GXD Version : 0.0.0a2
Global Algo : auto
Total Blocks : 128
--- Preserved File Attributes ---
Original Mode : 0o644
Modify Time : Sat Dec 23 10:30:00 2024
Access Time : Sat Dec 23 10:30:00 2024
--- Block Overview (First 5) ---
ID | Algo | Size | Orig Size
---------------------------------------------
1 | zstd | 524288 | 1048576
2 | lz4 | 98304 | 1048576
3 | brotli | 712345 | 1048576
4 | none | 1048576 | 1048576
5 | zstd | 456789 | 1048576
... and 123 more blocks.
==================================================
Technical Improvements
Better Error Handling
Enhanced error messages and validation:
- Clear warnings when algorithm-specific parameters are ignored
- Improved size parsing with better error messages
- Graceful fallbacks for attribute restoration failures
- Detailed decompression error reporting with block IDs
Entropy Calculation
Implemented Shannon Entropy calculation for data analysis:
import collections
import math

def calculate_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    counter = collections.Counter(data)
    entropy = 0.0
    for count in counter.values():
        p_x = count / len(data)          # probability of this byte value
        entropy -= p_x * math.log2(p_x)
    return entropy
Shannon entropy measures the average information content per byte: 0.0 for constant data, 8.0 for uniformly random bytes. This gives the selector an accurate randomness signal on which to base its compression decisions.
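A quick sanity check of the function on known inputs (the definition is repeated here so the snippet runs standalone):

```python
import collections
import math

def calculate_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    entropy = 0.0
    for count in collections.Counter(data).values():
        p_x = count / len(data)
        entropy -= p_x * math.log2(p_x)
    return entropy

print(calculate_entropy(b"\x00" * 4096))          # 0.0 -> constant data
print(calculate_entropy(b"abab" * 1024))          # 1.0 -> two equiprobable symbols
print(calculate_entropy(bytes(range(256)) * 16))  # 8.0 -> uniform distribution
```

These endpoints match the thresholds in the decision logic: all-zero blocks land far below 3.0 (lz4), while uniformly random blocks exceed 7.9 (none).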
Smart Selector Class
New GXDSmartSelector class encapsulates algorithm prediction logic:
class GXDSmartSelector:
    @staticmethod
    def predict(data: bytes) -> str:
        entropy = calculate_entropy(data)
        metrics = calculate_metrics(data)
        zeros = metrics['zero_ratio']
        if entropy > 7.9: return "none"
        if zeros > 0.4 or entropy < 3.0: return "lz4"
        if entropy < 6.8: return "zstd"
        return "brotli"
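The selector's behavior can be demonstrated on synthetic blocks. Since `calculate_metrics` is not shown in these notes, a minimal stand-in providing only `zero_ratio` is assumed below; everything else mirrors the decision logic above.

```python
import collections
import math

def calculate_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    return -sum((c / len(data)) * math.log2(c / len(data))
                for c in collections.Counter(data).values())

def calculate_metrics(data: bytes) -> dict:
    # Assumed stand-in: only the zero_ratio metric the selector needs
    return {"zero_ratio": data.count(0) / len(data) if data else 0.0}

class GXDSmartSelector:
    @staticmethod
    def predict(data: bytes) -> str:
        entropy = calculate_entropy(data)
        zeros = calculate_metrics(data)["zero_ratio"]
        if entropy > 7.9: return "none"
        if zeros > 0.4 or entropy < 3.0: return "lz4"
        if entropy < 6.8: return "zstd"
        return "brotli"

print(GXDSmartSelector.predict(bytes(range(256)) * 16))   # none   (entropy 8.0)
print(GXDSmartSelector.predict(b"\x00" * 4096))           # lz4    (all zeros)
print(GXDSmartSelector.predict(bytes(range(32)) * 128))   # zstd   (entropy 5.0)
print(GXDSmartSelector.predict(bytes(range(128)) * 32))   # brotli (entropy 7.0)
```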
Use Cases Enabled by Updates
1. Mixed Content Archives
Scenario: Backup of project directory with source code, images, and compiled binaries.
Before: Single algorithm compresses everything, potentially expanding pre-compressed images.
Now: Auto mode uses:
- zstd for source code (text, compressible)
- none for JPEG/PNG images (already compressed)
- lz4 for log files with lots of zeros (sparse data)
2. Performance Analysis
Scenario: Understanding compression bottlenecks in large datasets.
Before: No visibility into per-block performance.
Now: Use info command to see:
- Which blocks took longest to compress
- Entropy distribution across your data
- Algorithm choices and their effectiveness
3. Long-term Archival
Scenario: Preserving critical files with full metadata.
Before: Only file content preserved, attributes lost.
Now: Complete file system metadata preserved, including permissions and timestamps, ensuring authentic restoration.
Performance Considerations
Auto Mode Overhead
Minimal Impact:
- Entropy calculation: ~0.001s per MB
- Algorithm selection: negligible
- Overall overhead: <1% for most workloads
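The quoted ~0.001s per MB can be spot-checked with a small timing harness; absolute numbers are machine-dependent, so treat the figure as a ballpark rather than a guarantee.

```python
import collections
import math
import time

def calculate_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    return -sum((c / len(data)) * math.log2(c / len(data))
                for c in collections.Counter(data).values())

block = bytes(range(256)) * 4096   # 1 MiB synthetic block
start = time.perf_counter()
entropy = calculate_entropy(block)
elapsed = time.perf_counter() - start
print(f"1 MiB block: entropy={entropy:.2f}, analysis took {elapsed * 1000:.2f} ms")
```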
When to Use Auto:
- Mixed content files
- Unknown data types
- Maximum compression efficiency desired
- Storage space is at a premium
When to Use Fixed Algorithm:
- Homogeneous data (all text, all binary)
- Speed is critical over compression ratio
- You know your data characteristics
Info Command Performance
Fast and Non-Intrusive:
- Only reads metadata footer (last few KB)
- No decompression required
- Instant results even for multi-GB archives
Breaking Changes
None. This release is fully backward compatible:
- Old archives decompress correctly
- New metadata fields are additions only
- Existing scripts and workflows continue working
Migration Guide
For Existing Users
No migration needed! Your existing .gxd archives work perfectly with the new version.
To leverage new features:
- Try auto mode on new compressions:
python gxd.py compress data.bin data.gxd --algo auto
- Inspect your existing archives:
python gxd.py info old_archive.gxd
- Recompress for attribute preservation:
# Decompress old archive
python gxd.py decompress old.gxd -o data.bin
# Recompress with new version (captures attributes)
python gxd.py compress data.bin new.gxd
Known Limitations
Auto Mode Requirements
Requires all three compression libraries installed:
pip install zstandard lz4 brotli
If any library is missing, auto mode fails with a clear error message.
Attribute Restoration
Platform-Specific:
- Full support on Unix/Linux systems
- Partial support on Windows (permissions may not restore)
- Cross-platform archives may lose some attributes
Permissions Required:
- Restoring uid/gid requires appropriate system permissions
- Falls back gracefully with warning if insufficient permissions
Future Possibilities
Disclaimer: The author is not committed to regular updates. These are potential ideas that may or may not be implemented:
- Machine learning-based algorithm selection
- Compression dictionary support for better ratios
- Multi-volume archive support
- GUI interface for archive inspection
- Plugin system for custom algorithms
- Differential/incremental compression
- Block-level deduplication (detecting and eliminating duplicate blocks)
- Content-aware deduplication using hash-based chunk identification
Community contributions toward these features are welcome!
Best Practices
When to Use Auto Mode
✅ Use auto when:
- Compressing diverse file types together
- Unsure about data characteristics
- Maximum compression efficiency is goal
- Data includes pre-compressed content
❌ Skip auto when:
- All data is same type (all text logs, all images)
- Speed is absolutely critical
- You've profiled and know best algorithm
- Working with small files (<1MB)
Monitoring Compression
# Compress with auto
python gxd.py compress data.bin data.gxd --algo auto
# Immediately inspect results
python gxd.py info data.gxd
# Look for:
# - Algorithm distribution (variety indicates mixed content)
# - Entropy values (high entropy = less compressible)
# - Size ratios (blocks that expanded)
Archive Management
# Regular integrity check (leave hash verification enabled)
python gxd.py decompress data.gxd > /dev/null
# Quick metadata verification
python gxd.py info data.gxd
# Extract sample for testing
python gxd.py seek data.gxd --offset 0 --length 1mb --text
Getting This Release
Current Version: 0.0.0a2 (Alpha)
Installation:
# Clone repository
git clone https://github.com/hejhdiss/gxd.git
cd gxd
# Install dependencies
pip install zstandard lz4 brotli tqdm
# Verify installation
python gxd.py --version
Quick Start:
# Try auto mode
python gxd.py compress yourfile.bin output.gxd --algo auto
# Inspect results
python gxd.py info output.gxd
# Decompress
python gxd.py decompress output.gxd -o restored.bin
Feedback & Contributing
Found a bug? Open an issue on GitHub.
Have an idea? Share it in discussions.
Want to contribute? Pull requests welcome!
Questions? Check the documentation or ask the community.
Remember: This is an alpha release maintained on a best-effort basis. Updates may or may not come regularly. The project is provided as-is, but community contributions can help drive future development.
GXD Compression Utility - Making compression smarter, one block at a time.