Muhammed Shafin P

GXD v0.0.0a2: Major Updates and New Features

Release Date: December 2025

Author: @hejhdiss (Muhammed Shafin P)

Status: Alpha Release


Overview

GXD Compression Utility has received significant updates in version 0.0.0a2, introducing intelligent compression, enhanced metadata tracking, and a better user experience. This release transforms GXD from a simple block-based compressor into a smart, adaptive archival tool.

Note on Versioning: Although this update adds substantial features, the version number stays at 0.0.0a2, the same as the earlier release, because the project is still in alpha. During the alpha phase the version number may not change with every update, and APIs and file formats may still change.


Major New Features

1. Intelligent Auto Algorithm Selection

The most significant addition to GXD is the auto mode (--algo auto), which brings intelligence to compression by automatically selecting the best algorithm for each individual block.

How It Works:

  • Analyzes each block's Shannon Entropy (0.0 to 8.0 scale)
  • Calculates zero-byte density and unique byte ratios
  • Applies decision logic to choose optimal algorithm per block
  • Stores per-block algorithm choice in metadata
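The zero-byte density and unique-byte ratio mentioned above can be computed in a single pass over a block. The following is a minimal sketch rather than GXD's actual code: the function name `block_metrics` and the `unique_ratio` key are assumptions made for illustration, and the unique-byte ratio is assumed to mean "distinct byte values divided by 256". The `zero_ratio` key matches the name the selector reads later in this post.

```python
def block_metrics(data: bytes) -> dict:
    """Return simple statistics for one block (hypothetical helper)."""
    if not data:
        return {"zero_ratio": 0.0, "unique_ratio": 0.0}
    zero_ratio = data.count(0) / len(data)   # share of 0x00 bytes in the block
    unique_ratio = len(set(data)) / 256      # assumed definition: distinct values / 256
    return {"zero_ratio": zero_ratio, "unique_ratio": unique_ratio}


sparse = bytes(600) + b"A" * 400             # 600 zero bytes followed by 400 'A' bytes
metrics = block_metrics(sparse)
print(metrics)
```

With 60% zero bytes, this block would cross the 40% zero threshold used by the decision logic.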

Decision Logic:

If Entropy > 7.9                 → Use 'none' (already compressed/encrypted)
If Zeros > 40% OR Entropy < 3.0  → Use 'lz4' (sparse data, maximize speed)
If Entropy < 6.8                 → Use 'zstd' (compressible, balanced)
Otherwise                        → Use 'brotli' (dense data with little redundancy, max compression)

Why This Matters:
Mixed-content files (documents with embedded images, databases with varied data types) can now benefit from optimal compression on each section. Previously, you had to choose one algorithm for the entire file, potentially expanding already-compressed sections.

Usage:

python gxd.py compress mixed_data.bin output.gxd --algo auto

2. Enhanced Block Metadata Tracking

Every compressed block now includes rich metadata for analysis and debugging:

New Metadata Fields:

  • entropy: Shannon entropy value (0.0-8.0) indicating data randomness
  • time: Compression duration in seconds for performance analysis
  • timestamp: Unix timestamp when the block was compressed
  • algo: Actual algorithm used (critical for auto mode)

Benefits:

  • Understand why auto mode chose specific algorithms
  • Identify compression bottlenecks
  • Analyze data characteristics across your files
  • Debug compression issues with detailed per-block information
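One way the fields listed above could be filled in for each block is sketched here. It is an illustration under stated assumptions, not GXD's implementation: `zlib` from the standard library stands in for the zstd/lz4/brotli codecs so the sketch runs anywhere, the function name `compress_block` is hypothetical, and hashing the uncompressed bytes with SHA-256 is a guess at what `hash` contains.

```python
import collections
import hashlib
import math
import time
import zlib


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    entropy = 0.0
    for count in collections.Counter(data).values():
        p_x = count / len(data)
        entropy -= p_x * math.log2(p_x)
    return entropy


def compress_block(block_id: int, start: int, data: bytes):
    """Compress one block and return (payload, metadata record)."""
    began = time.perf_counter()
    payload = zlib.compress(data)                  # stand-in codec, not one GXD uses
    elapsed = time.perf_counter() - began
    record = {
        "id": block_id,
        "start": start,                            # offset of the payload in the archive
        "size": len(payload),
        "orig_size": len(data),
        "hash": hashlib.sha256(data).hexdigest(),  # hash algorithm is an assumption
        "algo": "zlib",
        "entropy": round(shannon_entropy(data), 4),
        "time": round(elapsed, 6),                 # compression duration in seconds
        "timestamp": time.time(),                  # Unix time when the block was compressed
    }
    return payload, record


original = b"hello world " * 1000
payload, record = compress_block(0, 6, original)
print(record)
```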

Example Metadata:

{
  "id": 0,
  "start": 6,
  "size": 12345,
  "orig_size": 1048576,
  "hash": "abc123...",
  "algo": "zstd",
  "entropy": 5.8234,
  "time": 0.023456,
  "timestamp": 1703347200.123
}

3. File Attribute Preservation

GXD now preserves and restores original file system attributes, maintaining file integrity beyond just content.

Preserved Attributes:

  • Permissions (mode): Read/write/execute permissions
  • Modification time (mtime): When file was last modified
  • Access time (atime): When file was last accessed
  • User ID (uid): Owner user ID (Unix/Linux)
  • Group ID (gid): Owner group ID (Unix/Linux)

How It Works:

  • Attributes captured during compression
  • Stored in archive metadata under file_attr
  • Automatically restored during decompression
  • Falls back gracefully if restoration fails (with warning)
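The capture-and-restore cycle described above can be sketched with the standard `os` module. This is a minimal illustration, not GXD's code: the function names `capture_attrs` and `restore_attrs` are hypothetical, and warning once per failed step is one possible reading of "falls back gracefully".

```python
import os
import stat
import tempfile
import warnings


def capture_attrs(path: str) -> dict:
    """Read mode, timestamps and ownership from an existing file."""
    st = os.stat(path)
    return {"mode": st.st_mode, "mtime": st.st_mtime, "atime": st.st_atime,
            "uid": st.st_uid, "gid": st.st_gid}


def restore_attrs(path: str, attrs: dict) -> None:
    """Reapply attributes one by one, warning instead of failing."""
    steps = []
    if hasattr(os, "chown"):  # os.chown does not exist on Windows
        steps.append(lambda: os.chown(path, attrs["uid"], attrs["gid"]))
    steps.append(lambda: os.chmod(path, stat.S_IMODE(attrs["mode"])))
    steps.append(lambda: os.utime(path, (attrs["atime"], attrs["mtime"])))
    for step in steps:
        try:
            step()
        except OSError as exc:  # includes PermissionError
            warnings.warn(f"could not restore an attribute of {path}: {exc}")


# Round trip on a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "important.bin")
    with open(path, "wb") as fh:
        fh.write(b"payload")
    os.utime(path, (1_000_000_000, 1_000_000_000))  # pretend the file is old
    saved = capture_attrs(path)
    os.utime(path, None)                            # touch: timestamps become "now"
    restore_attrs(path, saved)
    restored_mtime = os.stat(path).st_mtime
print(restored_mtime == saved["mtime"])
```

Each step runs in its own `try` block so that a failed ownership change, which typically needs elevated privileges, does not prevent the permissions and timestamps from being restored.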

Usage:

# Compress (attributes automatically captured)
python gxd.py compress important.bin archive.gxd

# Decompress (attributes automatically restored)
python gxd.py decompress archive.gxd -o important.bin

4. Archive Information Command

The new info command inspects an archive in detail without decompressing it.

Features:

  • View global archive metadata (version, algorithm, block count)
  • Display preserved file attributes with human-readable timestamps
  • List block overview showing algorithm choices and sizes
  • Inspect detailed metadata for specific blocks

Usage:

# View general archive information
python gxd.py info data.gxd

# Inspect specific block (1-based index)
python gxd.py info data.gxd --block 5

Sample Output:

==================================================
 GXD ARCHIVE INFORMATION
==================================================
File Name      : data.gxd
GXD Version    : 0.0.0a2
Global Algo    : auto
Total Blocks   : 128

--- Preserved File Attributes ---
Original Mode  : 0o644
Modify Time    : Sat Dec 23 10:30:00 2023
Access Time    : Sat Dec 23 10:30:00 2023

--- Block Overview (First 5) ---
ID    | Algo     | Size       | Orig Size 
---------------------------------------------
1     | zstd     | 524288     | 1048576   
2     | lz4      | 98304      | 1048576   
3     | brotli   | 712345     | 1048576   
4     | none     | 1048576    | 1048576   
5     | zstd     | 456789     | 1048576   
... and 123 more blocks.
==================================================

Technical Improvements

Better Error Handling

Enhanced error messages and validation:

  • Clear warnings when algorithm-specific parameters are ignored
  • Improved size parsing with better error messages
  • Graceful fallbacks for attribute restoration failures
  • Detailed decompression error reporting with block IDs

Entropy Calculation

Implemented Shannon Entropy calculation for data analysis:

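import collections
import math

def calculate_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counter = collections.Counter(data)  # occurrences of each byte value
    entropy = 0.0
    for count in counter.values():
        p_x = count / len(data)          # probability of this byte value
        entropy -= p_x * math.log2(p_x)
    return entropy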

The result is in bits per byte: 0.0 means every byte in the block is identical, and 8.0 means all 256 byte values occur equally often. Auto mode uses this value as its main signal of how compressible a block is.
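A few hand-checkable inputs make the scale concrete. The sketch below repeats the entropy function so it runs on its own; the sample inputs are chosen for illustration and are not taken from GXD's tests.

```python
import collections
import math


def calculate_entropy(data: bytes) -> float:
    # Same computation as the function shown above, repeated so this runs standalone.
    if not data:
        return 0.0
    entropy = 0.0
    for count in collections.Counter(data).values():
        p_x = count / len(data)
        entropy -= p_x * math.log2(p_x)
    return entropy


all_zeros = calculate_entropy(bytes(4096))              # one symbol only
two_values = calculate_entropy(b"\x00\xff" * 2048)      # two symbols, 50/50
uniform = calculate_entropy(bytes(range(256)) * 16)     # every byte value equally often

print(all_zeros, two_values, uniform)  # 0.0 1.0 8.0
```

The last input is worth noting: a repeating 0..255 ramp scores 8.0 even though it is trivially compressible, because this measure only looks at byte frequencies and ignores the order in which bytes appear.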

Smart Selector Class

New GXDSmartSelector class encapsulates algorithm prediction logic:

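class GXDSmartSelector:
    @staticmethod
    def predict(data: bytes) -> str:
        """Return the name of the algorithm to use for one block."""
        entropy = calculate_entropy(data)
        metrics = calculate_metrics(data)  # helper not shown in this post
        zeros = metrics['zero_ratio']      # fraction of zero bytes (0.0 to 1.0)

        # Rules are checked in order; the first match wins.
        if entropy > 7.9:
            return "none"
        if zeros > 0.4 or entropy < 3.0:
            return "lz4"
        if entropy < 6.8:
            return "zstd"
        return "brotli"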
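To see the thresholds at work, the sketch below applies them block by block to a synthetic buffer made of four different regions. It is self-contained and only approximates the selector: the zero ratio is computed inline instead of through `calculate_metrics`, the helper `fill` is hypothetical, and the 64 KiB block size is an illustrative choice, not necessarily GXD's default.

```python
import collections
import math
import os


def calculate_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    entropy = 0.0
    for count in collections.Counter(data).values():
        p_x = count / len(data)
        entropy -= p_x * math.log2(p_x)
    return entropy


def predict(data: bytes) -> str:
    """Same thresholds as GXDSmartSelector.predict, with the zero ratio inlined."""
    entropy = calculate_entropy(data)
    zeros = data.count(0) / len(data) if data else 0.0
    if entropy > 7.9:
        return "none"
    if zeros > 0.4 or entropy < 3.0:
        return "lz4"
    if entropy < 6.8:
        return "zstd"
    return "brotli"


def fill(pattern: bytes, size: int) -> bytes:
    """Repeat pattern until it is exactly size bytes long."""
    return (pattern * (size // len(pattern) + 1))[:size]


BLOCK = 64 * 1024  # illustrative block size
mixed = (
    bytes(BLOCK)                                                      # zero-filled region
    + fill(b"The quick brown fox jumps over the lazy dog.\n", BLOCK)  # plain text
    + fill(bytes(range(128)), BLOCK)                                  # 128 values, entropy 7.0
    + os.urandom(BLOCK)                                               # stands in for compressed data
)
choices = [predict(mixed[i:i + BLOCK]) for i in range(0, len(mixed), BLOCK)]
print(choices)  # ['lz4', 'zstd', 'brotli', 'none']
```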

Use Cases Enabled by Updates

1. Mixed Content Archives

Scenario: Backup of a project directory containing source code, images, and compiled binaries.

Before: Single algorithm compresses everything, potentially expanding pre-compressed images.

Now: Auto mode uses:

  • zstd for source code (text, compressible)
  • none for JPEG/PNG images (already compressed)
  • lz4 for log files with lots of zeros (sparse data)

2. Performance Analysis

Scenario: Understanding compression bottlenecks in large datasets.

Before: No visibility into per-block performance.

Now: Use the info command to see:

  • Which blocks took longest to compress
  • Entropy distribution across your data
  • Algorithm choices and their effectiveness
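Once the per-block records are available as dictionaries, the questions above reduce to a few aggregations. The sketch below assumes records shaped like the example metadata in this post; the values are made up for illustration, the function name `summarize` is hypothetical, and how the records are loaded from an archive is left out.

```python
from collections import Counter

# Made-up records for illustration; real ones would come from the archive metadata.
blocks = [
    {"id": 0, "algo": "zstd",   "size": 524288,  "orig_size": 1048576, "entropy": 5.82, "time": 0.023},
    {"id": 1, "algo": "lz4",    "size": 98304,   "orig_size": 1048576, "entropy": 1.10, "time": 0.004},
    {"id": 2, "algo": "brotli", "size": 712345,  "orig_size": 1048576, "entropy": 7.20, "time": 0.310},
    {"id": 3, "algo": "none",   "size": 1048576, "orig_size": 1048576, "entropy": 7.99, "time": 0.001},
]


def summarize(records: list) -> dict:
    """Aggregate per-block metadata into a small report."""
    total_in = sum(b["orig_size"] for b in records)
    total_out = sum(b["size"] for b in records)
    by_time = sorted(records, key=lambda b: b["time"], reverse=True)
    return {
        "slowest_ids": [b["id"] for b in by_time[:2]],          # blocks that took longest
        "algo_counts": dict(Counter(b["algo"] for b in records)),
        "mean_entropy": sum(b["entropy"] for b in records) / len(records),
        "overall_ratio": total_out / total_in,                   # compressed / original
        "not_shrunk_ids": [b["id"] for b in records if b["size"] >= b["orig_size"]],
    }


report = summarize(blocks)
print(report)
```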

3. Long-term Archival

Scenario: Preserving critical files with full metadata.

Before: Only file content preserved, attributes lost.

Now: Complete file system metadata preserved, including permissions and timestamps, ensuring authentic restoration.


Performance Considerations

Auto Mode Overhead

Minimal Impact:

  • Entropy calculation: ~0.001s per MB
  • Algorithm selection: negligible
  • Overall overhead: <1% for most workloads
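Timings like these depend on hardware and Python version, so it can be worth measuring on your own machine. The sketch below times the pure-Python entropy function on 1 MiB of random bytes; the helper name `entropy_seconds_per_mib` is hypothetical and the result will differ from system to system.

```python
import collections
import math
import os
import time


def calculate_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    entropy = 0.0
    for count in collections.Counter(data).values():
        p_x = count / len(data)
        entropy -= p_x * math.log2(p_x)
    return entropy


def entropy_seconds_per_mib(repeats: int = 3) -> float:
    """Best-of-N wall time to compute the entropy of 1 MiB of random bytes."""
    block = os.urandom(1024 * 1024)
    timings = []
    for _ in range(repeats):
        began = time.perf_counter()
        calculate_entropy(block)
        timings.append(time.perf_counter() - began)
    return min(timings)


measured = entropy_seconds_per_mib()
print(f"entropy: {measured:.4f} s per MiB")
```

Comparing this number with the time your chosen codec needs for the same block gives a direct estimate of the relative overhead for your workload.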

When to Use Auto:

  • Mixed content files
  • Unknown data types
  • Maximum compression efficiency is desired
  • Storage space is at a premium

When to Use Fixed Algorithm:

  • Homogeneous data (all text, all binary)
  • Speed is critical over compression ratio
  • You know your data characteristics

Info Command Performance

Fast and Non-Intrusive:

  • Only reads metadata footer (last few KB)
  • No decompression required
  • Instant results even for multi-GB archives
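The reason a footer makes inspection cheap can be shown with a toy container. The layout below is purely hypothetical and is not GXD's on-disk format: it assumes the archive ends with a JSON metadata blob followed by an 8-byte little-endian length, and the names `write_archive` and `read_metadata` are invented for this sketch.

```python
import io
import json
import os
import struct


def write_archive(stream, payload: bytes, metadata: dict) -> None:
    """Write payload, then JSON metadata, then the metadata length (hypothetical layout)."""
    stream.write(payload)
    footer = json.dumps(metadata).encode("utf-8")
    stream.write(footer)
    stream.write(struct.pack("<Q", len(footer)))   # 8-byte little-endian length


def read_metadata(stream) -> dict:
    """Read only the footer by seeking from the end; the payload is never touched."""
    stream.seek(-8, os.SEEK_END)
    (footer_len,) = struct.unpack("<Q", stream.read(8))
    stream.seek(-(8 + footer_len), os.SEEK_END)
    return json.loads(stream.read(footer_len).decode("utf-8"))


archive = io.BytesIO()
write_archive(archive, b"\x00" * 1_000_000, {"version": "0.0.0a2", "blocks": 1})
meta = read_metadata(archive)
print(meta)
```

Reading the metadata costs two seeks and two small reads regardless of how large the payload is, which is why this kind of inspection stays fast on multi-GB files.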

Breaking Changes

None. This release is fully backward compatible:

  • Old archives decompress correctly
  • New metadata fields are additions only
  • Existing scripts and workflows continue working

Migration Guide

For Existing Users

No migration needed! Your existing .gxd archives work perfectly with the new version.

To leverage new features:

  1. Try auto mode on new compressions:
   python gxd.py compress data.bin data.gxd --algo auto
  2. Inspect your existing archives:
   python gxd.py info old_archive.gxd
  3. Recompress for attribute preservation:
   # Decompress old archive
   python gxd.py decompress old.gxd -o data.bin

   # Recompress with new version (captures data.bin's current attributes)
   python gxd.py compress data.bin new.gxd

Known Limitations

Auto Mode Requirements

Requires all three compression libraries installed:

pip install zstandard lz4 brotli

If any library is missing, auto mode will fail with a clear error message.
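Before relying on auto mode in a script, you can check up front whether the three packages are importable. This is a minimal sketch and not part of GXD: the helper name `missing_modules` is hypothetical, and the module names are assumed to match the pip package names listed above.

```python
import importlib.util

# Import names of the three codecs installed by the pip command above
REQUIRED = ("zstandard", "lz4", "brotli")


def missing_modules(names) -> list:
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("auto mode unavailable, install:", " ".join(missing))
else:
    print("all codecs for auto mode are installed")
```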

Attribute Restoration

Platform-Specific:

  • Full support on Unix/Linux systems
  • Partial support on Windows (permissions may not restore)
  • Cross-platform archives may lose some attributes

Permissions Required:

  • Restoring uid/gid requires appropriate system permissions
  • Falls back gracefully with warning if insufficient permissions

Future Possibilities

Disclaimer: The author is not committed to regular updates. These are potential ideas that may or may not be implemented:

  • Machine learning-based algorithm selection
  • Compression dictionary support for better ratios
  • Multi-volume archive support
  • GUI interface for archive inspection
  • Plugin system for custom algorithms
  • Differential/incremental compression
  • Block-level deduplication (detecting and eliminating duplicate blocks)
  • Content-aware deduplication using hash-based chunk identification

Community contributions toward these features are welcome!


Best Practices

When to Use Auto Mode

Use auto when:

  • Compressing diverse file types together
  • Unsure about data characteristics
  • Maximum compression efficiency is the goal
  • Data includes pre-compressed content

Skip auto when:

  • All data is same type (all text logs, all images)
  • Speed is absolutely critical
  • You've profiled and know the best algorithm
  • Working with small files (<1MB)

Monitoring Compression

# Compress with auto
python gxd.py compress data.bin data.gxd --algo auto

# Immediately inspect results
python gxd.py info data.gxd

# Look for:
# - Algorithm distribution (variety indicates mixed content)
# - Entropy values (high entropy = less compressible)
# - Size ratios (blocks that expanded)

Archive Management

# Regular integrity check (verification stays on, so do not pass --no-verify)
python gxd.py decompress data.gxd > /dev/null

# Quick metadata verification
python gxd.py info data.gxd

# Extract sample for testing
python gxd.py seek data.gxd --offset 0 --length 1mb --text

Getting This Release

Current Version: 0.0.0a2 (Alpha)

Installation:

# Clone repository
git clone https://github.com/hejhdiss/gxd.git
cd gxd

# Install dependencies
pip install zstandard lz4 brotli tqdm

# Verify installation
python gxd.py --version

Quick Start:

# Try auto mode
python gxd.py compress yourfile.bin output.gxd --algo auto

# Inspect results
python gxd.py info output.gxd

# Decompress
python gxd.py decompress output.gxd -o restored.bin

Feedback & Contributing

Found a bug? Open an issue on GitHub.

Have an idea? Share it in discussions.

Want to contribute? Pull requests welcome!

Questions? Check the documentation or ask the community.


Remember: This is an alpha release maintained on a best-effort basis. Updates may or may not come regularly. The project is provided as-is, but community contributions can help drive future development.


GXD Compression Utility - Making compression smarter, one block at a time.
