DEV Community

panchenko-dm-aspose


ZIP files in C# using Aspose.ZIP parallel compression

Prerequisite – C# ZIP Library

Aspose.ZIP for .NET is a powerful and easy-to-use API for zipping or unzipping files and folders within .NET applications. It also provides AES encryption techniques to encrypt the files in ZIP archives. You can install the API from NuGet or download its binaries from the Downloads section.

Classic way of creating a ZIP archive in Aspose.ZIP for .NET

In the classic approach, the files in a data folder are compressed one by one, so a multicore CPU brings no advantage.

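using System.IO;
using Aspose.Zip;

using (FileStream zipFile = File.Open("archive.zip", FileMode.Create))
{
    using (Archive archive = new Archive())
    {
        // Register each source file as an entry of the archive.
        archive.CreateEntry("first.bin", File.OpenRead("data1.bin"));
        ...
        archive.CreateEntry("last.bin", File.OpenRead("dataN.bin"));
        // Compress the entries one after another and write the archive.
        archive.Save(zipFile);
    }
}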
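To make the "data folder" case concrete, here is a minimal sketch that adds every file of a folder as an entry and compresses them sequentially. The folder name `data` and the `sources` list are hypothetical and only illustrate the idea. The sketch assumes that Aspose.ZIP reads each source stream when Save is called, so the streams are kept open until then and disposed afterwards. It uses top-level statements, which require C# 9 or later.

```csharp
using System.Collections.Generic;
using System.IO;
using Aspose.Zip;

// Hypothetical folder holding the files to compress.
string dataFolder = "data";

// Source streams are collected so they can be disposed after Save
// (assumption: entries are read from their streams during Save).
var sources = new List<FileStream>();
try
{
    using (FileStream zipFile = File.Open("archive.zip", FileMode.Create))
    using (Archive archive = new Archive())
    {
        foreach (string path in Directory.GetFiles(dataFolder))
        {
            FileStream source = File.OpenRead(path);
            sources.Add(source);
            // The entry name is the file name without its directory part.
            archive.CreateEntry(Path.GetFileName(path), source);
        }

        // No save options are passed, as in the classic example.
        archive.Save(zipFile);
    }
}
finally
{
    foreach (FileStream source in sources)
    {
        source.Dispose();
    }
}
```

Disposing the source streams explicitly avoids leaving file handles open, which the shorter example above does not address.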

Parallel compression with Aspose.ZIP for .NET

Aspose.ZIP allows you to utilize all of your CPU cores using ParallelOptions. Let's look at the code first:

using System.IO;
using Aspose.Zip;
using Aspose.Zip.Saving;

using (FileStream zipFile = File.Open("archive.zip", FileMode.Create))
{
    using (Archive archive = new Archive())
    {
        archive.CreateEntry("first.bin", File.OpenRead("data1.bin"));
        ...
        archive.CreateEntry("last.bin", File.OpenRead("dataN.bin"));
        // ParallelOptions here is Aspose.Zip.Saving.ParallelOptions,
        // not System.Threading.Tasks.ParallelOptions.
        archive.Save(zipFile, new ArchiveSaveOptions()
        {
            ParallelOptions = new ParallelOptions()
            {
                ParallelCompressInMemory = ParallelCompressionMode.Always
            }
        });
    }
}

The ParallelCompressInMemory property selects the multitasking strategy.
There are three options:

  • ParallelCompressionMode.Never: all entries are compressed sequentially. Only one CPU core works on compression, and it flushes compressed data to the result stream as it is produced.
  • ParallelCompressionMode.Always: entries are compressed in different threads regardless of entry size, available memory, and other factors. Each CPU core compresses one entry at a time and keeps its compressed data in RAM; once the entry is fully compressed, it is flushed to the result stream. If RAM is limited and the total size of N entries (where N is the number of CPU cores) is huge, the memory available to the CLR may be exhausted and an OutOfMemoryException thrown.
  • ParallelCompressionMode.Auto: Aspose.ZIP estimates the number of CPU cores, the sizes of entries, and the available memory, and chooses whether to compress entries in parallel or sequentially. In this mode some smaller entries are compressed in parallel while others are compressed sequentially. LZMA entries are not compressed in parallel because of their high memory consumption. This option is generally safe: Aspose.ZIP is cautious in its estimates and falls back to sequential compression.

ParallelOptions has one more property for Auto mode: AvailableMemorySize. It is meaningful only in this mode. Roughly speaking, it is the upper limit, in megabytes, of memory allocated while compressing entries on all CPU cores. Aspose.ZIP uses that number to estimate the largest entry size that is safe to compress in parallel; entries above that threshold are compressed sequentially. AvailableMemorySize is a double-edged sword: set too high with huge entries, it can cause RAM exhaustion, intense swapping, and possibly an out-of-memory exception; set too low, most entries are compressed sequentially with little speed-up. Advanced users can tune it with this trade-off in mind.
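As an illustration of Auto mode combined with AvailableMemorySize, the sketch below reuses the two entries from the earlier examples. The value 1024 is an arbitrary illustrative figure, not a recommendation. The alias `ZipParallelOptions` is introduced here only to avoid a name clash with System.Threading.Tasks.ParallelOptions in projects where that namespace is imported implicitly.

```csharp
using System.IO;
using Aspose.Zip;
using Aspose.Zip.Saving;
// Alias chosen for this sketch; it sidesteps ambiguity with
// System.Threading.Tasks.ParallelOptions.
using ZipParallelOptions = Aspose.Zip.Saving.ParallelOptions;

using (FileStream zipFile = File.Open("archive.zip", FileMode.Create))
using (FileStream first = File.OpenRead("data1.bin"))
using (FileStream last = File.OpenRead("dataN.bin"))
using (Archive archive = new Archive())
{
    archive.CreateEntry("first.bin", first);
    archive.CreateEntry("last.bin", last);

    archive.Save(zipFile, new ArchiveSaveOptions
    {
        ParallelOptions = new ZipParallelOptions
        {
            // Let the library decide per entry: parallel or sequential.
            ParallelCompressInMemory = ParallelCompressionMode.Auto,
            // Upper memory limit in megabytes; 1024 is illustrative only.
            AvailableMemorySize = 1024
        }
    });
}
```

A reasonable starting point is to leave AvailableMemorySize unset and only assign it if measurements show that too many entries end up compressed sequentially, or that memory pressure is too high.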

We encourage you to experiment with the different parallel compression modes on your typical data to determine which settings work best in your case.
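One simple way to run such an experiment is to time the same set of files under each mode. The following is a rough sketch, not a rigorous benchmark: the folder name `data`, the output file names, and the `sources` list are hypothetical, and a single run per mode is affected by disk caching and other noise. Keep in mind that Always can run out of memory on large inputs, as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Aspose.Zip;
using Aspose.Zip.Saving;
// Alias chosen for this sketch to avoid a clash with
// System.Threading.Tasks.ParallelOptions.
using ZipParallelOptions = Aspose.Zip.Saving.ParallelOptions;

string dataFolder = "data"; // hypothetical folder with typical data
var modes = new[]
{
    ParallelCompressionMode.Never,
    ParallelCompressionMode.Auto,
    ParallelCompressionMode.Always
};

foreach (ParallelCompressionMode mode in modes)
{
    var sources = new List<FileStream>();
    Stopwatch timer = Stopwatch.StartNew();
    try
    {
        // One output archive per mode, for example "archive-Auto.zip".
        using (FileStream zipFile = File.Open($"archive-{mode}.zip", FileMode.Create))
        using (Archive archive = new Archive())
        {
            foreach (string path in Directory.GetFiles(dataFolder))
            {
                FileStream source = File.OpenRead(path);
                sources.Add(source);
                archive.CreateEntry(Path.GetFileName(path), source);
            }

            archive.Save(zipFile, new ArchiveSaveOptions
            {
                ParallelOptions = new ZipParallelOptions
                {
                    ParallelCompressInMemory = mode
                }
            });
        }
    }
    finally
    {
        // Release the source files before the next mode is measured.
        foreach (FileStream source in sources)
        {
            source.Dispose();
        }
    }

    timer.Stop();
    Console.WriteLine($"{mode}: {timer.ElapsedMilliseconds} ms");
}
```

Running the loop several times and comparing the typical timings gives a more reliable picture than a single pass.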

Learn more about Aspose.ZIP for .NET

Explore more about our C# ZIP API using the following resources:

Check what Aspose.ZIP is capable of with our free web-apps that are completely based on Aspose.ZIP for .NET:
