DEV Community

mohamed Tayel

Understanding the Need for Collections in Programming

Meta Description: Learn why collections are essential in programming through a practical sales report scenario. Understand how collections solve real-world problems, handle single-pass data sources, and enable efficient data processing, with full code examples.

Collections are not just convenient tools in programming; they are often essential for solving real-world problems efficiently. In this article, we’ll explore why collections are necessary using a sales report scenario. We’ll discuss how their absence can lead to errors and inefficiencies, and how using collections resolves these issues.


Scenario: Grouping and Summarizing Sales Data

Imagine you're tasked with generating a sales report. Each sale belongs to a category, and your goal is to:

  1. Group sales by category.
  2. Calculate the total sales for each category.

This seems straightforward, but problems arise when the input comes from a source that can be iterated only once (e.g., a stream or the results of a database query) and the report logic needs more than one pass over it. Let’s walk through this scenario step by step.


Step 1: Initial Implementation

The task involves grouping sales by category and calculating totals. Here’s how we can approach it:

  1. Iterate through the sales data to group by category.
  2. Calculate the total sales for each group.

Code Implementation

using System;
using System.Collections.Generic;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        foreach (var sale in sales)
        {
            if (!categoryTotals.ContainsKey(sale.Category))
            {
                categoryTotals[sale.Category] = 0;
            }

            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new List<Sale>
        {
            new Sale("Electronics", 100),
            new Sale("Clothing", 50),
            new Sale("Electronics", 150),
            new Sale("Groceries", 70)
        };

        var report = GroupAndSummarizeSales(sales);

        foreach (var entry in report)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value:C}");
        }
    }
}

Output (the currency format assumes an en-US culture)

Electronics: $250.00
Clothing: $50.00
Groceries: $70.00
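
The same grouping and totaling can also be written with LINQ. The following is a minimal sketch, not the implementation used in this article: the class name LinqSalesReport is hypothetical, and the Sale class is repeated from the listing above so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

// Hypothetical helper class, introduced only for this sketch.
public static class LinqSalesReport
{
    // Groups the sales by category, then sums the amounts of each group.
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        return sales
            .GroupBy(sale => sale.Category)
            .ToDictionary(
                group => group.Key,
                group => group.Sum(sale => sale.Amount));
    }
}
```

Called with the four sales from Main, this should produce the same three category totals as the loop version. GroupBy reads its input once and buffers the groups internally, which is itself a small example of a collection doing the work.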

Step 2: The Problem With Single-Pass Data

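Many real-world data sources support only single-pass access: once an item has been read, you cannot go back and read it again without reopening or re-running the source. Others can technically be read again, but only at a high cost. Examples include:

  • Streams: Data read from sockets, or from files through a forward-only reader.
  • Expensive Queries: Database queries that are costly to repeat.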
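
To make the stream case concrete, here is a small sketch built on a TextReader. The class SinglePassReaderDemo and its two methods are hypothetical names chosen for illustration; the point is only that a reader moves forward and does not rewind on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, introduced only for this sketch.
public static class SinglePassReaderDemo
{
    // Lazily yields the remaining lines of the reader. The reader only moves
    // forward, so lines that were already handed out are not produced again.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Counts the items by enumerating the sequence from start to finish.
    public static int CountItems(IEnumerable<string> items)
    {
        var count = 0;
        foreach (var item in items)
        {
            count++;
        }
        return count;
    }
}
```

If you pass the result of ReadLines(new StringReader("Electronics,100\nClothing,50")) to CountItems twice, the first call should return 2 and the second 0, because the reader is already at its end. Unlike the simulated sequence in this step, a source like this fails silently instead of throwing, which can be harder to notice.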

Let’s simulate a single-pass data source and see what happens.

Code Implementation

using System;
using System.Collections;
using System.Collections.Generic;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class SinglePassSequence<T> : IEnumerable<T>
{
    private IEnumerable<T> _data;
    private bool _hasBeenEnumerated = false;

    public SinglePassSequence(IEnumerable<T> data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_hasBeenEnumerated)
        {
            throw new InvalidOperationException("This sequence can only be iterated once.");
        }

        _hasBeenEnumerated = true;
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class Program
{
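    // Unlike the single-loop version in Step 1, this version follows the two steps
    // literally and therefore enumerates its input twice.
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        // Pass 1: register each category with a zero total.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] = 0;
        }

        // Pass 2: add each sale to its category total.
        // On a single-pass source, starting this second loop throws.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }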

    public static void Main()
    {
        var sales = new SinglePassSequence<Sale>(
            new List<Sale>
            {
                new Sale("Electronics", 100),
                new Sale("Clothing", 50),
                new Sale("Electronics", 150),
                new Sale("Groceries", 70)
            });

        try
        {
            // This will throw an exception because the sequence cannot be iterated twice
            var report = GroupAndSummarizeSales(sales);
            foreach (var entry in report)
            {
                Console.WriteLine($"{entry.Key}: {entry.Value:C}");
            }
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}

Output

Error: This sequence can only be iterated once.

Step 3: The Solution – Using Collections

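The solution is to store the data in a collection, such as a List<T>, which can be iterated as many times as needed. Calling ToList() enumerates the single-pass source exactly once and copies every item into memory, so all later passes run against the list instead of the source. The trade-off is that the whole data set must fit in memory.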

Code Implementation

using System;
using System.Collections; // needed for the non-generic IEnumerable/IEnumerator below
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class SinglePassSequence<T> : IEnumerable<T>
{
    private IEnumerable<T> _data;
    private bool _hasBeenEnumerated = false;

    public SinglePassSequence(IEnumerable<T> data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_hasBeenEnumerated)
        {
            throw new InvalidOperationException("This sequence can only be iterated once.");
        }

        _hasBeenEnumerated = true;
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class Program
{
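    // This version follows the two steps from Step 1 literally, so it enumerates
    // its input twice. That is safe when the input is an in-memory list.
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        // Pass 1: register each category with a zero total.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] = 0;
        }

        // Pass 2: add each sale to its category total.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }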

    public static void Main()
    {
        var sales = new SinglePassSequence<Sale>(
            new List<Sale>
            {
                new Sale("Electronics", 100),
                new Sale("Clothing", 50),
                new Sale("Electronics", 150),
                new Sale("Groceries", 70)
            });

        // Enumerate the single-pass source exactly once and copy the items into a List<Sale>
        var salesList = sales.ToList();

        // Process the list, which can safely be iterated more than once
        var report = GroupAndSummarizeSales(salesList);

        foreach (var entry in report)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value:C}");
        }
    }
}

Output

Electronics: $250.00
Clothing: $50.00
Groceries: $70.00
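
A variation on this fix is to buffer inside the method that needs several passes, so callers do not have to remember to call ToList() themselves. The sketch below assumes a hypothetical report, CategoryShares, that is not part of this article: it needs the grand total first and the per-category share second, so two passes are unavoidable. The Sale class is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

// Hypothetical helper class, introduced only for this sketch.
public static class BufferedSalesReport
{
    // Returns each category's share of the overall sales total (0 to 1).
    public static Dictionary<string, decimal> CategoryShares(IEnumerable<Sale> sales)
    {
        // Reuse the input if it is already an in-memory list; otherwise
        // enumerate it exactly once and copy it into a new list.
        IReadOnlyList<Sale> buffered = sales as IReadOnlyList<Sale> ?? sales.ToList();

        // Pass 1: compute the grand total.
        decimal grandTotal = 0;
        foreach (var sale in buffered)
        {
            grandTotal += sale.Amount;
        }

        var shares = new Dictionary<string, decimal>();
        if (grandTotal == 0)
        {
            return shares; // avoid dividing by zero when there are no sales
        }

        // Pass 2: accumulate each sale's share under its category.
        foreach (var sale in buffered)
        {
            shares.TryGetValue(sale.Category, out var current);
            shares[sale.Category] = current + sale.Amount / grandTotal;
        }

        return shares;
    }
}
```

The design choice here is that the method touches the caller's sequence at most once, and every later pass runs against the buffered list. The cost is the same as with ToList() in Main: the data must fit in memory.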

Lessons Learned

  1. Collections Solve Real-World Problems:

    • For single-pass data sources, collections enable caching and multiple iterations.
  2. Choosing the Right Collection:

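    • Use List<T> for ordered data that you iterate or access by index.
    • Use Dictionary<TKey, TValue> for key-value pairs that you look up by key, such as the total for each category.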
  3. Efficiency:

    • Collections avoid redundant queries or expensive re-iterations.
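
The efficiency point can be made visible with a counter. In this sketch, LoadSaleAmounts is a hypothetical stand-in for an expensive query, and QueryCachingDemo is a name invented for illustration; the counter records how often the "query" actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, introduced only for this sketch.
public static class QueryCachingDemo
{
    // Counts how often the simulated expensive query actually runs.
    public static int QueryRuns { get; private set; }

    // Stand-in for a costly query. Because this is an iterator method,
    // its body runs again every time the returned sequence is enumerated.
    public static IEnumerable<decimal> LoadSaleAmounts()
    {
        QueryRuns++;
        yield return 100m;
        yield return 50m;
        yield return 150m;
        yield return 70m;
    }

    // Runs two aggregations without and with caching, and returns how many
    // times the query ran in each case.
    public static (int Uncached, int Cached) Compare()
    {
        QueryRuns = 0;
        var lazy = LoadSaleAmounts(); // nothing has run yet
        _ = lazy.Sum();               // first enumeration runs the query
        _ = lazy.Max();               // second enumeration runs it again
        var uncachedRuns = QueryRuns;

        QueryRuns = 0;
        var cached = LoadSaleAmounts().ToList(); // the query runs once here
        _ = cached.Sum();                        // served from the list
        _ = cached.Max();                        // served from the list
        var cachedRuns = QueryRuns;

        return (uncachedRuns, cachedRuns);
    }
}
```

Under these assumptions, Compare() should report two runs for the uncached sequence and one run for the cached list. With a real database, each avoided run is a round trip saved.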

Conclusion

Collections are indispensable for handling data reliably in programming. Buffering a single-pass source in a List<T> lets you process it as many times as you need, and a Dictionary<TKey, TValue> makes per-category aggregation straightforward. By incorporating collections, you make your applications robust and ready for real-world challenges.

Stay tuned for more on collection types and their best practices in upcoming articles! 🚀
