mohamed Tayel

Understanding the Need for Collections in Programming

Meta Description: Learn why collections are essential in programming through a practical sales report scenario. Understand how collections solve real-world problems, handle single-pass data sources, and enable efficient data processing, with full code examples.

Collections are not just convenient tools in programming; they are often essential for solving real-world problems efficiently. In this article, we’ll explore why collections are necessary using a sales report scenario. We’ll discuss how their absence can lead to errors and inefficiencies, and how using collections resolves these issues.


Scenario: Grouping and Summarizing Sales Data

Imagine you're tasked with generating a sales report. Each sale belongs to a category, and your goal is to:

  1. Group sales by category.
  2. Calculate the total sales for each category.

This seems straightforward, but problems arise when the code needs more than one pass over the data and the input comes from a source that can only be iterated once (e.g., a stream or database query). Let’s walk through this scenario step by step.


Step 1: Initial Implementation

The task involves grouping sales by category and calculating totals. Here’s how we can approach it:

  1. Iterate through the sales data to group by category.
  2. Calculate the total sales for each group.

Code Implementation

using System;
using System.Collections.Generic;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        // First pass: group by category, starting every total at zero.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] = 0;
        }

        // Second pass: add each sale to its category total.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new List<Sale>
        {
            new Sale("Electronics", 100),
            new Sale("Clothing", 50),
            new Sale("Electronics", 150),
            new Sale("Groceries", 70)
        };

        var report = GroupAndSummarizeSales(sales);

        foreach (var entry in report)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value:C}");
        }
    }
}

Output (the currency symbol assumes an en-US culture)

Electronics: $250.00
Clothing: $50.00
Groceries: $70.00

Step 2: The Problem With Single-Pass Data

Many real-world data sources can be read only once, or are too costly to read twice. Examples include:

  • Streams: Data read from sockets or files is consumed as it is read.
  • Expensive Queries: A database query can be run again, but every extra enumeration repeats its cost.
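
To make the stream case concrete, here is a minimal sketch of a forward-only source. The names StreamDemo, ReadLines, and CountTwice are hypothetical and not part of the article's code, and a StringReader stands in for a real socket or file. The sketch suggests that a second pass over such a source may not fail loudly at all: it can simply come back empty.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamDemo
{
    // Hypothetical helper: lazily yields lines from a reader that only moves forward.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            yield return line;
        }
    }

    // Counts the lines twice over the same reader-backed sequence.
    public static (int First, int Second) CountTwice(string text)
    {
        using var reader = new StringReader(text);
        IEnumerable<string> lines = ReadLines(reader);

        int first = lines.Count();  // consumes the reader to the end
        int second = lines.Count(); // the reader is already at the end
        return (first, second);
    }

    public static void Main()
    {
        var (first, second) = CountTwice("Electronics,100\nClothing,50\nGroceries,70");
        Console.WriteLine($"First pass: {first} lines");   // First pass: 3 lines
        Console.WriteLine($"Second pass: {second} lines"); // Second pass: 0 lines
    }
}
```

A silent empty result is harder to notice than an exception, which is one reason a simulated source that throws on the second pass is useful for exposing the problem.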

Let’s simulate a single-pass data source with a wrapper that throws on a second enumeration, and see what happens.

Code Implementation

using System;
using System.Collections;
using System.Collections.Generic;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class SinglePassSequence<T> : IEnumerable<T>
{
    private IEnumerable<T> _data;
    private bool _hasBeenEnumerated = false;

    public SinglePassSequence(IEnumerable<T> data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_hasBeenEnumerated)
        {
            throw new InvalidOperationException("This sequence can only be iterated once.");
        }

        _hasBeenEnumerated = true;
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        // First pass: group by category, starting every total at zero.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] = 0;
        }

        // Second pass: add each sale to its category total.
        // Enumerating 'sales' again is what a single-pass source rejects.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new SinglePassSequence<Sale>(
            new List<Sale>
            {
                new Sale("Electronics", 100),
                new Sale("Clothing", 50),
                new Sale("Electronics", 150),
                new Sale("Groceries", 70)
            });

        try
        {
            // This will throw an exception because the sequence cannot be iterated twice
            var report = GroupAndSummarizeSales(sales);
            foreach (var entry in report)
            {
                Console.WriteLine($"{entry.Key}: {entry.Value:C}");
            }
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}

Output

Error: This sequence can only be iterated once.

Step 3: The Solution – Using Collections

The solution is to store the data in a collection, such as a List<T>, which can be iterated any number of times. Calling ToList() enumerates the source exactly once and buffers every item in memory, so later passes read from the list instead of the source.
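
As a rough illustration of that buffering effect, the sketch below counts how often a lazy source is run with and without a list in between. BufferingDemo, ExpensiveQuery, and the QueryRuns counter are hypothetical names invented for this example; the iterator is only a stand-in for a costly query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferingDemo
{
    // Counts how many times the hypothetical expensive source is run.
    public static int QueryRuns;

    // Hypothetical stand-in for a costly query; each enumeration re-runs the body.
    public static IEnumerable<decimal> ExpensiveQuery()
    {
        QueryRuns++;
        yield return 100m;
        yield return 50m;
        yield return 150m;
    }

    // Two aggregates over the lazy sequence: the source runs once per aggregate.
    public static int RunsWithoutBuffering()
    {
        QueryRuns = 0;
        IEnumerable<decimal> amounts = ExpensiveQuery();
        _ = amounts.Sum();
        _ = amounts.Max();
        return QueryRuns;
    }

    // Same aggregates over a List<T> snapshot: the source runs only for ToList().
    public static int RunsWithBuffering()
    {
        QueryRuns = 0;
        List<decimal> amounts = ExpensiveQuery().ToList();
        _ = amounts.Sum();
        _ = amounts.Max();
        return QueryRuns;
    }

    public static void Main()
    {
        Console.WriteLine(RunsWithoutBuffering()); // 2
        Console.WriteLine(RunsWithBuffering());    // 1
    }
}
```

The trade-off is memory: the list holds every item at once, which is usually acceptable for a report but worth keeping in mind for very large inputs.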

Code Implementation

using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class SinglePassSequence<T> : IEnumerable<T>
{
    private IEnumerable<T> _data;
    private bool _hasBeenEnumerated = false;

    public SinglePassSequence(IEnumerable<T> data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_hasBeenEnumerated)
        {
            throw new InvalidOperationException("This sequence can only be iterated once.");
        }

        _hasBeenEnumerated = true;
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        // First pass: group by category, starting every total at zero.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] = 0;
        }

        // Second pass: add each sale to its category total.
        foreach (var sale in sales)
        {
            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new SinglePassSequence<Sale>(
            new List<Sale>
            {
                new Sale("Electronics", 100),
                new Sale("Clothing", 50),
                new Sale("Electronics", 150),
                new Sale("Groceries", 70)
            });

        // Enumerate the source exactly once and buffer it in a List<Sale>
        var salesList = sales.ToList();

        // Process the data
        var report = GroupAndSummarizeSales(salesList);

        foreach (var entry in report)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value:C}");
        }
    }
}

Output

Electronics: $250.00
Clothing: $50.00
Groceries: $70.00
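
As a possible alternative to calling ToList() yourself, the same report can be sketched with LINQ's GroupBy and ToDictionary. LinqSummaryDemo, ReadSales, Summarize, and the SourceRuns counter are hypothetical names for this illustration, and tuples stand in for the Sale class to keep the sketch short. GroupBy reads the source once and keeps the grouped items in memory internally, so a collection is still doing the work behind the scenes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqSummaryDemo
{
    // Records how many times the hypothetical source is read.
    public static int SourceRuns;

    // Hypothetical lazy source of (category, amount) pairs.
    public static IEnumerable<(string Category, decimal Amount)> ReadSales()
    {
        SourceRuns++;
        yield return ("Electronics", 100m);
        yield return ("Clothing", 50m);
        yield return ("Electronics", 150m);
        yield return ("Groceries", 70m);
    }

    public static Dictionary<string, decimal> Summarize(
        IEnumerable<(string Category, decimal Amount)> sales)
    {
        // GroupBy reads 'sales' once and buffers each group internally;
        // the Sum calls then run over those buffered groups, not the source.
        return sales
            .GroupBy(sale => sale.Category)
            .ToDictionary(group => group.Key, group => group.Sum(sale => sale.Amount));
    }

    public static void Main()
    {
        SourceRuns = 0;
        var totals = Summarize(ReadSales());

        foreach (var entry in totals)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }

        Console.WriteLine($"Source read {SourceRuns} time(s)");
    }
}
```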

Lessons Learned

  1. Collections Solve Real-World Problems:

    • For single-pass data sources, buffering the data in a collection allows it to be iterated as many times as needed.
  2. Choosing the Right Collection:

    • Use List<T> for ordered data.
    • Use Dictionary<TKey, TValue> for key-value pairs.
  3. Efficiency:

    • Buffering avoids redundant queries and expensive re-iterations, at the cost of holding the data in memory.
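
The choice between the two collection types can be sketched as follows. CollectionChoiceDemo, RecentCategories, and CategoryTotals are hypothetical names used only for illustration, with sample values borrowed from the sales scenario.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionChoiceDemo
{
    // List<T> keeps items in insertion order and supports access by position.
    public static List<string> RecentCategories()
    {
        var categories = new List<string>();
        categories.Add("Electronics");
        categories.Add("Clothing");
        categories.Add("Electronics"); // duplicates are allowed
        return categories;
    }

    // Dictionary<TKey, TValue> maps each unique key to one value.
    public static Dictionary<string, decimal> CategoryTotals()
    {
        return new Dictionary<string, decimal>
        {
            ["Electronics"] = 250m,
            ["Clothing"] = 50m
        };
    }

    public static void Main()
    {
        var categories = RecentCategories();
        Console.WriteLine(categories[2]); // third item added: Electronics

        var totals = CategoryTotals();

        // TryGetValue looks a key up without throwing when it is missing.
        if (totals.TryGetValue("Clothing", out var clothingTotal))
        {
            Console.WriteLine(clothingTotal);
        }

        Console.WriteLine(totals.ContainsKey("Groceries")); // False
    }
}
```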

Conclusion

Collections are indispensable for handling data reliably in programming. They ensure smooth processing, even for single-pass data sources, and allow for efficient operations. By incorporating collections, you make your applications robust and ready for real-world challenges.

Stay tuned for more on collection types and their best practices in upcoming articles! 🚀
