DEV Community

mohamed Tayel

Understanding the Need for Collections in Programming

Meta Description: Learn why collections are essential in programming through a practical sales report scenario. Understand how collections solve real-world problems, handle single-pass data sources, and enable efficient data processing, with full code examples.

Collections are not just convenient tools in programming; they are often essential for solving real-world problems efficiently. In this article, we’ll explore why collections are necessary using a sales report scenario. We’ll discuss how their absence can lead to errors and inefficiencies, and how using collections resolves these issues.


Scenario: Grouping and Summarizing Sales Data

Imagine you're tasked with generating a sales report. Each sale belongs to a category, and your goal is to:

  1. Group sales by category.
  2. Calculate the total sales for each category.

This seems straightforward, but if the input data comes from a source that can only be iterated once (e.g., a stream or the result of a database query), problems arise as soon as the code needs more than one pass over the data. Let’s walk through this scenario step by step.


Step 1: Initial Implementation

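The task involves grouping sales by category and calculating totals. A dictionary keyed by category handles both in a single loop over the data:

  1. For each sale, look up the running total for its category, creating it at zero if it does not exist yet.
  2. Add the sale's amount to that running total.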

Code Implementation

using System;
using System.Collections.Generic;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        foreach (var sale in sales)
        {
            if (!categoryTotals.ContainsKey(sale.Category))
            {
                categoryTotals[sale.Category] = 0;
            }

            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new List<Sale>
        {
            new Sale("Electronics", 100),
            new Sale("Clothing", 50),
            new Sale("Electronics", 150),
            new Sale("Groceries", 70)
        };

        var report = GroupAndSummarizeSales(sales);

        foreach (var entry in report)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value:C}");
        }
    }
}

Output (the C format specifier is culture-sensitive; shown here for en-US)

Electronics: $250.00
Clothing: $50.00
Groceries: $70.00
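
The same grouping can also be expressed with LINQ. The following is a minimal sketch rather than part of the original implementation: the class name SalesLinq and the method name Summarize are made up for illustration, while GroupBy, ToDictionary, and Sum are standard System.Linq operators. Like the loop above, it reads the source sequence only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Category { get; }
    public decimal Amount { get; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

// Hypothetical helper, named for illustration only.
public static class SalesLinq
{
    public static Dictionary<string, decimal> Summarize(IEnumerable<Sale> sales)
    {
        // GroupBy buckets the sales by category in one pass over the source;
        // ToDictionary then stores one summed total per category.
        return sales
            .GroupBy(sale => sale.Category)
            .ToDictionary(g => g.Key, g => g.Sum(sale => sale.Amount));
    }

    public static void Main()
    {
        var sales = new List<Sale>
        {
            new Sale("Electronics", 100),
            new Sale("Clothing", 50),
            new Sale("Electronics", 150),
            new Sale("Groceries", 70)
        };

        foreach (var entry in Summarize(sales))
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```

Whether the explicit loop or the LINQ version reads better is largely a matter of taste; both leave the totals in a Dictionary that can be read as often as needed.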

Step 2: The Problem With Single-Pass Data

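Many real-world data sources are effectively single-pass: either they cannot be iterated more than once, or iterating them again is too costly to be practical. Examples include:

  • Streams: Data read from a socket or another forward-only stream, which is gone once it has been consumed.
  • Expensive Queries: Database queries that can be re-run, but only at a high cost.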
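
To make the stream case concrete, here is a small sketch that uses an in-memory StringReader as a stand-in for a forward-only source. The class name ReaderLines and the method ReadLines are hypothetical names chosen for this example. Note that the second pass does not fail loudly: it simply yields nothing, because the reader is already at the end of its input.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, named for illustration only.
public static class ReaderLines
{
    // Lazily yields lines from the reader. The reader only moves forward,
    // so every enumeration continues from wherever the previous one stopped.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    public static void Main()
    {
        var reader = new StringReader("Electronics,100\nClothing,50");
        IEnumerable<string> lines = ReadLines(reader);

        Console.WriteLine(lines.Count()); // first pass: prints 2
        Console.WriteLine(lines.Count()); // second pass: prints 0, the reader is exhausted
    }
}
```

A silent empty result like this can be harder to notice than an exception, which is one reason the simulated source in this step throws on a second enumeration instead.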

Let’s simulate a single-pass data source and see what happens.

Code Implementation

using System;
using System.Collections;
using System.Collections.Generic;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class SinglePassSequence<T> : IEnumerable<T>
{
    private IEnumerable<T> _data;
    private bool _hasBeenEnumerated = false;

    public SinglePassSequence(IEnumerable<T> data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_hasBeenEnumerated)
        {
            throw new InvalidOperationException("This sequence can only be iterated once.");
        }

        _hasBeenEnumerated = true;
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        foreach (var sale in sales)
        {
            if (!categoryTotals.ContainsKey(sale.Category))
            {
                categoryTotals[sale.Category] = 0;
            }

            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new SinglePassSequence<Sale>(
            new List<Sale>
            {
                new Sale("Electronics", 100),
                new Sale("Clothing", 50),
                new Sale("Electronics", 150),
                new Sale("Groceries", 70)
            });

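        try
        {
            // First pass: count the sales. This consumes the single-pass sequence.
            var saleCount = 0;
            foreach (var sale in sales)
            {
                saleCount++;
            }

            // Second pass: GroupAndSummarizeSales enumerates the sequence again,
            // so GetEnumerator throws InvalidOperationException here.
            var report = GroupAndSummarizeSales(sales);

            Console.WriteLine($"Sales processed: {saleCount}");
            foreach (var entry in report)
            {
                Console.WriteLine($"{entry.Key}: {entry.Value:C}");
            }
        }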
        catch (InvalidOperationException ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}

Output

Error: This sequence can only be iterated once.

Step 3: The Solution – Using Collections

The solution is to read the source exactly once and store its contents in a collection, such as a List<T>, by calling ToList(). The list can then be enumerated as many times as needed, so the data can be processed reliably without errors.

Code Implementation

using System;
using System.Collections.Generic;
using System.Linq;

public class Sale
{
    public string Category { get; set; }
    public decimal Amount { get; set; }

    public Sale(string category, decimal amount)
    {
        Category = category;
        Amount = amount;
    }
}

public class SinglePassSequence<T> : IEnumerable<T>
{
    private IEnumerable<T> _data;
    private bool _hasBeenEnumerated = false;

    public SinglePassSequence(IEnumerable<T> data)
    {
        _data = data;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_hasBeenEnumerated)
        {
            throw new InvalidOperationException("This sequence can only be iterated once.");
        }

        _hasBeenEnumerated = true;
        return _data.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class Program
{
    public static Dictionary<string, decimal> GroupAndSummarizeSales(IEnumerable<Sale> sales)
    {
        var categoryTotals = new Dictionary<string, decimal>();

        foreach (var sale in sales)
        {
            if (!categoryTotals.ContainsKey(sale.Category))
            {
                categoryTotals[sale.Category] = 0;
            }

            categoryTotals[sale.Category] += sale.Amount;
        }

        return categoryTotals;
    }

    public static void Main()
    {
        var sales = new SinglePassSequence<Sale>(
            new List<Sale>
            {
                new Sale("Electronics", 100),
                new Sale("Clothing", 50),
                new Sale("Electronics", 150),
                new Sale("Groceries", 70)
            });

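        // Materialize the data into a list. This is the only pass over the single-pass source.
        var salesList = sales.ToList();

        // Process the data. Unlike the source, the list can be enumerated as often as needed.
        var report = GroupAndSummarizeSales(salesList);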

        foreach (var entry in report)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value:C}");
        }
    }
}

Output

Electronics: $250.00
Clothing: $50.00
Groceries: $70.00
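
Calling ToList() always makes a copy, even when the input is already a collection. One possible refinement, sketched below under the assumption that callers treat the result as read-only, is to copy only when necessary. The extension method name Materialize is hypothetical and not part of the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical extension method, named for illustration only.
public static class SequenceExtensions
{
    public static IReadOnlyCollection<T> Materialize<T>(this IEnumerable<T> source)
    {
        // Lists, arrays, and similar types are already safe to enumerate repeatedly,
        // so they are returned as-is instead of being copied.
        if (source is IReadOnlyCollection<T> collection)
        {
            return collection;
        }

        // Anything else (iterators, lazy queries, single-pass sources) is read once into a list.
        return source.ToList();
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        Console.WriteLine(ReferenceEquals(list, list.Materialize())); // prints True: no copy made

        IEnumerable<int> lazy = list.Where(n => n > 1);
        IReadOnlyCollection<int> snapshot = lazy.Materialize();
        Console.WriteLine(snapshot.Count); // prints 2
    }
}
```

The trade-off is that the caller may receive the original collection rather than a snapshot, so later changes to that collection remain visible. When an independent copy is required, a plain ToList() is the safer choice.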

Lessons Learned

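  1. Collections Solve Real-World Problems:
    • For single-pass data sources, collections enable caching and multiple iterations.
  2. Choosing the Right Collection:
    • Use List<T> for ordered data that you enumerate or access by position.
    • Use Dictionary<TKey, TValue> for key-value pairs, such as a running total per category.
  3. Efficiency:
    • Materializing the data once avoids redundant queries and expensive re-iterations, at the cost of holding the data in memory.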
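
The efficiency point can be illustrated with a small sketch. Here a C# iterator method stands in for an expensive query, and a counter records how often its body actually runs. The names QueryDemo, ExpensiveQuery, and QueryRuns are invented for this example, and the amounts are placeholder values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, named for illustration only.
public static class QueryDemo
{
    // Counts how many times the simulated query has been executed.
    public static int QueryRuns;

    // Stand-in for an expensive query. As an iterator, its body runs
    // from the top every time the returned sequence is enumerated.
    public static IEnumerable<decimal> ExpensiveQuery()
    {
        QueryRuns++;
        yield return 100m;
        yield return 50m;
        yield return 150m;
    }

    public static int RunsWithoutCaching()
    {
        QueryRuns = 0;
        IEnumerable<decimal> amounts = ExpensiveQuery();
        decimal total = amounts.Sum();   // enumerates the sequence: query runs
        decimal largest = amounts.Max(); // enumerates it again: query runs a second time
        return QueryRuns;
    }

    public static int RunsWithCaching()
    {
        QueryRuns = 0;
        List<decimal> amounts = ExpensiveQuery().ToList(); // query runs once, results are stored
        decimal total = amounts.Sum();   // reads the list
        decimal largest = amounts.Max(); // reads the list
        return QueryRuns;
    }

    public static void Main()
    {
        Console.WriteLine(RunsWithoutCaching()); // prints 2
        Console.WriteLine(RunsWithCaching());    // prints 1
    }
}
```

With a real database or network call behind the sequence, each extra run would cost a full round trip, which is what storing the results in a list avoids.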

Conclusion

Collections are indispensable for handling data reliably in programming. Reading a single-pass source into a List<T> once lets the rest of the program enumerate the data freely, and a Dictionary<TKey, TValue> makes grouping and lookup efficient. By incorporating collections, you make your applications robust and ready for real-world challenges.

Stay tuned for more on collection types and their best practices in upcoming articles! 🚀
