The point I was making there was that LINQ methods do not load or run anything when they are declared (a common mistaken assumption) -- they only "do something" when they are enumerated.
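To make that concrete, here is a minimal sketch of deferred execution. The names `DeferredDemo`, `BuildQuery` and `Calls` are made up for illustration; only `Select` and `ToList` are real LINQ methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the projection lambda has actually run.
    public static int Calls;

    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        // Declaring the query only wraps the source; the lambda does not run here.
        return source.Select(x => { Calls++; return x * 2; });
    }

    public static void Demo()
    {
        Calls = 0;
        var query = BuildQuery(new[] { 1, 2, 3 });
        Console.WriteLine(Calls);   // 0: nothing has run yet

        var list = query.ToList();  // enumeration happens here
        Console.WriteLine(Calls);   // 3: the lambda ran once per element
    }
}
```

Note that enumerating `query` a second time would run the lambda again for every element, since nothing is cached.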
The main issue with the original solution was runaway memory usage, because all the rows are loaded into memory at once. Then a new intermediate object is allocated at each step for every row. So memory usage is roughly: number of rows * (row size + intermediate object sizes).
Using LINQ as I mentioned, each row is fully processed before the next row is fetched. So at most you have one set of allocations in use at a time: the row and each of its intermediate objects. So memory usage is roughly: row size + intermediate object sizes.
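A rough sketch of that row-at-a-time pipeline follows. The names `StreamingDemo`, `ReadRows` and `ParseAmounts` are hypothetical, and the "name,amount" row format is an assumption for the example, not the original data. For real files, `File.ReadLines` streams lazily in the same way, whereas `File.ReadAllLines` loads every line into an array up front.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class StreamingDemo
{
    // Yields one line at a time; the next line is only read when the consumer asks for it.
    public static IEnumerable<string> ReadRows(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
            yield return line;
    }

    // Each row flows through Split -> Where -> Parse before the next row is fetched.
    public static IEnumerable<decimal> ParseAmounts(IEnumerable<string> rows)
    {
        return rows
            .Select(row => row.Split(','))
            .Where(fields => fields.Length > 1) // skip rows without an amount column
            .Select(fields => decimal.Parse(fields[1], CultureInfo.InvariantCulture));
    }

    public static decimal Total(TextReader reader)
    {
        // Sum enumerates the pipeline; only the running total is kept, not the rows.
        return ParseAmounts(ReadRows(reader)).Sum();
    }
}
```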
Any solution processing files would probably also do well to write the results to a buffered IO output as they are produced, to avoid collecting large result sets in memory.
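As a sketch of the output side (again with made-up names, `OutputDemo` and `WriteAll`), each result can be written as soon as it is produced instead of being collected into a list first. A `StreamWriter` over a file does its own buffering, so this does not mean one disk write per row.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class OutputDemo
{
    // Pulls results one at a time and writes each immediately;
    // no List<string> of all results is ever built.
    public static void WriteAll(IEnumerable<string> results, TextWriter output)
    {
        foreach (var result in results)
            output.WriteLine(result);
    }
}
```

With a file, the call site would look something like `using (var writer = new StreamWriter(path)) OutputDemo.WriteAll(results, writer);`, where `path` and `results` are whatever your pipeline provides.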
If Garbage Collector performance is an issue, that can be optimized separately. Common strategies are: value types (typically stored on the stack when used as locals, and copied when passed to or returned from other methods), or a pre-allocated object pool. If you need the same consistent set of objects for each row, then a set of singleton objects is equivalent to an object pool of size 1... just remember to reset them between iterations.
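Here is a small sketch of the "pool of size 1" idea, using a single reused `StringBuilder` as the per-row scratch object. `ReuseDemo` and `FormatRows` are hypothetical names, and joining fields with `|` is just a stand-in for whatever per-row work you actually do.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReuseDemo
{
    public static IEnumerable<string> FormatRows(IEnumerable<string[]> rows)
    {
        // One builder is allocated up front and reused for every row.
        var buffer = new StringBuilder();

        foreach (var fields in rows)
        {
            buffer.Clear(); // reset between iterations, or rows would bleed into each other

            for (int i = 0; i < fields.Length; i++)
            {
                if (i > 0) buffer.Append('|');
                buffer.Append(fields[i]);
            }

            // ToString still allocates the result string; only the scratch buffer is reused.
            yield return buffer.ToString();
        }
    }
}
```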