Discussion on: RAM vs CPU

Where I usually run into memory trouble is loading all the data from a file or database into memory at once. The two things I commonly do to avoid this are 1) add limits to database queries (e.g. a LIMIT or TOP clause) or 2) use buffered / streaming reads.

#1 is the option I typically use first, because of the convenience of using 3rd party libraries to enumerate the data and convert it to objects.

For back-end business-y use, most of what you load is a single record or a small family of related records that you want to convert to objects to make business decisions. So there's no danger of hitting a reasonable limit (e.g. 1000 rows). When there is potentially a lot of data to be processed, you can load and process it in size-limited batches to avoid runaway memory usage.
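Here is a rough sketch of the batch idea in Python with the standard `sqlite3` module. The `invoices` table, its columns, and the function names are made up for illustration, and the batch size of 1000 is just the example limit from above. I page by "id greater than the last one seen" here rather than by OFFSET; that is one option among several, not the only way to do it.

```python
import sqlite3

BATCH_SIZE = 1000  # assumed limit, matching the example figure above


def iter_batches(conn, batch_size=BATCH_SIZE):
    """Yield lists of (id, amount) rows, at most batch_size rows at a time."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM invoices WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last id in this batch


def total_amount(conn, batch_size=BATCH_SIZE):
    """Process every row while holding only one batch in memory."""
    total = 0
    for batch in iter_batches(conn, batch_size):
        total += sum(amount for _id, amount in batch)
    return total


def make_demo_db(n):
    """Build a throwaway in-memory table (hypothetical schema) with n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO invoices (id, amount) VALUES (?, ?)",
        ((i, i * 10) for i in range(1, n + 1)),
    )
    return conn
```

Memory use is bounded by the batch size rather than by the table size, at the cost of one query per batch.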

For front-end use, many lists are potentially unbounded in size, i.e. the more invoices, the better. However, if my API would run out of memory loading the list before passing it to the user, then their browser should not be exposed to the memory load of displaying it all at once either. Enforcing a limit on the number of returned records obviously entails adding paging (or infinite scroll), but it also entails search functionality. Most users complain about paging because they want all the data on screen so they can Ctrl-F and find things. (If you do as they ask and display all the data in one large list, then they will complain about page load times.) So search usually must be implemented to fulfill that need while still limiting the number of records returned.
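A minimal sketch of what "limit plus paging plus search" might look like on the API side, again in Python with `sqlite3`. The table, the `customer` column, the function name, and the cap of 100 are all assumptions for the example. The point is only that the server clamps the page size no matter what the client asks for, and the search term narrows the result instead of the client pulling everything.

```python
import sqlite3

MAX_PAGE_SIZE = 100  # assumed server-side cap; clients cannot exceed it


def search_invoices(conn, search="", page=1, page_size=50):
    """Return (rows, has_more) for one page of invoices matching `search`."""
    page = max(page, 1)
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    offset = (page - 1) * page_size
    # Fetch one extra row so we can tell whether another page exists.
    # Note: % and _ inside `search` act as LIKE wildcards in this sketch.
    rows = conn.execute(
        "SELECT id, customer FROM invoices "
        "WHERE customer LIKE ? ORDER BY id LIMIT ? OFFSET ?",
        (f"%{search}%", page_size + 1, offset),
    ).fetchall()
    has_more = len(rows) > page_size
    return rows[:page_size], has_more
```

The `has_more` flag is enough to drive either a "next page" button or an infinite scroll without counting the whole table.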

#2 is the option I use for potentially large amounts of data. Examples: export to CSV, process a file of unknown size. For these cases, I make sure to load the data with a method that streams/buffers the reads. These usually involve a little more code, and are maybe harder to work with, but they limit your memory footprint to roughly the size of the buffer.
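For the two examples just mentioned, a streaming version might look something like this in Python, using only the standard `csv` and `sqlite3` modules. The table, column names, function names, and chunk size are hypothetical. `export_csv` pulls rows from the cursor in chunks and writes them straight out; `sum_column` reads a CSV one line at a time from any iterable of lines, such as an open file.

```python
import csv
import sqlite3


def export_csv(conn, out, chunk_size=500):
    """Write the invoices table to `out` as CSV, chunk_size rows at a time."""
    cur = conn.execute("SELECT id, amount FROM invoices ORDER BY id")
    writer = csv.writer(out)
    writer.writerow(["id", "amount"])
    count = 0
    while True:
        rows = cur.fetchmany(chunk_size)  # only this chunk is held in memory
        if not rows:
            break
        writer.writerows(rows)
        count += len(rows)
    return count


def sum_column(lines, column):
    """Sum an integer column from CSV input of unknown size, row by row."""
    reader = csv.DictReader(lines)  # consumes `lines` lazily
    return sum(int(row[column]) for row in reader)
```

Usage would be along the lines of `with open("invoices.csv", "w", newline="") as f: export_csv(conn, f)`, and the same file can later be passed to `sum_column` after opening it for reading.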