RAM vs CPU

Wesley Ameling ・1 min read

Designing a program takes a lot of thought, and a big aspect of it is the balance between RAM and CPU. You don't actually have to choose which one you are going to use the most; you need to pick the best balance between the two. It is the old question of speed versus memory efficiency.

I learned why this is so important the hard way: you simply want your RAM usage to be stable. This is often overlooked during testing, because a testing environment usually works with small files and queries. Once the program is deployed to production with real production data, it either throws OutOfMemory errors or consumes nearly all available RAM.

For myself, I use the following questions to keep RAM usage stable:

  • Do I need the complete file in memory, or can I process it line by line or word by word?
  • Could reading this file into memory result in unbounded growth? (e.g. when an undetermined number of files needs to be read)
  • Do I want to use this data again later on?
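To make the first question concrete, here is a minimal Python sketch of line-by-line processing. The file object is a lazy iterator, so memory usage is bounded by the longest line rather than the file size; the file name and the error-counting task are hypothetical examples, not from the original post:

```python
def count_errors(path):
    """Count lines containing "ERROR" without loading the whole file."""
    errors = 0
    with open(path) as f:
        for line in f:          # the file object yields one line at a time
            if "ERROR" in line:
                errors += 1
    return errors
```

The same pattern works for any per-line (or per-record) computation: as long as you never call `f.read()` or `f.readlines()`, the full file is never materialised in RAM.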

I once wrote an application that used a 3 MB database and, for some reason, decided to read it completely on each iteration. In that case I would have been better off loading the whole database into RAM once and reusing it, instead of reading it from disk over and over again. Making that change resulted in an actual performance gain!
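A minimal sketch of that idea: pay the read cost once, then serve every later call from RAM. The file name and the read-it-as-lines format are assumptions for illustration; a small (~3 MB) data set is cheap to keep resident:

```python
_cache = None

def get_records(path):
    """Load a small data file once and reuse it on every later call."""
    global _cache
    if _cache is None:              # first call: read from disk once
        with open(path) as f:
            _cache = f.read().splitlines()
    return _cache                   # later calls: served from RAM
```

This only makes sense when the data is small and read often; for a large or rarely-used file, holding it in RAM would be exactly the kind of instability the questions above are meant to catch.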

How do you manage to keep your RAM usage stable?

Wesley Ameling (@engineercoding) is a computer enthusiast who is currently studying bioinformatics.

Discussion


The usual place where I've run into memory trouble is loading all the data from a file or database into memory at once. The two things I commonly do to avoid this are 1) add limits to database queries (e.g. a LIMIT or TOP clause) or 2) use buffered/streaming reads.
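Both techniques can be shown with Python's stdlib `sqlite3` module; the `items` table and its columns are assumptions for illustration. The LIMIT clause caps the result set, and iterating the cursor (instead of calling `fetchall()`) fetches rows incrementally rather than materialising them all at once:

```python
import sqlite3

def first_rows(conn, limit=1000):
    """Yield at most `limit` rows, fetched incrementally from the cursor."""
    cur = conn.execute("SELECT id, name FROM items LIMIT ?", (limit,))
    for row in cur:                 # rows arrive one at a time
        yield row
```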

#1 is the option I typically reach for first, because of the convenience of using third-party libraries to enumerate the data and convert it to objects.

For back-end business-y use, a lot of the data you load is a single record or a small family of records that you want to convert to objects in order to make business decisions. So there's no danger of hitting a reasonable limit (e.g. 1000). When there is potentially a lot of data to be processed, you can load and process it in size-limited batches to avoid runaway memory usage.
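The batching idea can be sketched with keyset pagination: each query is capped at `batch_size` rows and resumes after the last id seen, so memory stays bounded regardless of table size. Table and column names here are hypothetical:

```python
import sqlite3

def process_in_batches(conn, batch_size=1000, handle=print):
    """Process every row of `invoices` in size-limited batches."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM invoices WHERE id > ? "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break                   # table exhausted
        for row in rows:
            handle(row)
        last_id = rows[-1][0]       # resume after the last seen id
```

Keyset pagination (`WHERE id > ?`) is usually preferable to `OFFSET` for this, since `OFFSET` forces the database to skip over all earlier rows on every batch.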

For front-end use, many lists are potentially unbounded in size, i.e. the more invoices, the better. However, if my API would run out of memory loading the data before passing it to the user, then their browser should not be exposed to the memory load of displaying it all at once either. Enforcing a limit on the number of returned records obviously entails adding paging (or infinite scroll), but also search functionality. Most users complain about paging because they want all the data on screen so they can Ctrl-F to find things. (If you do as they ask and display all the data in one large list, they will complain about page load times instead.) So search usually must be implemented to fulfill that need while still limiting the number of records returned.

#2 is the option I use for potentially large amounts of data, for example exporting to CSV or processing a file of unknown size. For these cases, I make sure to load the data with a method that streams/buffers the reads. This usually involves a little more code and can be harder to work with, but it limits your memory footprint to the size of the buffer.
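As an illustration of #2, here is a streaming CSV export sketch: rows are pulled lazily from any iterable (such as a database cursor) and written one at a time, so only a single row is in memory at any moment. The function name, file path, and header are illustrative, not from the original comment:

```python
import csv

def export_csv(rows, path, header=("id", "name")):
    """Stream rows to a CSV file one at a time, regardless of total size."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:            # pull lazily, write immediately
            writer.writerow(row)
```

Because `rows` is consumed incrementally, the same function handles ten rows or ten million without any change in memory footprint.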