DEV Community


🚀 Demystifying memory management in modern programming languages

Deepu K Sasidharan on January 08, 2020

Originally published at deepu.tech. In this multi-part series, I aim to demystify the concepts behind memory management and take a deeper look at ...
Sergio Pedri

Nice article! 😊

I'll just throw in a couple of points of feedback:

  1. What you say about the stack being faster than the heap is not really true. In both cases you're doing a memory access; it doesn't matter whether the target address is read from %esp or from some other location. The only speed difference there comes from whether the target memory page is cached or not, but the stack is not any faster per se than the heap. And in both cases there's still the exact same "lookup" (dereference).
  2. The stack is surely finite in size and more limited, but the size of the stack at any given point doesn't need to be constant at all. Programs can allocate arbitrary temporary buffers on the stack just fine. After all, all you're really doing is adding/subtracting an offset to %esp.
  3. When you say you can read values from the top of the stack, I'd also add that functions can actually read any value in the local stack frame, at any offset. The stack is not strictly LIFO: you can push values as well as read/write any of the existing values. Again, it's just a read/write at %esp + offset.
  4. Same thing about the heap being slower, as in point 1. I think you should clarify that what's slow is the allocation process (and the GC tracking in managed languages), but the actual dereferencing is just as fast as on the stack.
  5. I'm not sure the phrasing is optimal when you say you can have "out of memory errors if your application tries to use more memory than the allocated heap". Out-of-memory errors arise when there is not enough space in the current address space to perform an operation, before any memory is actually allocated on the heap; for instance, when you're trying to allocate a block of a size such that there is no large enough free address range. But it's not like the heap is "allocated" from the start: it's just a segment in the virtual address space of a process. It might very well be possible that the same operation that failed could have completed successfully if the allocated pages in the heap had been compacted beforehand.

I want to repeat it: the article is very well written and you did a great job. I'm just giving you my 2 cents here about a couple of details I noticed.

Cheers!

André Jacques • Edited

The statement about the stack not being faster than the heap is true only in the sense that, once you have the address, the time to read/write is the same. The problem is that, semantically, stack addresses are already known (through the stack pointer, and the frame pointer as well). As the author said, there is no such thing as a heap pointer (it just wouldn't make sense). With that in mind, let's say we have 2 integers in C like this:


#include <stdlib.h>

int a = 3;                     /* stored directly in the stack frame */
int *b = malloc(sizeof *b);    /* pointer on the stack, the int itself on the heap */
*b = 4;                        /* requires an extra dereference through b */
free(b);

First of all, it is clear that creating an element on the stack is way faster than on the heap, since malloc needs to find a contiguous free block of 32 (or 64) bits to place the value in, whereas the stack just needs to look up the stack pointer, add the element, and move the stack pointer accordingly.

After that, modifying an element on the stack is easy, since it is basically an offset from the stack pointer. But for the heap you first need to find the pointer on the stack; from that pointer you get the address in the heap, and only then can you read or write at that address.

And I haven't even talked about the difference between addressing by offset vs by full address, since a lot of CPUs have instruction sets too small to encode a full address in a single instruction.

In conclusion, the stack is faster than the heap only because of the stack pointer.

Sergio Pedri • Edited

I believe there has been a misunderstanding here. If you go back and read my original comment, I made it clear that when I said the stack was equally as fast as the heap, it was about direct read/write operations to a memory location in either of those two segments. You can't compare a read/write to a local value type variable with one to a reference variable, for obvious reasons: those are handled differently. And you can't take the overhead of a malloc call into consideration, or the garbage collection cost; that was completely beside my point, and I did mention that in my original message.

What I said was that reading or writing to a specific memory location always has the same performance, regardless of where that memory location is situated, be it in the stack or the heap.

As a practical example. Suppose you allocate a buffer of int values on the stack, and one on the heap, and end up with two int* pointers. One points to the buffer in the stack, one in the heap. If you read/write to any of those two pointers, the operation will be exactly the same down to the assembly level, and the performance will be exactly the same too.

As another example, suppose you have a local value type variable, and you pass it by reference to another function. That function will have no way of knowing whether that reference points to a variable in some other stack frame, or to a memory location in the heap, and that makes absolutely no difference at all. Again, both cases are compiled to exactly the same instructions, it's just a pointer of some type anyway.

In conclusion, the speed of an operation is not intrinsically linked to where a value is located in the virtual memory address space, but just to how you're accessing that location.

I hope this clears things up regarding my original point on that 👍

EDIT: just to clarify, you do have a pointer lookup in both cases though. If you have a pointer to a heap location, e.g. in %ecx, and want to read that value, you'll do movl (%ecx), %eax. If you want to read the value at the top of the stack, you'll do movl (%esp), %eax. In both cases the operation is exactly the same, and you have an address lookup just the same. So saying that you never look up addresses when accessing the stack is just not the case. You're correct in saying that if you have a pointer to the heap stored on the stack, you need two dereferences there, sure. But even in the best-case scenario on the stack, you still need at least one anyway. And if you have a pointer in a given register, reading from that location has exactly the same cost no matter whether it points somewhere in the heap or the stack.

Deepu K Sasidharan

Your last edit is exactly the point. From a language point of view we care about the cost of an operation when we talk about speed, and the cost of an operation on the stack is lower than on the heap; that is why we say the stack is faster than the heap. The actual speed of writing to or reading from any part of memory is indeed the same, but for the heap there are more operations involved when writing/reading a value, making the cost of the operation higher than for the stack.

André Jacques
"Suppose you allocate a buffer of int values on the stack, and one on the heap, and end up with two int* pointers."

That is not true, since on the stack everything is an offset from the stack pointer, which sits in a CPU register. So, for example on ARM CPUs, it will take something like 3 instructions to retrieve any of the ints in a buffer sitting on the stack. For the one on the heap, the first element will take something like 4 instructions, since you first need to load a register with the address found on the stack (the pointer to your heap buffer); AFTER that, the other elements will be read AS FAST AS from the stack, since the other int elements of the buffer sit at an offset from that address. And as I said earlier, if you have the address as a label, you will need 2 instructions to set up your register before you can LDR/STR that memory block.

I think what you are trying to say is that once you have the address loaded in a register, the read time will be the same, which is true. The danger is that people might believe it is always the best solution to use the heap since it is so large, without accounting for the way it is handled. Embedded systems are very sensitive to this, considering their low clock frequencies (very often below 100 MHz), lack of an OS, and small memory footprint. The good practice in that case is to use the heap only for objects allocated at startup, so there is NO malloc anywhere after that, making sure the system stays deterministic. I know that on a 12-core i9 clocked at 4 GHz, the impact on most of the systems people work on (except OSes, for that matter, or video games) will be negligible.

Deepu K Sasidharan

This is what I have been trying to say, but I think you said it better :)

Juan Carlos • Edited

Add Nim to the series: it has all of those, including Rust-like memory management and manual memory management too, has something like 6+ GCs you can configure, and more.
It will give you a lot to write about. :)

Deepu K Sasidharan • Edited

Thanks for the comment. However, Nim does reference counting and GC; I don't think it's Rust-like.

Shaurya

Nim has 6 GCs; the default one is a reference counting one. See the Nim documentation for more details.

Juan Carlos

Nim calls all of them "GC" to keep things simple for new users,
but Nim's memory management strategies may or may not fit the "garbage collector" definition at all.

I've met people who consider whatever Rust uses a garbage collector too.
🤷‍♀️

Deepu K Sasidharan • Edited

Thanks, and yes, I'm aware. They did go all the way on the GC part 😜

Technically they all fit under the definition of GC, IMO. They don't have ARC or ownership as far as I can see.

Juan Carlos

--gc:arc

flat reference counting with move semantics and destructors, with single ownership,
optimized with sink and lent annotations for zero-copy wherever it is annotated;
basically it is like a shared heap of subgraphs, each with a single owner,
and all of this is done at compile time (ownership can be checked at run time too).

Not really a GC, despite the --gc: flag.

Not really the same as ARC in Swift and Objective-C, because those cannot handle cycles.

Not really atomic, despite the "ARC" name:
it won't need atomic reference counting,
because a graph is always owned by a single owner,
so there is no need to make the reference counting atomic
(real atomic counting is known to be kinda slow).

declanTreanor

That was an excellent and comprehensive summary of memory management. I look forward to future installments, starting with the already-published part 2. By way of a contribution (since I am unable to correct anything technical about the JVM), I've spotted a grammatical problem. Namely:

"..load any run-time systems that is required"
, should be:
"..load any run-time systems that are required"

Deepu K Sasidharan

Thank you, I'll update that. Glad you liked it.

Mathew Grabau

Thank you for writing the article. I enjoyed it and it is a very relevant topic.

I saw C# recommended in the comments - as a .NET developer since .NET 2.0 was new, I agree. I've noticed that C# has gained popularity, but I also know how relevant this topic can be in developing larger applications or chasing performance.

Deepu K Sasidharan

I'll add C# since this is the second request now :)

aldama • Edited

It would be interesting to know how Dart manages memory. I think a lot of people would be interested, especially since Flutter is getting a lot of momentum.

Deepu K Sasidharan

AFAIK, Dart uses a mark & sweep GC similar to the JVM & Go. I'll see if I can add more details in an upcoming chapter.

ksceriath

The Rust example (in the graphic) has not been a compiler error for quite some time now. The compiler is smart enough to understand that "y isn't using it anymore".

Deepu K Sasidharan

I know, but the image is not that important here. I'll use correct representations for the Rust part coming next

ksceriath

Cool! No worries. (Though I'm not sure why you'd want to keep an incorrect representation.)

André Jacques

Just a little thing to add about heap memory: over time, the heap becomes non-deterministic, since the search for free memory takes longer and longer as it fragments. This has more impact on embedded systems development, though.

Great job.

Deepu K Sasidharan

Thank you and that makes sense.

Shaiju T • Edited

Good one 😄 Do you have any plans to add memory management in C# to the series?

Deepu K Sasidharan

I was thinking of adding that; I might once I'm done with the ones currently planned.

Sloan, the sloth mascot
Comment deleted
Deepu K Sasidharan

I guess you posted here by mistake? If not, care to explain what you mean?

Deepu K Sasidharan

Well, no worries. I just didn't understand what you meant

arpit

Please add Elixir/Erlang comparisons as well if possible!

Deepu K Sasidharan • Edited

Unfortunately, I have never used those languages. Also, I wanted to cover one language per type of memory management, and I think almost everything except ARC is covered in my plan.

Jon Kleiser

Maybe the Pony language can, to some extent, be compared to Rust regarding memory management. I suggest you take a look. You could e.g. start here: tutorial.ponylang.io/reference-cap...

Deepu K Sasidharan

Interesting. Thanks for the comment

Nourin-Eka

Thank you very much for listing all the references. It certainly makes us feel confident about sharing it with colleagues and friends. Very well written indeed! Thanks for the effort.

Deepu K Sasidharan

You are welcome and thank you

Vlad Ion

Just wanted to mention that the link for part 2 at the end of the article is not correct.

Deepu K Sasidharan

Thank you so much. Just corrected it

aRtoo

yow thank you for this. much love to you sir!

Deepu K Sasidharan

Thank you

anhoangphuc

It's very attractive. Waiting for your next post.

Deepu K Sasidharan

Thank you

Humberto A Sanchez II

Looking forward to the subsequent articles in the series

Paolo

thanks for this, waiting for the V8 part :)

Deepu K Sasidharan

Thank you. I hope to get to it soon

Matthew Turland • Edited

I believe PHP altered its reference counting approach in version 7.

nikic.github.io/2015/05/05/Interna...

Deepu K Sasidharan • Edited

Yes indeed; PHP 5.3 onwards is using a reference counting GC with cycle collection, I believe.

Jake

That's the best article that covers all essential parts about memory management. I love it.