Deepu K Sasidharan

Posted on Jan 8, 2020 • Edited on Feb 19, 2024 • Originally published at deepu.tech

🚀 Demystifying memory management in modern programming languages

#languages #programming #computerscience

Originally published at deepu.tech.

In this multi-part series, I aim to demystify the concepts behind memory management and take a deeper look at memory management in some of the modern programming languages. I hope the series would give you some insights into what is happening under the hood of these languages in terms of memory management. Learning about memory management will also help us to write more performant code as the way we write code also has an impact on memory management regardless of the automatic memory management technique used by the language.

Part 1: Introduction to Memory management

Memory management is the process of controlling and coordinating the way a software application access computer memory. It is a serious topic in software engineering and its a topic that confuses some people and is a black box for some.

What is it?

When a software runs on a target Operating system on a computer it needs access to the computers RAM(Random-access memory) to:

load its own bytecode that needs to be executed
store the data values and data structures used by the program that is executed
load any run-time systems that are required for the program to execute

When a software program uses memory there are two regions of memory they use, apart from the space used to load the bytecode, Stack and Heap memory.

Stack

The stack is used for static memory allocation and as the name suggests it is a last in first out(LIFO) stack (Think of it as a stack of boxes).

Due to this nature, the process of storing and retrieving data from the stack is very fast as there is no lookup required, you just store and retrieve data from the topmost block on it.
But this means any data that is stored on the stack has to be finite and static(The size of the data is known at compile-time).
This is where the execution data of the functions are stored as stack frames(So, this is the actual execution stack). Each frame is a block of space where the data required for that function is stored. For example, every time a function declares a new variable, it is "pushed" onto the topmost block in the stack. Then every time a function exits, the topmost block is cleared, thus all of the variables pushed onto the stack by that function, are cleared. These can be determined at compile time due to the static nature of the data stored here.
Multi-threaded applications can have a stack per thread.
Memory management of the stack is simple and straightforward and is done by the OS.
Typical data that are stored on stack are local variables(value types or primitives, primitive constants), pointers and function frames.
This is where you would encounter stack overflow errors as the size of the stack is limited compared to the Heap.
There is a limit on the size of value that can be stored on the Stack for most languages.

Stack used in JavaScript, objects are stored in Heap and referenced when needed. Here is a video of the same.

Heap

Heap is used for dynamic memory allocation and unlike stack, the program needs to look up the data in heap using pointers (Think of it as a big multi-level library).

It is slower than stack as the process of looking up data is more involved but it can store more data than the stack.
This means data with dynamic size can be stored here.
Heap is shared among threads of an application.
Due to its dynamic nature heap is trickier to manage and this is where most of the memory management issues arise from and this is where the automatic memory management solutions from the language kick in.
Typical data that are stored on the heap are global variables, reference types like objects, strings, maps, and other complex data structures.
This is where you would encounter out of memory errors if your application tries to use more memory than the allocated heap(Though there are many other factors at play here like GC, compacting).
Generally, there is no limit on the size of the value that can be stored on the heap. Of course, there is the upper limit of how much memory is allocated to the application.

Why is it important?

Unlike Hard disk drives, RAM is not infinite. If a program keeps on consuming memory without freeing it, ultimately it will run out of memory and crash itself or even worse crash the operating system. Hence software programs can't just keep using RAM as they like as it will cause other programs and processes to run out of memory. So instead of letting the software developer figure this out, most programming languages provide ways to do automatic memory management. And when we talk about memory management we are mostly talking about managing the Heap memory.

Different approaches?

Since modern programming languages don't want to burden(more like trust 👅) the end developer to manage the memory of his/her application most of them have devised a way to do automatic memory management. Some older languages still require manual memory handling but many do provide neat ways to do that. Some languages use multiple approaches to memory management and some even let the developer choose what is best for him/her(C++ is a good example). The approaches can be categorized as below

Manual memory management

The language doesn't manage memory for you by default, it's up to you to allocate and free memory for the objects you create. For example, C and C++. They provide the malloc, realloc, calloc, and free methods to manage memory and it's up to the developer to allocate and free heap memory in the program and make use of pointers efficiently to manage memory. Let's just say that it's not for everyone 😉.

Garbage collection(GC)

Automatic management of heap memory by freeing unused memory allocations. GC is one of the most common memory management in modern languages and the process often runs at certain intervals and thus might cause a minor overhead called pause times. JVM(Java/Scala/Groovy/Kotlin), JavaScript, C#, Golang, OCaml, and Ruby are some of the languages that use Garbage collection for memory management by default.

Mark & Sweep GC: Also known as Tracing GC. Its generally a two-phase algorithm that first marks objects that are still being referenced as "alive" and in the next phase frees the memory of objects that are not alive. JVM, C#, Ruby, JavaScript, and Golang employ this approach for example. In JVM there are different GC algorithms to choose from while JavaScript engines like V8 use a Mark & Sweep GC along with Reference counting GC to complement it. This kind of GC is also available for C & C++ as an external library.
Reference counting GC: In this approach, every object gets a reference count which is incremented or decremented as references to it change and garbage collection is done when the count becomes zero. It's not very preferred as it cannot handle cyclic references. PHP, Perl, and Python, for example, uses this type of GC with workarounds to overcome cyclic references. This type of GC can be enabled for C++ as well.

Resource Acquisition is Initialization (RAII)

In this type of memory management, an object's memory allocation is tied to its lifetime, which is from construction until destruction. It was introduced in C++ and is also used by Ada and Rust.

Automatic Reference Counting(ARC)

It's similar to Reference counting GC but instead of running a runtime process at a specific interval the retain and release instructions are inserted to the compiled code at compile-time and when an object reference becomes zero its cleared automatically as part of execution without any program pause. It also cannot handle cyclic references and relies on the developer to handle that by using certain keywords. Its a feature of the Clang compiler and provides ARC for Objective C & Swift.

Ownership

It combines RAII with an ownership model, any value must have a variable as its owner(and only one owner at a time) when the owner goes out of scope the value will be dropped freeing the memory regardless of it being in stack or heap memory. It is kind of like Compile-time reference counting. It is used by Rust, in my research I couldn't find any other language using this exact mechanism.

We have just scratched the surface of memory management. Each programming language uses its own version of these and employs different algorithms tuned for different goals. In the next parts of the series, we will take a closer look at the exact memory management solution in some of the popular languages.

Stay tuned for upcoming parts of this series:

Part 2: Memory management in JVM(Java, Kotlin, Scala, Groovy)
Part 3: Memory management in V8(JavaScript/WebAssembly)
Part 4: Memory management in Go
Part 5: Memory management in Rust
Part 6: Memory management in Python

References

If you like this article, please leave a like or a comment.

You can follow me on Twitter and LinkedIn.

Image credits:
Stack visualization: Created based on pythontutor.
Ownership illustration: Link Clark, The Rust team under Creative Commons Attribution Share-Alike License v3.0.

Oldest comments (56)

Comment deleted

Deepu K Sasidharan • Jan 14 '20

I guess you posted here by mistake? if not care to explain what you mean?

Deepu K Sasidharan • Jan 14 '20

Well, no worries. I just didn't understand what you meant

aRtoo • Jan 9 '20

yow thank you for this. much love to you sir!

Deepu K Sasidharan • Jan 9 '20

Thank you

anhoangphuc • Jan 9 '20

It's very attractive. Wait for your next post.

Deepu K Sasidharan • Jan 9 '20

Thank you

Juan Carlos • Jan 9 '20 • Edited

Add Nim to the series, has all those, including a Rust-like memory management, including Manual memory management too, has like +6 GC, you can configure them, and more.
Will give you a lot to write about. :)

Deepu K Sasidharan • Jan 9 '20 • Edited

Thanks for the comment. However, Nim does reference counting and GC, I don't think it's Rust like

Shaurya • Jan 14 '20

Nim has 6 GCs - The default one is a reference counting one. See the nim documentation for more details

Juan Carlos • Jan 14 '20

Nim calls all of them "GC" to keep things simple for new users,
but Nim memory management strategies may or may not fit the "Garbage Collector" definition whatsoever.

Ive meet people that considers whatever Rust uses a Garbage Collector too.
🤷‍♀️

Deepu K Sasidharan • Jan 14 '20 • Edited

Thanks and Yes, I'm aware. They did go all the way on the GC part 😜

Technically they all fit under definition of GC IMO. They don't have ARC or Ownership as far as I see

Juan Carlos • Jan 14 '20

--gc:arc

flat reference counting with move semantics and destructors with single ownership,
optimized with sink and lent annotations for zero-copy all the way is annotated,
basically it is like a shared heap with subgraphs with a single owner,
all this is done at compile-time, and ownership can be checked at run-time too.

Not really a GC despite --gc:.

Not really the same as Swift and ObjectiveC lang ARC because those can not handle cycles.

Not really Atomic despite ARC,
it wont need atomic reference counting,
a graph is always owned by a single owner,
then no need to make the reference counting atomic
(real atomic is known to be kinda slow).

Deepu K Sasidharan • Jan 14 '20 • Edited

Well, in that case, it's just bad grouping coz ARC is not GC. And seems these are buried in the docs somewhere.

arpit • Jan 9 '20

Please add Elixir/Erlang comparisons as well if possible!

Deepu K Sasidharan • Jan 9 '20 • Edited

Unfortunately, I have never used those languages and also I wanted to do a language per type of memory management and I think almost everything except ARC is covered in my plan

Shaiju T • Jan 9 '20 • Edited

Good one 😄 Do you have any plans to add Memory management in C# for future series ?

Deepu K Sasidharan • Jan 9 '20

I was thinking of adding that, I might once I'm done with the current ones planned

Paolo • Jan 10 '20

thanks for this waiting for the v8 part :)

Deepu K Sasidharan • Jan 10 '20

Thank you. I hope to get to it soon

Matthew Turland • Jan 14 '20 • Edited

I believe PHP stopped altered its reference counting approach in version 7.

nikic.github.io/2015/05/05/Interna...

Deepu K Sasidharan • Jan 14 '20 • Edited

Yes indeed, PHP 5.3 onwards is now using a tracing reference counting GC I believe.

aldama • Jan 14 '20 • Edited

It good be interesting to know how Dart manage memory. I think a lot of people would be interested especially since flutter in getting a lot of momentum.

Deepu K Sasidharan • Jan 14 '20

AFAIK, Dart uses a mark & sweep GC similar to JVM & Go. I'll see if I can add more details in an upcoming chapter

Mathew Grabau • Jan 14 '20

Thank you for writing the article. I enjoyed it and it is a very relevant topic.

I saw C# recommended in the comments - as a .NET developer since .NET 2.0 was new, I agree. I've noticed that C# has gained popularity, but I also know how relevant this topic can be in developing larger applications or chasing performance.

Deepu K Sasidharan • Jan 14 '20

I'll add C# since this is the second request now :)

View full discussion (56 comments)