As a C# programmer, I have heard a lot about stack memory, heap memory, and the difference between them. Usually it goes like "the stack is faster because no GC happens, and the heap is slower, so you should avoid using it". This idea is widespread not only in C#, but also in Java, C, C++, and many other compiled languages.
In this article I will focus on performance, without covering the abstract concepts of stack and heap or how they differ between C#/Java and C/C++.
Dare to experiment
For some time I believed that, but once I had the courage to try using the heap instead of the stack, and... nothing bad happened.
I wasn't writing some ASP.NET backend, where a latency of 500 ms is considered normal; I was developing a game engine, where even a 10 ms delay is critical for FPS.
I didn't hesitate to use heap memory, and after getting surprisingly good performance with the C# garbage collector, I kept the same approach in Java while making a mobile rendering library with OpenGL ES.
Still surprised by the good performance, I tried to effectively disable the Java garbage collector by switching to Epsilon, a collector that allocates memory but never reclaims it (-XX:+UseEpsilonGC), to see what would happen.
And I was surprised again: the program terminated (out of memory) 5 seconds after it started. It had consumed nearly 4 GB of RAM in 5 seconds, which is about 800 MB/s.
All this time my code had been allocating that much RAM, and it still had no performance problems?
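To get a feeling for numbers like these, one could measure the allocation rate directly. Below is a minimal Java sketch, not the code of my rendering library: the class name AllocationRate, the 1 KB block size, and the 16-slot ring buffer are all made up for illustration. It allocates short-lived arrays in a loop and reports how many megabytes of payload were requested per second.

```java
public class AllocationRate {
    // Payload bytes per allocation (illustrative value, not taken from any real workload).
    static final int BLOCK_SIZE = 1024;

    // Storing each block in a static ring buffer makes it escape the method,
    // so the JIT should not be able to optimize the allocation away.
    static final byte[][] RING = new byte[16][];

    // Allocates `blocks` short-lived arrays and returns the payload bytes requested.
    // A block becomes garbage as soon as its ring slot is overwritten.
    static long churn(int blocks) {
        long total = 0;
        for (int i = 0; i < blocks; i++) {
            byte[] block = new byte[BLOCK_SIZE];
            block[0] = (byte) i;
            RING[i & 15] = block;
            total += block.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int blocks = 4_000_000; // roughly 4 GB of payload in total
        long start = System.nanoTime();
        long bytes = churn(blocks);
        double seconds = (System.nanoTime() - start) / 1e9;
        // Object headers are not counted, so the real allocation rate is slightly higher.
        System.out.printf("requested %.0f MB in %.2f s (%.0f MB/s)%n",
                bytes / 1e6, seconds, bytes / 1e6 / seconds);
    }
}
```

With a regular collector this should simply run to completion. On a HotSpot JDK 11 or newer, running it with `-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx4g` should instead end with an out-of-memory error, much like in my experiment, because nothing is ever reclaimed.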
Old fairy tales
Yes, modern garbage collectors are that good. You may allocate hundreds of megabytes of RAM per second and still see no performance loss. If someone tells you they have problems with a slow GC, consider that one of the following is likely:
- they allocate RAM at a rate of about 5-10 GB/s
- their RAM consumption is acceptable, but they have large memory leaks
- they use (or used) some old language, or an old version of its compiler/runtime
- the cause of the performance loss is not the GC but something else
So, those stories about "slow" heap memory are mostly outdated and can be considered just fairy tales.
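Before blaming the GC, it is worth checking how much time the runtime itself reports spending in it. Here is a rough Java sketch using the standard java.lang.management API. The class name GcTimeCheck and the workload method are hypothetical stand-ins for whatever code is suspected of being slow, and what a collector counts as "collection time" can differ between collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeCheck {
    // Total time in milliseconds that the JVM reports for garbage collection so far.
    // getCollectionTime() returns -1 when a collector does not provide it; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Hypothetical workload: one small scratch array per iteration.
    // Returns the sum 0 + 1 + ... + (iterations - 1) so the work cannot be skipped.
    static long workload(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] scratch = new int[64];
            scratch[i & 63] = i;
            checksum += scratch[i & 63];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        long checksum = workload(20_000_000);
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.printf("checksum=%d, wall time=%d ms, reported GC time=%d ms%n",
                checksum, wallMillis, gcMillis);
    }
}
```

If the reported GC time is a small fraction of the wall time, the slowdown most likely comes from somewhere else.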
Is access to stack memory faster?
We covered the topic of GC, but it can't be the only difference between stack and heap, right?
Some of you may know that the stack and the heap are located on the same physical device (RAM) and are accessed by the CPU in the same way. The difference appears when a compiler optimizes our code, replacing use of the stack with use of registers. For some reason, hardly anybody talks about that in the context of the stack/heap distinction.
Example:
using System;

class Program
{
    static int SomeFunc(int a, int b) {
        int c = a + b;
        int d = c * c;
        int e = d << 1;
        return e;
    }
}
If you ask a C# programmer how much stack space (in bytes) this function uses, the answers may differ. Some would count only the local variables (3 x int = 12 bytes), some would count the arguments as well (5 x int = 20 bytes). In fact, it's just guesswork: it all depends on the compiler version and its implementation.
I used Compiler Explorer (godbolt) to look at what the .NET compiler produces, and this is the result I got for our C# code snippet:
Program:.ctor():this (FullOpts):
       ret

Program:SomeFunc(int,int):int (FullOpts):
       add      edi, esi
       mov      eax, edi
       imul     eax, edi
       add      eax, eax
       ret
If you can read assembly code, you already understand what is going on. There is no stack frame at all. Arguments are passed in registers (edi and esi), and the value is returned in the eax register. Apart from ret reading the return address, this code does not access RAM, which makes it much faster. In general, access to stack memory is as slow as access to any other area of RAM (slow compared to accessing a register).
So, the difference between stack and heap exists only in the context of a specific compiler and its optimization mechanisms. The C# standard says nothing about the stack being faster than the heap.
CPU caches? Yes, they might make access to the stack much faster, but the same applies to memory on the heap.
Don't believe, do research
A lot of the things you may hear from old senior developers were true a long time ago, but software changes, and old performance problems (usually) get solved.
Now, when I hear something like "A is slow, use B", I write a short program to compare A with B. Is B really good? At what cost? Is A really much worse than B? Can A be improved? And usually I get contrary results, where A is the same as B or even faster, because my typical use case differs from the common one.
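Such a comparison program does not have to be long. Here is a minimal Java sketch under my own assumptions: the class name CompareAB, the Box type, and the iteration count are invented for illustration. Variant A does the same arithmetic as the SomeFunc example using only local variables, and variant B keeps the intermediate values in a heap object.

```java
import java.util.function.IntBinaryOperator;

public class CompareAB {
    // Variant A: only local primitives (candidates for registers or the stack).
    static int withLocals(int a, int b) {
        int c = a + b;
        int d = c * c;
        return d << 1;
    }

    // Variant B: same arithmetic, but intermediate values live in a heap object.
    // Note: the JIT may remove this allocation entirely through escape analysis.
    static final class Box {
        int c;
        int d;
    }

    static int withHeapObject(int a, int b) {
        Box box = new Box();
        box.c = a + b;
        box.d = box.c * box.c;
        return box.d << 1;
    }

    // The checksum is written here so the JIT cannot discard the loop as dead code.
    static long sink;

    // Runs `op` over the same inputs and returns the elapsed time in nanoseconds.
    static long time(IntBinaryOperator op, int iterations) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += op.applyAsInt(i, i + 1);
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 50_000_000;
        // Warm-up runs, so that both variants are JIT-compiled before measuring.
        time(CompareAB::withLocals, iterations);
        time(CompareAB::withHeapObject, iterations);
        long a = time(CompareAB::withLocals, iterations);
        long b = time(CompareAB::withHeapObject, iterations);
        System.out.printf("A (locals):      %d ms%n", a / 1_000_000);
        System.out.printf("B (heap object): %d ms%n", b / 1_000_000);
    }
}
```

A hand-rolled timer like this is only a first impression; for numbers you want to rely on, a dedicated benchmarking tool such as JMH is probably a better choice. Still, it is often enough to show whether "A is slow" is even close to true for your own case.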
Bonus content: a story about the importance of doubt
A remarkable thing happened to my brother, who makes games with Godot. He had heard a lot about Vulkan being faster than OpenGL because it's modern, supports parallelism, and gives more control over the GPU. So he compared them and discovered that the Vulkan version of his game performed much worse than the same game using OpenGL.
So what does this tell us? Maybe the performance potential of Vulkan is real, maybe it's just marketing. Maybe there are special cases where Vulkan in Godot is much faster than OpenGL. Maybe that is true only for specific GPU models. Maybe the Godot developers haven't managed to benefit from Vulkan's speed yet.
Everything must be tested, compared, and tried in its intended field; words are just words.
Top comments (1)
If you only use stack memory, the GC won't even start. The GC should have some overhead, but maybe in your example it can be neglected.