
Interview question: heap vs stack (C#)

Alexey Golub ・5 min read

Preface

I recently made myself unemployed, and for a few weeks now I've been doing a lot of technical interviews as part of looking for a new job. Despite each project being unique, interviewers tend to ask the same questions from time to time. I've decided to keep track of the most common interview questions I get and write down my answers to them.

When you're in-between jobs, getting ready for interviews may be tedious. So instead I welcome you to follow this series, in which I will try to cover different interview topics in each post. I will try to write these regularly and I hope that it can be helpful to someone.

Note: if you find a mistake, please let me know in the comments and I will correct it. Thanks!

Heap vs stack


Q: Where are objects allocated in C#?

In C# there are two places where an object can be stored -- the heap and the stack.

Objects allocated on the stack exist only for the lifetime of their stack frame (the execution of a method), while objects allocated on the heap can be accessed from any code that holds a reference to them.


Q: Which objects are allocated on the stack and which objects are allocated on the heap?

Note: you should never say "reference types are allocated on the heap while value types are allocated on the stack". This is a commonly repeated mistake and it sets off a red flag for an experienced interviewer.

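Instances of reference types (classes, interfaces, delegates) are allocated on the heap. Strictly speaking, this is an implementation detail of the runtime rather than a language guarantee, but it is the behavior you can expect in practice.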

When you pass an instance of a reference type as a parameter or assign it to a variable, you're in fact passing its reference. The reference (not the referenced object) can be stored either on the stack or on the heap.

By passing a reference to an object, you're telling where that object is located on the heap so that your code can access it.

Every time an instance of a reference type is assigned to a variable or passed to a method, only the reference is copied. This means that you can point the copy at a different object without affecting the original object or any other references to it. A reference is lightweight and has a fixed size (32 or 64 bits, depending on the bitness of the process), so copying it (and thus passing reference types around) is cheap.
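
To make the difference between changing an object and changing a reference more concrete, here is a minimal sketch. The Person class and the method names are made up for illustration.

```csharp
using System;

// Hypothetical reference type used only for this example.
public class Person
{
    public string Name;
    public Person(string name) { Name = name; }
}

public static class ReferenceCopyDemo
{
    // Mutates the object that both the caller's and the callee's references point to.
    public static void Rename(Person p) { p.Name = "Bob"; }

    // Repoints only the callee's copy of the reference; the caller's variable is unaffected.
    public static void Replace(Person p) { p = new Person("Carol"); }

    public static string Run()
    {
        var person = new Person("Alice");
        Rename(person);     // the shared object now has Name == "Bob"
        Replace(person);    // person still refers to the original object
        return person.Name; // "Bob"
    }
}
```

Calling ReferenceCopyDemo.Run() returns "Bob": the rename is visible through the caller's reference, while the replacement only affected the copy of the reference inside Replace.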

Value types (derived from System.ValueType, e.g. int, bool, char, enum and any struct) can be allocated on the heap or on the stack, depending on where they were declared.

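  • If the value type was declared as a variable inside a method then it's stored on the stack (unless the variable is captured by a lambda, in which case the compiler moves it into a heap-allocated closure object).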
  • If the value type was declared as a method parameter then it's stored on the stack.
  • If the value type was declared as a member of a class then it's stored on the heap, along with its parent.
  • If the value type was declared as a member of a struct then it's stored wherever that struct is stored.

Starting with C# 7.2, a struct can be declared as a ref struct, in which case it can only live on the stack. As a consequence, it can't be boxed or used as a field of a class or of a regular (non-ref) struct.
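
As a rough illustration of a ref struct, the sketch below wraps a Span<int> that points at stack memory. IntWindow is a hypothetical type, and the example assumes a runtime where Span<T> is available (.NET Core 2.1 or later, or the System.Memory package).

```csharp
using System;

// Hypothetical ref struct: it holds a Span<int>, which may itself point at stack memory.
public ref struct IntWindow
{
    private readonly Span<int> _items;

    public IntWindow(Span<int> items) { _items = items; }

    public int Sum()
    {
        int total = 0;
        foreach (int item in _items) total += item;
        return total;
    }
}

public static class RefStructDemo
{
    public static int Run()
    {
        Span<int> buffer = stackalloc int[3]; // memory inside this method's stack frame
        buffer[0] = 1;
        buffer[1] = 2;
        buffer[2] = 3;

        var window = new IntWindow(buffer);
        return window.Sum(); // 6
    }

    // Neither of these would compile, because a ref struct must stay on the stack:
    //   class Holder { IntWindow window; }   // field of a class
    //   object boxed = new IntWindow();      // boxing
}
```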

Instances of value types are passed by copy (unless used with reference semantics, see below). This means that every time a value type is assigned to a variable or passed as a parameter, the value is copied.
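
Here is a small sketch of what copy semantics look like in practice. Point and the method names are hypothetical and exist only for this example.

```csharp
// Hypothetical value type used only for this example.
public struct Point
{
    public int X;
    public int Y;
}

public static class ValueCopyDemo
{
    // Receives its own copy of the struct, so the caller's value is untouched.
    public static void MoveRight(Point p) { p.X += 10; }

    public static int[] Run()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = a;    // assignment copies the whole value
        b.X = 100;    // changes b only
        MoveRight(a); // changes a copy of a only
        return new[] { a.X, b.X }; // { 1, 100 }
    }
}
```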

Because copying value types can get expensive depending on the size of the object, it's not recommended to declare memory-heavy objects as value types.

Since every type in C# derives from System.Object, value types can be assigned to variables or passed to methods that expect an object. In such cases, the value is copied and stored on the heap wrapped as a reference type, in an operation known as boxing.
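
A minimal sketch of boxing, with made-up method names: the box holds its own copy of the value, and every boxing operation produces a separate object.

```csharp
using System;

public static class BoxingDemo
{
    public static int UnboxAfterChange()
    {
        int number = 1;
        object boxed = number; // boxing: the value is copied into a new heap object
        number = 2;            // changes the local variable, not the boxed copy
        return (int)boxed;     // unboxing copies the value back out: 1
    }

    public static bool SameBox()
    {
        int number = 1;
        object first = number;  // one box
        object second = number; // another, separate box
        return object.ReferenceEquals(first, second); // false
    }
}
```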


Q: Can we use value types with reference semantics?

The ref and out parameter modifiers, ref returns and ref locals (C# 7.0), and the in modifier (C# 7.2) allow accessing value types by reference. Instead of receiving a copy, the consuming code receives a reference to the value, whether it lives on the stack or on the heap, as long as the value is guaranteed to outlive the code that uses the reference.
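
The following sketch contrasts passing a struct by copy with passing it by reference. Counter and the method names are hypothetical.

```csharp
// Hypothetical value type used only for this example.
public struct Counter
{
    public int Value;
}

public static class RefSemanticsDemo
{
    // Works on a copy: the caller's struct is unchanged.
    public static void IncrementCopy(Counter c) { c.Value++; }

    // Works on the caller's struct through a reference.
    public static void IncrementInPlace(ref Counter c) { c.Value++; }

    // Read-only reference: writing c.Value++ here would not compile.
    public static int Read(in Counter c) => c.Value;

    public static int[] Run()
    {
        var counter = new Counter();

        IncrementCopy(counter);          // counter.Value stays 0
        int afterCopy = counter.Value;

        IncrementInPlace(ref counter);   // counter.Value becomes 1

        ref Counter alias = ref counter; // ref local: another name for the same storage
        alias.Value++;                   // counter.Value becomes 2

        return new[] { afterCopy, Read(in counter) }; // { 0, 2 }
    }
}
```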


Q: How is the heap memory freed up?

While the objects stored on the stack are gone when the containing stack frame is popped, memory used by objects stored on the heap needs to be freed up by the garbage collector.

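When an object stored on the heap can no longer be reached through any live reference, it's considered eligible for garbage collection.

At some point the garbage collector kicks in, suspends the managed threads (at least for part of the collection), and marks the memory of unreachable objects as free to use. Objects that have a finalizer are not reclaimed right away: they are queued so that a special finalizer thread can run their finalizers once the application threads resume, and their memory is freed by a later collection. Note that these details are specific to the current CLR implementation.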


Q: What issue may happen due to allocation and de-allocation of memory on the heap?

As the memory on the heap is allocated and de-allocated, it becomes fragmented. See the following diagram:

HEAP:
---][-------][----------][-----]........
      obj 1      obj 2    obj 3   free

When obj 2 is de-allocated, its memory becomes free:

HEAP:
---][-------]............[-----]........
      obj 1      free     obj 3   free

Now, if the runtime needs to allocate another object on the heap, it may use the memory freed up by obj 2, but only if the new object actually "fits". If that memory is not enough, the runtime has to place the object elsewhere, which may mean requesting more memory from the operating system, as shown here:

HEAP:
---][-------]............[-----][--------------------]...
      obj 1      free     obj 3         obj 4

As a result of the fragmentation, memory usage becomes less efficient. To deal with this, the garbage collector may rearrange the memory so that there are no gaps. This is done by moving the surviving objects next to each other, in an operation called "compaction" (or "defragmentation").

HEAP:
---][-------][-----][--------------------]...............
      obj 1   obj 3         obj 4               free

Q: What is Large Object Heap and what is it used for?

Depending on the size of the consumed memory, defragmentation can be expensive. That's why the heap is further separated into the Small Object Heap (SOH) and the Large Object Heap (LOH).

An object is stored on the SOH if it's smaller than 85,000 bytes, otherwise it's stored on the LOH. This cut-off point was determined empirically as the size beyond which defragmentation no longer provides performance benefits.

Arrays of double are an exception: in 32-bit processes they are stored on the LOH once they reach roughly 1000 elements, because of how CPUs deal with doubles (the LOH provides the 8-byte alignment that makes accessing them efficient).

Memory in the LOH is (normally) not defragmented, which makes collections faster at the cost of less efficient memory usage.
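
One way to observe this, as a rough sketch: on the current CLR, GC.GetGeneration typically reports a freshly allocated small object as generation 0, while an object on the LOH is reported as belonging to the oldest generation. The class and method names below are made up, and all of this behavior is an implementation detail that may differ between runtimes and GC configurations.

```csharp
using System;
using System.Runtime;

public static class LohDemo
{
    // A small array goes to the SOH and normally starts out in generation 0.
    public static int GenerationOfSmallArray() => GC.GetGeneration(new byte[1000]);

    // An array above the 85,000 byte threshold goes to the LOH,
    // which is collected together with the oldest generation.
    public static int GenerationOfLargeArray() => GC.GetGeneration(new byte[100000]);

    // Asks the GC to compact the LOH during the next blocking full collection
    // (available since .NET Framework 4.5.1). The setting resets itself afterwards.
    public static void CompactLohOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```

CompactLohOnce shows the "normally" part of the statement above: LOH compaction is off by default, but it can be requested explicitly when fragmentation becomes a problem.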


Discussion

 

"Reference types (classes, interfaces, delegates) are always allocated on the heap and never on the stack."
Yes, reference types are generally allocated on the heap but there's no guarantee and clever compilers can and will allocate objects on the stack if they can prove that the reference to the object never escapes (you can read up on Escape analysis).

"Keywords such as ref and out, ref return and ref local (C#7.0), in (C#7.2) allow accessing value types by reference. This means that instead of copying the value, the consuming code will receive a reference to the value instead, be it on a stack or on a heap, as long as the lifetime of that value type is longer than that of consuming code"
Yeah, that's what you'd intuitively think, but sadly that's (generally) not the case. Value types are specified to be immutable, which the compiler has to guarantee. That means if you pass a value type around via ref or similar you'll get a defensive copy. Same thing if you try to call methods on a struct.

This has caused quite the performance problems over the years. Luckily readonly struct was introduced to avoid this problem.

It would also be a good idea to mention that what you're describing about the GC and LOH in particular is an implementation detail and not contractually guaranteed.

Finalization strategies are also implementation-defined. Currently it is not true that the GC stops everything while finalizers are being run (after all, we have a dedicated thread for it), which is one reason why finalizers are so complicated to implement correctly. There is also no guarantee that finalizers won't run on the thread pool instead of a single dedicated thread in the future (so don't rely on finalizers being executed sequentially!). Actually, I'm not even sure if .NET Core still has a dedicated thread here.

 

In all fairness, everything about how the memory management works in CLR is an implementation detail, although it doesn't stop interviewers from asking these questions. :)

"Value types are specified to be immutable, which the compiler has to guarantee."

You're thinking about ref readonly and in specifically, in which case yes, a defensive copy has to be made because the compiler cannot be sure that the object is not mutated.

 

"In all fairness, everything about how the memory management works in CLR is an implementation detail, although it doesn't stop interviewers from asking these questions. :)"
I know people who ask these questions in the hope of getting push back, but I'll agree - lots of questionable interview questions out there :)

Also yes you're right, the copy only happens if you use in or access a property of a readonly field that's a struct.

 

"The string type deserves a special mention because it's a reference type but it's primarily passed by copy, not by reference. This is the only such type"

This is wrong, string is reference type and it is always passed as reference (and reference is a pointer to actual string contents, this pointer is passed as value, pointer is always on stack when passed/returned in method)

However string is designed to be immutable so you cannot modify it (this is why you feel it is copied, but it is not), imagine if you have 100KB of text in string passing it from one method to another would be time consuming. When you run a method like .ToUpper() etc, this is the time a new string is allocated on heap and its reference is sent to you.

Also literal strings are declared in assembly's resources, which is loaded on the heap, string is not copied and it can never be, it would be worst design ever.

"reference types are allocated on the heap while value types are allocated on the stack"

This statement is correct (with exception of closure), because fields do not constitute as type, class/structure containing them is a type. If experienced developers do not understand true definition of the type then it is certainly wrong place to work !!

Only in case of closure, every captured variable becomes part of a reference stored on heap.

This is the reason, there are local functions, captured value type variables in local functions are not stored on heap.

So I would recommend shorter sentence,

Fields do not constitute as type, all reference types and all value types captured in lambda are on heap and remaining value types are always on stack.

 

"This is wrong, string is reference type and it is always passed as reference (and reference is a pointer to actual string contents, this pointer is passed as value, pointer is always on stack when passed/returned in method)
However string is designed to be immutable so you cannot modify it (this is why you feel it is copied, but it is not), imagine if you have 100KB of text in string passing it from one method to another would be time consuming. When you run a method like .ToUpper() etc, this is the time a new string is allocated on heap and its reference is sent to you."

You're completely right, thanks for correcting.

"This statement is correct (with exception of closure), because fields do not constitute as type, class/structure containing them is a type. If experienced developers do not understand true definition of the type then it is certainly wrong place to work !!"

I'm not sure what your point is here. A reference type (i.e. class) can be declared with a value type field inside of it. The lifespan of the memory allocated for this field cannot be shorter than the lifespan of the memory allocated for the containing type, so both have to be placed on the heap.

 

My point is, you cannot use term value type for a field. Field is a member of type. Members belong where ever the containing type exists that's all (this is well known phenomenon).

"My point is, you cannot use term value type for a field. Field is a member of type."

A field is member of a type but also represents an instance of some type as well. The term "value type field" is a field whose type (not the declaring type) is a value type.

For example, see here, you can get the type of a field by getting the value of FieldInfo.FieldType property. You can then check if it's a value type through checking Type.IsValueType property.
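
As a minimal sketch of that check (Sample and FieldTypeDemo are hypothetical names used only for illustration):

```csharp
using System;
using System.Reflection;

// Hypothetical class with one value type field and one reference type field.
public class Sample
{
    public int Number;
    public string Text;
}

public static class FieldTypeDemo
{
    // Looks up a public instance field of Sample by name
    // and reports whether the type of that field is a value type.
    public static bool IsValueTypeField(string fieldName)
    {
        FieldInfo field = typeof(Sample).GetField(fieldName);
        return field.FieldType.IsValueType;
    }
}
```

Here IsValueTypeField("Number") returns true and IsValueTypeField("Text") returns false, even though both fields are members of the same reference type.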

"Members belong where ever the containing type exists that's all (this is well known phenomenon)."

Yes, that's what I said. Hence why saying "value types are allocated on the stack" is not correct, even if you exclude closures.

For example, here's an article by Jon Skeet referencing the subject in the second paragraph.

Type is something you can always do typeof(x), you can never do type of (field of (class/struct)). Example,

    struct A {
       ? a;
    }

First of all you can never do typeof(A.a) because a is a field of type, it is not a type !

FieldInfo.FieldType is type of field, field is not type. Again, value type is a type, which you can safely do typeof(int), typeof(string), anything that can sit inside typeof expression is a type, field is not type.

Here, A is a type, since it is a struct, it will always be on stack unless captured by lambda. And whatever may be the type field a, A will always be on stack !! Member of a type is not type !! Field/Method/Property all are member of type and allocation will never depend on them. If a is string, it is reference, but string is a type, a is not type, and A will still sit on stack and contents of string will be on heap and a will store reference and entire object will sit on stack.

You can do A.a.GetType() to get the field type. Field type can be value type. I'm not talking about field being a type.

 

Thanks for writing this! I've been reading about this topic all day and it has been rather confusing, to be honest. I have read many sources that do assert that the main difference is that value types go to the stack, and reference types to the heap.

This may seem like a silly question (I know, there are no silly questions in programming), but if I declare a global variable of type int, will that be stored on the heap?

 

There are no "global" variables in C#, as everything is part of some class (or struct). But if it's a field in a class then it's most likely going to be on the heap.

 

Grammatical mistake:

Despite each project being unique, interviewers tend ask the same questions from time to time.

should be:

Despite each project being unique, interviewers tend to ask the same questions from time to time.

 

Thanks, fixed.