DEV Community

Jason C. McDonald

I'm an Expert in Memory Management & Segfaults, Ask Me Anything!

I'm an expert-level C and C++ developer, with a specialty in memory management. I have experience writing memory-safe code with both the modern safe techniques and the ancient unsafe techniques. I've used malloc and free without killing myself. I love pointers. I've debugged more than my share of undefined behavior, and authored the canonical StackOverflow question on segfault debugging.

Any burning questions about dynamic allocation, undefined behavior, pointers, memory safety, or anything even remotely related? Ask me anything!

(My main languages are C++, C, and Python, although I also deeply grok the underlying computer science principles.)

Oldest comments (161)

Andrew Lucker • Edited

What is your preferred method to return from a segfault?

i.e. define

do_segfault()
 
Jason C. McDonald • Edited

If I understand your question right...

A segfault is the best possible behavior that you can get given undefined behavior, because it's a specific runtime error that you can probe. You're actually not guaranteed to get a segfault when your code has undefined behavior.

Thus, it is both technically impossible and entirely unwise to "recover" from a segfault. Let the program crash (did we have a choice?), figure out what in your code is undefined behavior, and fix it.

To put that another way, a segmentation fault is a signal raised by the operating system (SIGSEGV on POSIX systems), not a language-level exception, and it isn't guaranteed to occur anyhow, so try-catch statements and ordinary error handling never see it.

Priyansh Jain • Edited

Are Rust and golang going to take over from C and C++ in desktop software or web development? Also, how would you, as a C++ expert, rate these languages? Do they have potential?

Mihail Malo

Rust would take over everything, but the leftist collectivist community threw a baton roue of so-called "golang" at it, and now we will all perish and return to a dark age of literal witch hunts. (Because we are all out of Moore's law)
RIP information technology and human civilization in general

Priyansh Jain

LOL

Mihail Malo

Nihilist

Jason C. McDonald • Edited

I strongly believe that (virtually) all languages have their place. FORTRAN and COBOL have firmly established places in the world, and are almost certain never to lose them on account of their reliability and precedence.

C and C++ likewise have this precedence, making up a sizable chunk of our source code. It's the old "if it ain't broke, don't fix it" concept; I doubt the entire collection of software that makes up a standard Linux-based operating system will ever be rewritten from C to Rust, because most of what already exists works quite well.

That said, I think Rust and golang have a lot of potential as languages, especially Rust.

(In my personal opinion, golang is a rather hipster language, but that's based in my feelings towards it, not in anything practical; so take that with a grain of salt.)

Rust looks especially interesting in the area of error handling. I'll admit, I haven't had the time to learn it very well yet, but it's DEFINITELY high on my list!

In other words, Rust and golang will probably find established places in the programming world, but they won't be displacing C, C++, or any other established language. Every tool has its place, and a quarter inch drill bit doesn't replace a 5mm drill bit.

Sergiu Mureşan • Edited

Great to see a fellow low-level programmer on here!

I worked on a game engine written in C and had many issues caused by misusing the realloc function for dynamically allocated memory. What I did was forget to assign the function's return value back to the pointer. It took me weeks to find the underlying problem, since it would blow up only in some cases. How would you go about debugging a situation like:

int* p = calloc(5, sizeof(int));
// some code
realloc(p, 6 * sizeof(int)); // notice no assignment

Do you use some sort of special tools? Or just some coding standards to not let this happen?

Jason C. McDonald

Whenever I'm working with memory, I pair two different tools: Valgrind and Goldilocks (PawLIB).

Valgrind is a pretty ubiquitous tool on UNIX platforms which will show me the memory issues it detects while the program runs (invalid reads and writes, use of uninitialized values, leaks), even if the undefined behavior doesn't cause any overt problems. My code isn't done until it's Valgrind-pure. However, Valgrind only monitors the execution, so...

Goldilocks is a testing framework I developed at MousePaw Media, as a part of PawLIB. You could technically use any testing framework, but the benefit to Goldilocks is that it bakes the tests into the final executable, instead of requiring an additional framework to run the tests. That way, you can start the normal executable, run each of the tests you wrote, and see which ones Valgrind complains about.

Mind you, this does require you to write a lot of comprehensive behavioral tests...but you really should be doing that anyway in production code. ;)

Jason C. McDonald

I should add, I use another tool from PawLIB called IOChannel - basically, a std::cout wrapper - that allows me to cleanly print the address and raw memory from literally any pointer, without having to use a debugger. This can make debugging some problems infinitely easier, especially when you're contending with a Heisenbug that goes away if compiled with -g, but appears when compiled with -O2.

Sergiu Mureşan

Thanks for the response!

Unfortunately, I didn't find a version of Valgrind for Windows. I tried DrMemory but, after lots of struggle, it didn't give me any helpful information and dropped the ball. Do you have experience with low-level on Windows or just work exclusively on Linux since it is more convenient?

Jason C. McDonald

I rarely use Windows for development, as its development toolchain is almost invariably miles behind its UNIX-based counterparts.

If you're on Windows 10, I strongly recommend setting up the Windows Subsystem for Linux [WSL]. That will give you access to the Linux development environment for compiling and testing. Then, use the LLVM Clang compiler on both the WSL and the Visual Studio environments. That way, once you know it compiles and runs Valgrind-pure on WSL, you can trust that it will work on VS Clang.

Likai Liu

My approach for this specific problem is to use a compiler that warns about unused return value, such as gcc or clang. I know that stdlib.h on Linux and Mac OS X already decorates realloc() with warn_unused_result attribute.

stackoverflow.com/a/2889601

But just naively setting p = realloc(p, ...) is also wrong: if the allocation fails, p is set to NULL while the original object is still allocated. The original pointer is lost, and the memory it pointed to is leaked. Use reallocf() (a BSD extension, available on macOS and FreeBSD), which frees the original memory if it could not be resized.

Sergiu Mureşan

That's a really nice feature, didn't know about it.

But wouldn't that mean data loss in case the memory can't be resized? Wouldn't that become an unrecoverable error?

Jason C. McDonald

@liulk Ha, I completely forgot to mention Clang! It does indeed have the best warnings of any compiler I've used. I almost always compile with -Wall -Wextra -Wpedantic -Werror; that last one (as you know, although the reader might not) causes the build to fail on any warnings.

I also use cppcheck as part of my autoreview workflow, and resolve all linter warnings before committing to the production branch.

Likai Liu

@codevault You're right, reallocf() would just free the memory and cause data loss, so it would serve a different use case than realloc(). The more general solution would be to always use this pattern, which is more verbose:

void *q = realloc(p, new_size);
if (q == NULL) {
  // do error handling.
  return;
}
p = q;

I just find that in most of my use cases, I would end up freeing p in the error handling, so I would just use reallocf() which results in less verbose code.
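
The realloc() pattern above can be wrapped in a small helper so that the original pointer is never overwritten on failure. This is only a sketch: `try_resize` and `grow_example` are hypothetical names, not part of any library, and the `[[nodiscard]]` attribute is used here as the C++ counterpart of the warn_unused_result decoration mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <type_traits>

// Hypothetical helper: resize `block` to hold `new_count` elements.
// On success `block` is updated; on failure it is left untouched,
// so the caller still owns (and must eventually free) the original memory.
template <typename T>
[[nodiscard]] bool try_resize(T*& block, std::size_t new_count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "realloc may move raw bytes, so T must be trivially copyable");
    const std::size_t max_count = std::numeric_limits<std::size_t>::max() / sizeof(T);
    if (new_count == 0 || new_count > max_count) {
        return false;  // refuse zero-size and overflowing requests
    }
    void* resized = std::realloc(block, new_count * sizeof(T));
    if (resized == nullptr) {
        return false;  // `block` is still valid here, nothing has leaked
    }
    block = static_cast<T*>(resized);
    return true;
}

// Mirrors the snippet from the question: grow 5 ints to 6.
// Returns the value stored before the resize, or -1 if allocation failed.
int grow_example() {
    int* p = static_cast<int*>(std::calloc(5, sizeof(int)));
    if (p == nullptr) return -1;
    p[4] = 42;
    if (!try_resize(p, 6)) {
        std::free(p);  // we still own the old block on failure
        return -1;
    }
    assert(p[4] == 42);  // realloc preserves the existing contents
    p[5] = 7;            // the new element is now safe to write
    const int kept = p[4];
    std::free(p);
    return kept;
}
```

Because the helper is marked `[[nodiscard]]`, a compiler in C++17 mode should warn if the result is ignored, which is the mistake that started this sub-thread.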

Sergiu Mureşan

I see, that makes sense. I can see myself freeing the memory most of the time when reallocation fails.

Good to note. Thanks!

rhymes

I know this question is going to come so I'm going to ask it myself: what do you think of languages like Rust?

Don't you think that, in some cases, a machine is going to be better than a human at dealing with memory management anyway?

Jason C. McDonald

See the other question in this thread re: Rust. Long story short, it's a cool language that I haven't yet had time to learn.

In terms of man vs machine, the answer is that "computers are inherently stupid." When we are trusting the machine to manage the memory, what we're really doing is trusting someone else's code to manage the memory. In either case, some human is responsible for the memory handling logic. Therefore, it really depends on the code you're trusting!

The benefit to trusting the language's built-in memory management is that the code is almost certainly more rigorously reviewed and tested. That's where the apparent added trustworthiness comes from.

A lot of times, I will trust automatic memory management tools over my own abilities. std::unique_ptr and std::shared_ptr, for example, are excellent tools that help minimize memory mistakes (because, after all, I'm only human). However, there are times that the logic I need would become too convoluted with those magic pointer classes, so I'll resort to manual management.

It's basically a balancing act between simplicity (the more complicated the code, the more chance for bugs) and safety (reducing the chances of a memory leak). If you write really complicated code to use "memory safe" tools, you can still wind up making a royal hash of it, when a simple pointer would have meant 80% less logic, and thus prevented those issues.
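
As a rough illustration of what std::unique_ptr and std::shared_ptr take off your hands, here is a minimal sketch. The `Tracked` type and both function names are hypothetical; the counter exists only so that the automatic cleanup becomes observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that bumps a caller-supplied counter while it is alive.
struct Tracked {
    explicit Tracked(int& live) : live_(live) { ++live_; }
    ~Tracked() { --live_; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    int& live_;
};

// Returns how many Tracked objects exist while a unique_ptr owns one.
// No delete is written anywhere: the destructor of `owner` frees the object.
int live_while_owned(int& live) {
    auto owner = std::make_unique<Tracked>(live);
    return live;  // value is copied out before `owner` is destroyed
}

// Returns the owner count while two shared_ptrs refer to one object.
long owners_while_shared(int& live) {
    auto first = std::make_shared<Tracked>(live);
    auto second = first;  // second owner, same object
    assert(live == 1);    // still only one Tracked instance
    return first.use_count();
}
```

After either function returns, the counter passed in is back to zero, without a single explicit delete.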

rhymes

Thanks for the detailed explanation!

I wonder if in the future there will be attempts to introduce AIs into managed memory systems, to increase what you call "apparent added trustworthiness".

George

What are some of the common issues you find when working in memory management? Do you have one way of working through them or multiple?

Jason C. McDonald

I do go through a little list in my mind:

  • Do my allocations and frees match? (malloc and its cousins with free, new with delete, new[] with delete[]).
  • Do I null out my pointers immediately after freeing?
  • Do I check if a pointer is null before using it?
  • Does my pointer math have any edge cases?
  • Are my iteration loops all safe? (I always get uneasy around while loops that touch allocated memory.)
  • Do my recursive functions have explicit stop conditions?
  • Are my C-strings (if any) null terminated?
  • Are my destructors properly freeing allocated memory?
  • Do I have tests for all major functionalities?
  • Are all my tests running Valgrind-pure?
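
A couple of the checklist items (nulling pointers after freeing, keeping C-strings null terminated) can be turned into tiny helpers. This is a sketch under assumptions: `free_and_null` and `copy_terminated` are hypothetical names, not functions from any library mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: free a malloc-family allocation and null the pointer
// in one step, so a stale address can never be used or freed twice.
template <typename T>
void free_and_null(T*& ptr) {
    std::free(ptr);  // free(nullptr) is a no-op, so repeated calls are safe
    ptr = nullptr;
}

// Hypothetical helper: copy at most capacity - 1 characters of `src`
// into `dest` and always write a terminating '\0'.
void copy_terminated(char* dest, std::size_t capacity, const char* src) {
    assert(dest != nullptr && src != nullptr && capacity > 0);
    std::strncpy(dest, src, capacity - 1);
    dest[capacity - 1] = '\0';
}
```

Helpers like these do not replace running the tests under Valgrind; they just make two of the habits harder to forget.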
 
edA‑qa mort‑ora‑y

Given that you're writing a fairly low-level, perhaps wait-free concurrent algorithm, it's possible that you have a bug that trips up about every millionth execution of the code. It's a race condition that corrupts a vital structure. Any attempt to trace upsets the condition leading to the error, thus making it go away. Your only choice is a tedious hand execution and logical reasoning.

My question is, how do you avoid throwing the computer out the window?

Jason C. McDonald • Edited

Avoid? I love tedious hand execution and logical reasoning! (No, seriously.) One of my absolute favorite things to do in programming is to print off the source, sit down with a pen, a hot beverage, a blank notebook, and a jazz soundtrack...and then spend the next hour or three just desk-checking the entire thing.

Mmmmmmmmmmmmmmmmmmmmmm, bliss. ^_^

Why do you think I specialized in memory management and undefined behavior? I ADORE it!

Now, if you don't have my particular mental condition, and actually don't enjoy desk-checking for Heisenbugs, my advice is this: get off the computer. Print off the source, cozy up in your favorite chair in a relaxing environment, and desk-check it.

Likai Liu

I also recommend writing unit tests that make the race condition more likely to happen. For example, if the code normally runs with < 10 threads, test it with 1000 threads. Sometimes code is well-behaved when the data entered are far apart, so try testing with consecutive values. If it's the opposite, test with random values.

What I learned over the years is that race-freedom is not composable: code using several mutexes incorrectly could still suffer from race conditions, even though each single mutex is race-free on its own. When testing wait-free algorithms, start with very small primitives and gradually build on them. And write plenty of assert() calls on the non-volatile local copies of the shared volatile variables the code might be using. When an assert triggers under the debugger, you'll be able to see which invariants are violated in that snapshot.
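
The "run it with far more threads than usual" idea can be sketched as a small stress harness. Everything here is an assumption made for illustration: `stress_counter` is a hypothetical name, and a shared atomic counter stands in for whatever primitive is actually under test. The invariant checked is that no increment is lost.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stress harness: start `thread_count` threads that each
// increment one shared counter `iterations` times, then return the total.
long stress_counter(int thread_count, int iterations) {
    assert(thread_count > 0 && iterations >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(thread_count));
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Atomic read-modify-write: no update can be lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

A test would then assert that the result equals thread_count * iterations. With a plain, unsynchronized counter in place of the atomic, that check would be expected to fail intermittently, and raising the thread count is what makes such a failure show up sooner.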

Eric J. Falgout

C is my favorite language, but I have never worked with it professionally. I've completed "Learn C the Hard Way".

What's another challenging text for a professional developer who is a C dilettante?

Jason C. McDonald

I've got a few on my shelf I enjoy in that category:

  • Game Programming Patterns by Robert Nystrom discusses many patterns, including a number related to dynamic allocation and memory management, from a game development perspective. (Written mainly for C++, although you could take on the challenge of implementing the patterns in C!) Besides that, his comical, bantering style makes for a really fun read.

  • Hacker's Delight by Henry S. Warren contains a number of mind-bending algorithms that operate in C and Assembly.

  • Game Engine Architecture by Jason Gregory explores the myriad challenges that game engine developers face, especially issues of performance and memory management. Again, this is written primarily for C++, but you can approach many of the problems from a C perspective as well.

  • The Art of Computer Programming by Donald Knuth. Okay, I don't own this one, but I really really want a copy! It's quite a challenge to wrap your head around his algorithms and patterns, many of which are fundamental to the field of computer science.

bluhm-alexander

Have you ever worked with the Motorola 68000? I really like that CPU. In your opinion, is assembly language still best for super low-level hardware, or is C on par with assembly code?

Jason C. McDonald

Ironically, I just added 68K Assembly to my list of languages to learn soon! I have a TI-89 calculator (Motorola 68000), and dearly want to play with it.

Up to this point, my assembly work has been largely limited to the X86 and X64 languages, in the context of Intel and AMD processors.

C is actually further up the stack than people think, and it isn't always the best choice for a given architecture. If you need total control, Assembly will always give that to you far and beyond any other language.

However, Assembly is also a pain in the butt (if an endearing one to certain classifications of nerds such as myself). If you have access to a higher level language that is reasonably optimized for that platform, and you don't need ultimate control, use it instead of Assembly.

In other words, "just because we can doesn't mean we should." If you can't make a reasoned argument for the language you're using, you're probably using the wrong language. :)

bluhm-alexander

Thank you for the reply, I always value getting a second opinion. The reason I'm asking this question is because I am building a game on the Sega Genesis and I've been using a C compiler to do it.

So far it hasn't been an issue because the C compiler was built for the Sega Genesis and it has a lot of nifty features to take advantage of the hardware features such as DMA. More importantly it has sound drivers which are incredibly useful because I do not want to go around writing my own Sound Driver because I am not experienced with writing such a program.

I have recently run into a few shortcomings with the compiler. First and foremost, the routines I've written in C don't seem to load onto the screen as fast as Assembly routines do.

I think I will compromise by writing my screen drawing routines in Assembly and then including them in my C code. I think that would be best for me because then I would have access to features in the C compiler as well as having access to the speed of Assembly. The problem is that I am not experienced with Assembly code. Fortunately for me, 68k assembly seems to be the easiest Assembly to learn.

By the way the C compiler I'm using is called SGDK (Sega Genesis Development Kit)

What do you think about mixing languages, is it something to be avoided?

Jason C. McDonald

It really depends on the languages!

There's no trouble combining C and Assembly; ultimately, C is compiled down to Assembly, at which point any Assembly code you wrote outright is just inserted in. Then, the whole thing is assembled down to binary on that particular platform.

However, you can run into varying degrees of performance issues when mixing other languages. It has to be taken on a case-by-case basis.

Bravo on making a game for Sega Genesis! Keep us posted on dev.to how that goes.

I highly recommend picking up "Game Engine Architecture" by Jason Gregory. It addresses many of the issues you're facing, and hundreds more besides, from a C and C++ perspective. He even talks about console development.

emad07306

Any advice for complete beginners to coding, please? I would appreciate it.

Jason C. McDonald • Edited

Sure.

Don't mess with manual memory management...yet.

It's very, very easy to proverbially blow a limb off with manual memory management. Get skilled with the fundamentals of programming first, and establish habits that allow you to write clean, stable code in a higher level language.

Once you can write a few hundred lines of code in, say, Python or Java, and have them work right on the first or second attempt, then dive into more advanced concepts. Manual memory management is something that's easy to get wrong, so you need to first have an established track record with yourself of getting things right.

Before I ever touched C++ and memory management, I had written a couple of reasonably stable, small applications, and had actually implemented a programming language in ActionScript 3.0 with regular expressions. (I don't recommend the latter; it was a great challenge, and it worked great for the purpose it was designed, but it pretty well sucked in terms of performance.)

With all that under my belt, I was able to start using C++. Even then, I avoided manual allocation whenever possible, using memory-safe tools and methods first. Once I was experienced with those, I started doing more and more manual allocation and raw pointer arithmetic. I made a lot of mistakes at the start, but that's how we learn best!

Jariullah Safi

Oh what a small world. I too am an expert in segfaults. Nary a day goes by without my code segfaulting...

Sorry, obvious joke that I didn't see anyone else make. Thank you for doing this, it's very insightful.

Jason C. McDonald

That's how you become an expert at solving them. ;)

amineamami

What's the simplest way to cause a memory leak?

Jason C. McDonald • Edited

Fail to free memory after you allocate it, and then destroy the pointer.

My return question is, why would you want to? ;-)

Marc Mercer

Ironically, you of all people should have a very easy answer to your own question -- 'Why would you want to?' -- To learn more about what went wrong? To study it, to understand it, and to prevent it from happening again in the future. Sometimes, the best way of learning is by knowingly doing something 'silly' -- I wouldn't call it stupid because you are doing it with the expectation, which means you are preparing for it. All good engineers try to reach this state -- be aware of what can go wrong, and how to handle it.

Jason C. McDonald

Yes, that's fair, in learning. But, honestly, 99% of my learning comes from just trying to do hard things, and working with the failures as they come. Those are far more practical and effective to learn from than any sort of deliberately manufactured mistake.

Basile B.

About garbage collectors (not mentioned so far!): I think they are nice, but they should be optional rather than the default memory management. Once a GC handles the allocations, it's very hard to use an alternative management scheme on top, because GCs tend to free manually allocated resources that they think are no longer used, i.e. not reachable from a root memory block.

What's your favorite management technique: manual, ref counting, or GC?
Do you share my point of view on this?

Jason C. McDonald • Edited

Since I use C++ primarily, and it doesn't have a built-in garbage collector by default, I've just formed the habits of handling everything myself. Those habits and instincts carry over to other languages that do have GCs, but I will still manually free things as far as I'm allowed.

In a broader sense, I don't generally trust generic abstractions to do my work for me. If I'm not sure what's needed, I'll leave it to the automatic systems, but try to understand what's happening under the hood. If I know for certain what needs to happen, I'll do it myself, and let the automatic systems do mop-up work behind me in case I miss anything.

In the same way, I never let the compiler define constructors or destructors for me. Every (non-static) class I write has, at the least, explicitly empty constructors. Knowing how my coding adventures usually go, the one time I trust the compiler to define the destructor, it'd hit an edge-case and bork. So, I don't leave much room for that kind of madness.

Ironically, the above is probably in part my Python background talking: "explicit is better than implicit".

Now, with that said, one should know all the automatic tools their language offers, and how to use (and not to use) them. Doing things manually is not an excuse for ignorance. All the above does not preclude me from using such bits of magic as std::unique_ptr, which handles its own deallocation in its destructor (RAII, rather than a GC). I simply make an informed decision on whether to do it myself, or to use a tool that specifically matches the use case.


By the way, in terms of ref counting, I am reminded of a classic AI Koan...

One day a student came to Moon and said: “I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons.”

Moon patiently told the student the following story:

“One day a student came to Moon and said: ‘I understand how to make a better garbage collector...

G.

What do you think of Zig?

Pedro • Edited

Hello, could you help me? It's a school project. I get a segfault when I run the program and type a big string as input. The thing is, I have a buffer that is big enough, and when I run it under Valgrind I get no segfault. Do you have any idea?

Jason C. McDonald

I could probably help, but I'd need to see your code to do so. Can you create a GitHub Gist?

AlejandroSilvestri

Hi there, I am pleasantly surprised to discover this site.

Some people using my code had segmentation fault, I'm looking for a way to generate console output these people can copy and send me, so I can track down the issue.

When SEGFAULT happens, game is over. I'd like to capture this, inspect some variables and cout them before exiting, something like try catch. But I believe cout after SEGFAULT leads into undefined behaviour, so...

Any suggestion? Thank you.

Jason C. McDonald

Unfortunately, it is not possible to "catch" a segfault, nor continue program behavior safely (if at all) after it has been raised. Therefore, you have to take the opposite approach, and log everything that happened leading up to the segfault.
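
One way to "log everything leading up to the segfault" is a breadcrumb function that flushes after every message, so the last line written is not stranded in a buffer when the process dies. This is only a sketch; `log_step` is a hypothetical name and the message format is an assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical breadcrumb logger: writes "file:line: message" and flushes
// immediately, so the text has left the stream's buffer before the next
// (possibly crashing) statement runs.
void log_step(std::ostream& out, const char* file, int line,
              const std::string& message) {
    assert(file != nullptr);
    out << file << ':' << line << ": " << message << '\n';
    out.flush();
}
```

In real code the stream would typically be std::cerr or a std::ofstream opened on a log file, called as `log_step(std::cerr, __FILE__, __LINE__, "loading level data")` just before each risky step. The last breadcrumb in the output your users send back then narrows down where the fault occurred.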

You can also have your tester describe (or screen record, especially if it's a game) what happened leading up to the segmentation fault. Then, you should be able to replicate that on your own machine.

Mind you, "replication" won't necessarily mean you can recreate the segmentation fault itself, since it's one of an infinite number of possible behaviors in response to some illegal memory action your code is taking (ergo "it is legal for the compiler to make demons fly out of your nose"). That's what is meant by undefined behavior. However, by replicating the same steps as your tester while running the Debug build of the application (compiled with -g) under Valgrind, you should be able to catch the problem.

There is also a more proactive approach you can take, especially if you're using C++: modernize your code base. Refactor the code - by hand mind you, NOT by using find-and-replace or some other automated tool - to make use of smart pointers like std::unique_ptr and std::shared_ptr instead of raw pointers, new, and delete. This will eliminate most memory errors, since the smart pointers handle object lifetime and whatnot (formally known as RAII). Refactoring is not a "quick fix", but it's the most resilient fix.

AlejandroSilvestri

Thank you very much. You confirmed my approach is right: cout everything!

In my specific case, my code is appended to third party code where the segfault happens, so it's hard to trace and I don't even have the chance to fix.

Jason C. McDonald • Edited

At the risk of self-promotion, I wrote something called IOChannel which is designed to better control cout-style logging, based on category and priority. You can also route messages to different places, including to functions that will write them out to a file instead of printing them to the console. It's part of PawLIB, which is still in development, but 1.0 is stable. (Yes, totally open source)