Discussion on: I'm an Expert in Memory Management & Segfaults, Ask Me Anything!

Hi Jason, I'm encountering a strange segfault when trying to write default values to an mmap'd region on my disk, but when I do the same write manually in gdb, it works fine. Here is what I'm talking about:

Found 18,542,192 final board states. Explored 17,714,416,428 boards @ 3,628,469 b/s. Runtime: 0:01:20:43 CPU Time: 0:19:49:07 
Thread 32 "main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe4a7fc700 (LWP 554025)]
0x0000555555561c20 in heirarchy_insert (h=0x55555556b070, key=3148149460081748650666) at mem_man/heir.c:107
107             for(size_t b = 2; b < num_jumps; b++) ((uint64_t*)(phase[bits]))[b] = 0;
(gdb) p phase
$1 = (void **) 0x7ffa1389f6a0
(gdb) p bits
$2 = 341
(gdb) p phase[bits]
$3 = (void *) 0x7ffff77a7f80
(gdb) p num_jumps
$4 = 48
(gdb) p b
$5 = 16
(gdb) p ((uint64_t*)(phase[bits]))
$6 = (uint64_t *) 0x7ffff77a7f80
(gdb) p ((uint64_t*)(phase[bits]))[b]
$7 = 282584257676671
(gdb) p ((uint64_t*)(phase[bits]))[b] = 0
$8 = 0
(gdb) p ((uint64_t*)(phase[bits]))[b]
$9 = 0
(gdb) p ((uint64_t*)(phase[bits]))[b + 1]
$10 = 0
(gdb) p ((uint64_t*)(phase[bits]))[b + 2]
$11 = 4299030531
(gdb) p ((uint64_t*)(phase[bits]))[47]
$12 = 52640

So when I do it myself, there's no segfault. I've tried to run valgrind, but it's so slow. Can you take a look at it? The code is uploaded to GitHub at github.com/iggy12345/reversi_walke...

I've been working at this one for days now and I'll take any help I can get.

Jason C. McDonald • Jan 24 '21

I would run it through valgrind. That should take you right to the line of code where the segfault occurs. (You're welcome to share that output here.) gdb rarely provides much useful information for undefined behavior.

Aaron Jencks • Jan 25 '21

I'll try and see if I can get it to finish. The segfault seems to happen at around 18 million final board states, and at 3 million boards/sec it takes about an hour and a half to get there. With valgrind, though, I'm an hour past one day and still only at 3 million boards.

Jason C. McDonald • Jan 26 '21 • Edited

Although it's an early (and wild) guess, if the segfault occurs with a large set of data, but not a small set, I would suspect a buffer overrun may be the cause of your problems. Are you...

(1) Putting too much on the stack (versus dynamically allocating the space you need), or
(2) Exceeding the space you allocated?
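
To make the two questions above concrete, here is a minimal sketch in C. It is not taken from the poster's repository: region_t, region_alloc, region_zero, and region_free are hypothetical names made up for illustration. The idea is to keep the allocated size next to the pointer, allocate dynamically rather than on the stack, and refuse any write that would run past the end of the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor: a pointer plus the number of slots behind it. */
typedef struct {
    uint64_t *base;   /* start of the allocation */
    size_t    words;  /* number of uint64_t slots actually allocated */
} region_t;

/* (1) Allocate on the heap instead of the stack. calloc zero-fills the
 * memory and checks the size multiplication for overflow.
 * Returns 0 on success, -1 on failure (intended for words > 0). */
static int region_alloc(region_t *r, size_t words)
{
    assert(r != NULL);
    r->base = calloc(words, sizeof *r->base);
    if (r->base == NULL) return -1;
    r->words = words;
    return 0;
}

/* (2) Zero slots [first, first + count) only if the whole range lies
 * inside the region. Returns 0 on success, -1 if it would overrun.
 * The comparison is written so that first + count cannot wrap around. */
static int region_zero(region_t *r, size_t first, size_t count)
{
    assert(r != NULL && r->base != NULL);
    if (first > r->words || count > r->words - first) return -1;
    for (size_t i = 0; i < count; i++) r->base[first + i] = 0;
    return 0;
}

/* Release the allocation and clear the descriptor. */
static void region_free(region_t *r)
{
    free(r->base);
    r->base = NULL;
    r->words = 0;
}
```

With a check like this, a loop such as the one at heir.c:107 (b from 2 up to num_jumps) would become region_zero(&r, 2, num_jumps - 2), and a block that is smaller than num_jumps slots would produce an error return you can log instead of a crash an hour and a half into the run.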