
Probably the worst bug to debug

Posted: Tue Dec 23, 2014 9:27 pm
by Bender
For the past two days I was debugging my code because of a very strange segmentation fault: strange because it did not happen while my own code was executing, but inside exit().
Firstly, this rules out some of the most common cases:

1. Accessing an invalid memory region / NULL pointer / already freed region (then I would've got the segfault much earlier)
2. Double free (same as above)
3. Writing to read-only memory (e.g. the code segment; same as above)
etc.

Since the program was crashing during exit, my first guess was a stack overrun: an operation on a buffer created on the stack, with a size larger than the buffer actually is, overwriting a return address. For this to be the cause, the overrun must have happened very early in the program. Out of ideas, I went to #osdev, where sortie asked me whether calling "exit(0)" explicitly also causes a segfault, and it did. That means this wasn't a problem with the stack, since an explicit call to exit doesn't need the return address to be on the stack; rather, my 'exit' itself was somehow crashing. I searched Google for possible causes, one of them being memory and heap corruption. I ran a bunch of tools (Valgrind, GDB, Electric Fence, GCC's built-in AddressSanitizer, etc.) and none of them told me what was actually causing the segfault, which brought me to the point of suspecting that my libc had bugs. GDB was interesting though, as the program would work fine under it in most cases.

Then I updated my glibc: same result. I upgraded my compiler: same result. I even went to the point of upgrading my entire distro: well, uh, same result.

Since a lot of people on #osdev, ##c, and #glibc suggested that I re-run my code under Valgrind, I finally decided to pay serious attention to its output. It pointed me to an internal glibc function, _IO_flush_all_lockp (in genops.c), reached from __run_exit_handlers, which is called by exit. Unfortunately, I couldn't find much info about it, except for another person who had the same problem, but caused by uninitialized pointers.

It had been 2 days already. I finally decided to take a look at the glibc sources and found my "_IO_flush_all_lockp". Looking at what it was doing, which seemed to be flushing all open streams and then locking them, I realised my fault.

Code:

FILE* fp = fopen("filename.ext", "r"); /* fopen takes a mode argument; "r" is only an example */
....
free(fp); /** error: must be fclose(fp) **/
=D>

Now that's pretty clean: debuggers won't suspect a thing, since I'm just freeing memory that someone (in this case the kernel, perhaps) allocated for me, and GCC wouldn't warn me, since I could always allocate a "FILE*" pointer myself for some unusual reason. It's really hard to detect if your source files are long; worse, it was a typo. And poor libc tries to access that memory at exit, since the stream is still open, and BOOM.
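
For comparison, here is a minimal sketch of the corrected pattern, where every stream obtained from fopen is handed back with fclose and never with free. The helper name write_line and its return convention are hypothetical and only there for illustration; they are not from the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `text` to `path` and closes the stream properly.
   Returns 0 on success, -1 on any failure. */
int write_line(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int status = 0;
    if (fputs(text, fp) == EOF)
        status = -1;

    /* fclose(), not free(): it flushes the buffer and lets libc release the
       stream itself, so nothing stale is left for exit() to touch later. */
    if (fclose(fp) == EOF)
        status = -1;

    return status;
}
```

Calling write_line("out.txt", "hello\n") from main and then returning normally should exit cleanly, because libc no longer tracks a stream whose memory has been pulled out from under it.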

The most annoying part of this bug was that it happened randomly: for example, making small so-called "fixes" in the program would make it look like the bug had disappeared, but after a few runs it would appear again.

/me checks again to see if the bug is actually solved.

Re: Probably the worst bug to debug

Posted: Wed Dec 24, 2014 3:32 am
by no92
Why is this in the Auto-Delete Forum? It's extremely good and useful information.

Re: Probably the worst bug to debug

Posted: Wed Dec 24, 2014 4:22 am
by iansjack
Strange. I always understood that exit() was guaranteed to close all open (stdio) files even if the programmer forgot to.

Re: Probably the worst bug to debug

Posted: Wed Dec 24, 2014 6:48 am
by Bender
iansjack wrote:Strange. I always understood that exit() was guaranteed to close all open (stdio) files even if the programmer forgot to.
Yes, and that's what it was trying to do, but I had (by mistake) freed the file pointer instead of closing it, and hence it segfaulted while attempting to close it.
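
The behaviour discussed here, exit() flushing and closing streams the program left open, can be observed with a small sketch like the following. It assumes a POSIX system (fork and waitpid); the function name exit_flushes_stream and the path passed to it are hypothetical. A child process writes to a stream without ever calling fflush or fclose, calls exit(0), and the parent then checks that the data reached the file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if data left in an unclosed stream is
   written out by exit(), 0 if it is not, -1 on setup errors. */
int exit_flushes_stream(const char *path)
{
    assert(path != NULL);

    fflush(NULL);                 /* don't let the child re-emit our buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        FILE *fp = fopen(path, "w");
        if (fp == NULL)
            _exit(1);
        fputs("buffered", fp);    /* sits in the stdio buffer: no fflush, no fclose */
        exit(0);                  /* exit() goes through the open streams and flushes them */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    char buf[16] = {0};
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, in);
    fclose(in);
    remove(path);

    return n == strlen("buffered") && strcmp(buf, "buffered") == 0;
}
```

This is also why a free()d FILE pointer only blows up at exit: the stream is still on libc's books, so the cleanup pass visits memory the allocator has already taken back.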

Re: Probably the worst bug to debug

Posted: Wed Dec 24, 2014 1:15 pm
by gravaera
Reading a well-done writeup about somebody else's bug hunting is always fun, because you feel that kinship with your own past experiences :)

--Peace out,
gravaera

Re: Probably the worst bug to debug

Posted: Thu Dec 25, 2014 3:01 am
by mathematician
Back in the days of MS-DOS I would spend days trying to debug an assembly language program, and usually the culprit would turn out to be a hardware interrupt modifying some variable or other.

Re: Probably the worst bug to debug

Posted: Thu Dec 25, 2014 3:04 am
by Roman
Isn't GDB able to trace the call stack? If so, it would be easy to find where the problem is.

Re: Probably the worst bug to debug

Posted: Thu Dec 25, 2014 4:30 am
by Bender
Roman wrote:Isn't GDB able to trace the call stack? If so, it would be easy to find where the problem is.
The backtrace told me that it was "exit()" that was causing the fault, but it's highly unlikely that glibc would have a bug, since a ton of programs use it. And even if it had one, a bug in exit, a function used by every C program out there? Impossible.
The real bug was in "free(fileptr)" -- but doing that is (by language) perfectly legal, although, undefined, since you're supposed to use fclose for file streams.

Re: Probably the worst bug to debug

Posted: Thu Dec 25, 2014 5:35 am
by Combuster
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.

That is apart from the fact that undefined behaviour is not to be considered legal in the first place.

Re: Probably the worst bug to debug

Posted: Tue Dec 30, 2014 2:47 pm
by KemyLand
Combuster wrote:
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.
There are some functions that return legally free()able pointers, such as strdup().

Re: Probably the worst bug to debug

Posted: Sat Jan 03, 2015 7:28 am
by onlyonemac
KemyLand wrote:There are some functions that return legally free()able pointers, such as strdup().
Furthermore, in this case the compiler wouldn't know where the pointer was obtained from.

Re: Probably the worst bug to debug

Posted: Sat Jan 10, 2015 9:40 pm
by cyx
KemyLand wrote:
Combuster wrote:
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.
There are some functions that return legally free()able pointers, such as strdup().
You can make any function return a legally free()able pointer as long as it allocates it with malloc, calloc or realloc ;)
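
As a rough illustration of that point, here is a sketch of a function whose return value may legally be passed to free(), simply because the memory comes from malloc(). The name make_greeting and its output format are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc()ed string "hello, <name>".
   The caller owns the result and releases it with free(). NULL on failure. */
char *make_greeting(const char *name)
{
    assert(name != NULL);

    const char *prefix = "hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1;   /* +1 for the terminator */

    char *out = malloc(len);
    if (out == NULL)
        return NULL;

    snprintf(out, len, "%s%s", prefix, name);
    return out;
}
```

The contrast with fopen is in the documented contract: this helper promises malloc()ed memory, so free() is the matching release, while fopen only promises a stream that must be given back through fclose.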

Re: Probably the worst bug to debug

Posted: Sun Jan 11, 2015 6:25 am
by Kevin
KemyLand wrote:There are some functions that return legally free()able pointers, such as strdup().
strdup() isn't standard C (yet), and POSIX defines free() so that it's legal to pass a "pointer earlier returned by a function in POSIX.1‐2008 that allocates memory as if by malloc()".