Probably the worst bug to debug

Programming, for all ages and all languages.
Post Reply
User avatar
Bender
Member
Member
Posts: 449
Joined: Wed Aug 21, 2013 3:53 am
Libera.chat IRC: bender|
Location: Asia, Singapore

Probably the worst bug to debug

Post by Bender »

For the past 2 days, I was debugging my code because it caused a very strange segmentation fault, strange because it did not happen while my code was executing, but it was happening inside exit().
Firstly, this rules out some of the most common cases:

1. Accessing an invalid memory region / NULL pointer / already freed region (then I would've got the segfault pretty earlier)
2. Double free (same as above)
3. Accessing read-only-memory (e.g. Code Segment -- no, same as above)
etc..

Since the program is crashing during exit, this probably indicates that I might've caused a stack overrun, by doing an operation to a buffer created on stack, such that the size specified during the operation is greater than what actually the buffer is. For this to happen the crash must've happened very early in the program. Out of ideas, I went to #osdev, where sortie asked me if calling "exit(0)" causes a segfault, -- and it did, which means that this isn't a problem with the stack, since explicitly calling exit shouldn't require the return address to be on stack, but rather my 'exit' is somehow crashing. I searched on google for possible cases, one of them being memory and heap corruption. I ran a bunch of tools valgrind, gdb, electric fence, GCC's built-in address sanitizer, etc. and none of them actually told me what actually was causing the segfault, bringing me to a point where I considered that my libc had bugs. GDB was interesting though, as the program would work fine in most cases.

Then I updated my glibc, same result, upgraded my compiler, same result, and I went to point of upgrading my entire distro, well uh, same result.

Since a lot of people on #osdev, ##c, and #glibc suggested me to re-run my code under valgrind, I finally decided to pay serious attention to it's output, and it pointed me to an internal glibc function: _IO_flush_all_lockp, under __run_exit_handlers, which is called by exit @ genops.c. Unfortunately, I couldn't find much info about it, except another person having the same problem, but due to using uninitialized pointers.

It was 2 days already, I finally decided to take a look at the glib sources, and found my "_IO_flush_all_lockp", looking a what it was doing which seemed to flush all open streams, and then locking them, I realised my fault.

Code: Select all

FILE* fp = fopen("filename.ext");
....
free(fp); /** error: Must be fclose **/
=D>

Now that's pretty clean, debuggers won't suspect a thing, since I'm just freeing memory someone (in this case the kernel perhaps), allocated to me, GCC wouldn't warn me, since I could always allocate a "FILE*" pointer myself, for some unusual reason. Really hard to detect if your source files are long, worse even, it was a typo. And poor libc is trying to access that, since the stream is still open, and BOOM.

The most annoying part of this bug was that it'd randomly happen, for example making small "so-called fixes" in the program, would make it look like the bug disappeared, but after a few runs, it'd appear again.

/me checks again to see if the bug is actually solved.
"In a time of universal deceit - telling the truth is a revolutionary act." -- George Orwell
(R3X Runtime VM)(CHIP8 Interpreter OS)
no92
Member
Member
Posts: 307
Joined: Wed Oct 30, 2013 1:57 pm
Libera.chat IRC: no92
Location: Germany
Contact:

Re: Probably the worst bug to debug

Post by no92 »

Why is this in the Auto-Delete Forum? It's extremely good and useful information.
User avatar
iansjack
Member
Member
Posts: 4724
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Probably the worst bug to debug

Post by iansjack »

Strange. I always understood that exit() was guaranteed to close all open (stdio) files even if the programmer forgot to.
User avatar
Bender
Member
Member
Posts: 449
Joined: Wed Aug 21, 2013 3:53 am
Libera.chat IRC: bender|
Location: Asia, Singapore

Re: Probably the worst bug to debug

Post by Bender »

iansjack wrote:Strange. I always understood that exit() was guaranteed to close all open (stdio) files even if the programmer forgot to.
Yes, and that's what it was trying to do, but I had (by mistake), freed the file pointer instead of closing it, and hence it caused a segfault while attempting to close it.
"In a time of universal deceit - telling the truth is a revolutionary act." -- George Orwell
(R3X Runtime VM)(CHIP8 Interpreter OS)
User avatar
gravaera
Member
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: Probably the worst bug to debug

Post by gravaera »

Reading a well done writeup about somebody else's bug hunting is always fun, because you feel that kinship with your own past experiences :)

--Peace out,
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
User avatar
mathematician
Member
Member
Posts: 437
Joined: Fri Dec 15, 2006 5:26 pm
Location: Church Stretton Uk

Re: Probably the worst bug to debug

Post by mathematician »

Back in the days of MS-DOS I would spend days trying to debug an assembly language program,and usually the culprit would turn out to be a hardware interrupt modifying some variable or other.
The continuous image of a connected set is connected.
User avatar
Roman
Member
Member
Posts: 568
Joined: Thu Mar 27, 2014 3:57 am
Location: Moscow, Russia
Contact:

Re: Probably the worst bug to debug

Post by Roman »

Isn't GDB able to trace the call stack? If so, it would be easy to find, where's the problem.
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay
User avatar
Bender
Member
Member
Posts: 449
Joined: Wed Aug 21, 2013 3:53 am
Libera.chat IRC: bender|
Location: Asia, Singapore

Re: Probably the worst bug to debug

Post by Bender »

Roman wrote:Isn't GDB able to trace the call stack? If so, it would be easy to find, where's the problem.
The backtrace told me that it was "exit()" that was causing the fault, but it's highly unlikely that glibc would have a bug, since it's got a ton of programs using it, and even if there is, a bug in exit, a function used by every C program out there, impossible.
The real bug was in "free(fileptr)" -- but doing that is (by language) perfectly legal, although, undefined, since you're supposed to use fclose for file streams.
"In a time of universal deceit - telling the truth is a revolutionary act." -- George Orwell
(R3X Runtime VM)(CHIP8 Interpreter OS)
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Probably the worst bug to debug

Post by Combuster »

Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.

That is apart from the fact that undefined behaviour is not to be considered legal in the first place.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
KemyLand
Member
Member
Posts: 213
Joined: Mon Jun 16, 2014 5:33 pm
Location: Costa Rica

Re: Probably the worst bug to debug

Post by KemyLand »

Combuster wrote:
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.
There are some functions that return legally free()able pointers, such as strdup().
Happy New Code!
Hello World in Brainfuck :D:

Code: Select all

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
[/size]
onlyonemac
Member
Member
Posts: 1146
Joined: Sat Mar 01, 2014 2:59 pm

Re: Probably the worst bug to debug

Post by onlyonemac »

KemyLand wrote:There are some functions that return legally free()able pointers, such as strdup().
Furthermore in this case the compiler wouldn't know where the pointer was obtained from.
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
cyx
Posts: 2
Joined: Wed Oct 22, 2014 9:34 pm

Re: Probably the worst bug to debug

Post by cyx »

KemyLand wrote:
Combuster wrote:
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.
There are some functions that return legally free()able pointers, such as strdup().
You can make any function return a legally free()able pointer as long as it allocates it with malloc, calloc or realloc ;)
Kevin
Member
Member
Posts: 1071
Joined: Sun Feb 01, 2009 6:11 am
Location: Germany
Contact:

Re: Probably the worst bug to debug

Post by Kevin »

KemyLand wrote:There are some functions that return legally free()able pointers, such as strdup().
strdup() isn't standard C (yet), and POSIX defines free() so that it's legal to pass a "pointer earlier returned by a function in POSIX.1‐2008 that allocates memory as if by malloc()".
Developer of tyndur - community OS of Lowlevel (German)
Post Reply