Share: tips for the stabs debugging format
Posted: Fri Apr 25, 2014 12:04 pm
I recently wrote a small C debugger for Bochs, because the gdb stub doesn't work when Bochs is compiled with SMP support. When I was learning about the stabs debugging format, I found nothing on the net except this document (http://www.sourceware.org/gdb/onlinedocs/stabs.html). So I want to share some tips as an addition to it, along with some of my experience in parsing stabs.
1, -gstabs or -gstabs+
gcc supports both compile options, and the latter adds some extra features. For example, it records the directory in which the source file is located, which is useful when implementing the back trace from an assembly address to a source line number. But I chose to support the former, because that document doesn't cover the extended features and I doubt the compatibility of '-gstabs+'.
2, Typenumbers can be defined anywhere
When you define a new type, such as 'typedef unsigned u32' or 'struct abc{...};', gcc produces a corresponding typenumber for 'u32' and 'abc', and these typenumbers appear in N_LSYM stabs. But be careful: this only holds when the type has a name (namely 'u32', 'abc', ...). A typenumber for a type without a name may be defined inside any kind of stab. For example, for the code "struct {int a; int b;}ab;", the new typenumber of this anonymous structure is defined in the string field of an N_GSYM stab, like 'ab:G(0,20)=s8a:(0,1),0,32;b:(0,1),32,32;;'. Remember this when you collect all the typenumbers.
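A minimal C sketch of collecting typenumbers this way, assuming gcc's '(file,number)' form: instead of looking only at N_LSYM stabs, scan every stab string for a '(f,n)' that is immediately followed by '=', which marks a definition rather than a reference. The names `struct typenum` and `collect_type_defs` are my own, for illustration.

```c
#include <stdio.h>
#include <string.h>

/* A typenumber as gcc writes it in stabs: (file-number,type-number). */
struct typenum {
    int file;
    int num;
};

/* Scan one stab string (of any n_type) and store every typenumber that is
   *defined* in it, i.e. every "(f,n)" immediately followed by '='.
   Returns how many definitions were stored (at most max). */
static int collect_type_defs(const char *s, struct typenum *out, int max)
{
    int found = 0;
    const char *p = s;

    while ((p = strchr(p, '(')) != NULL && found < max) {
        int file, num, len = 0;

        /* %n stores how many characters were consumed; it stays 0 if the
           closing ')' did not match. */
        if (sscanf(p, "(%d,%d)%n", &file, &num, &len) == 2 && len > 0
            && p[len] == '=') {
            out[found].file = file;
            out[found].num = num;
            found++;
        }
        p++;  /* continue after this '(' so nested definitions are found */
    }
    return found;
}
```

Run over every stab string, this picks up the anonymous structure's (0,20) from the N_GSYM example above as well as any definitions nested inside struct members.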
3, Is an N_LSYM stab for a stack variable or for a type symbol?
When we detect a stab whose 'n_type' field is N_LSYM, it may correspond to a stack variable (such as "int i;") or to a type symbol (such as "struct abc{..};"). How do we distinguish the two cases? Parsing the string field is of course one approach, but I think a simpler way is to check whether the 'n_value' field is zero: for a stack variable it holds the variable's offset in the stack frame, and for a type symbol it is zero. This method is not documented, but I don't believe any stack variable will sit at offset 0 in the stack frame.
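The sketch below shows both checks side by side. Assumptions: `struct stab` is the usual 12-byte .stab entry on 32-bit ELF, N_LSYM is 0x80 as in <stab.h>, and the function names are hypothetical. The string check relies on the documented symbol descriptors 't' (typedef) and 'T' (struct/union/enum tag); a local variable has no descriptor, so its type follows the ':' directly.

```c
#include <stdint.h>
#include <string.h>

#define N_LSYM 0x80  /* stab type code from <stab.h> */

/* One entry of the .stab section. */
struct stab {
    uint32_t n_strx;   /* offset of the string in .stabstr */
    uint8_t  n_type;   /* N_LSYM, N_GSYM, N_SLINE, ... */
    uint8_t  n_other;
    uint16_t n_desc;
    uint32_t n_value;  /* for a local variable: offset in the stack frame */
};

/* Heuristic from the tip: a type symbol has n_value == 0. */
static int lsym_is_type_by_value(const struct stab *s)
{
    return s->n_type == N_LSYM && s->n_value == 0;
}

/* Documented alternative: inspect the symbol descriptor after ':'. */
static int lsym_is_type_by_string(const char *str)
{
    const char *colon = strchr(str, ':');
    return colon != NULL && (colon[1] == 't' || colon[1] == 'T');
}
```

The value check is cheaper because it needs no access to .stabstr; the string check can serve as a cross-check while developing the parser.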
4, The 'void' type refers to itself
With my gcc, typenumbers from (0,1) to (0,18) are builtin. For stabs of builtin types, you can either parse their string field or simply stop decoding when you detect that the typenumber is smaller than 19. If you choose the latter, just ignore this tip. If you treat builtin types like ordinary types, take care that the 'void' type refers to itself, as in 'void:t(0,18)=(0,18)'. It is neither a normal reference nor a normal subrange type.
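If you do parse builtin types, a resolver that follows '=(f,n)' aliases needs a guard against this self-reference, or it will loop forever. A small sketch of such a check, assuming gcc's '(f,n)' form and the 't' descriptor; the function name is my own:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a stab string defines a typenumber as itself, like
   "void:t(0,18)=(0,18)". Such a type should be treated as a terminal
   'void' type instead of being followed as an alias. */
static int is_self_referential(const char *str)
{
    int f1, n1, f2, n2, len = 0;
    const char *colon = strchr(str, ':');

    if (colon == NULL || colon[1] != 't')
        return 0;
    /* Require exactly "(f,n)=(f,n)" and nothing after it. */
    if (sscanf(colon + 2, "(%d,%d)=(%d,%d)%n", &f1, &n1, &f2, &n2, &len) != 4
        || len == 0 || colon[2 + len] != '\0')
        return 0;
    return f1 == f2 && n1 == n2;
}
```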
5, gcc emits an N_GSYM stab for a global variable in every source file where it is declared or defined, so the same variable can appear in several N_GSYM stabs. In the .symtab section, however, duplicate global variable symbols are eliminated.
6, N_GSYM stabs don't contain the variable's address (the debugger has to look it up by name in the linker symbol table, e.g. .symtab), but N_STSYM and N_LCSYM stabs store it in the 'n_value' field.
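Tips 5 and 6 together suggest keying global variables by name and taking their addresses from .symtab, which also makes the duplicate N_GSYM stabs collapse to one entry. The sketch assumes the stabs have already been decoded into a hypothetical `struct stab_entry` (string fetched from .stabstr) and that .symtab has been loaded into a simple name/address array (`struct elf_sym`, also hypothetical); 32-bit addresses are assumed, as for a Bochs guest.

```c
#include <stdint.h>
#include <string.h>

#define N_GSYM  0x20  /* stab type codes from <stab.h> */
#define N_STSYM 0x26
#define N_LCSYM 0x28

/* A decoded stab: the string has already been fetched from .stabstr. */
struct stab_entry {
    uint8_t     type;
    uint32_t    value;
    const char *str;   /* e.g. "counter:G(0,1)" */
};

/* Minimal view of .symtab: one entry per global symbol. */
struct elf_sym {
    const char *name;
    uint32_t    addr;
};

/* Length of the symbol name, i.e. the part before ':'. */
static size_t stab_name_len(const char *str)
{
    const char *colon = strchr(str, ':');
    return colon ? (size_t)(colon - str) : strlen(str);
}

/* Returns the variable's address, or 0 if it cannot be determined. */
static uint32_t variable_address(const struct stab_entry *s,
                                 const struct elf_sym *syms, size_t nsyms)
{
    if (s->type == N_STSYM || s->type == N_LCSYM)
        return s->value;                 /* address is in n_value */
    if (s->type == N_GSYM) {
        /* n_value is unused: look the name up in .symtab instead. Every
           duplicate N_GSYM for the same name resolves to the same entry. */
        size_t len = stab_name_len(s->str);
        for (size_t i = 0; i < nsyms; i++)
            if (strlen(syms[i].name) == len
                && strncmp(syms[i].name, s->str, len) == 0)
                return syms[i].addr;
    }
    return 0;
}
```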
7, Structs and unions may have anonymous members.
8, Structs and unions may have bitfield members.
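A sketch of parsing a flat member list (the part after "s<size>" or "u<size>") that copes with both cases. Assumptions: an anonymous member is written with an empty name before ':', a member is treated as a bitfield when its bit size differs from the size of its type, and `type_size_bits` is a hypothetical stand-in for the debugger's own type table. Members whose type is defined inline (see tip 11) are not handled here.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 32

struct member {
    char name[MAX_NAME];   /* empty for an anonymous member */
    int  file, num;        /* typenumber (f,n) of the member's type */
    int  bitoff, bits;     /* bit offset and bit size */
    int  anonymous;
    int  bitfield;
};

/* Hypothetical type table: size of a type in bits, or -1 if unknown.
   Only int, (0,1) with my gcc, is known in this sketch. */
static int type_size_bits(int file, int num)
{
    return (file == 0 && num == 1) ? 32 : -1;
}

/* Parses "name:(f,n),bitoff,bits;" entries up to the closing ';'.
   Returns the number of members, or -1 on a syntax error. */
static int parse_members(const char *p, struct member *out, int max)
{
    int count = 0;

    while (*p != ';' && *p != '\0' && count < max) {
        struct member *m = &out[count];
        const char *colon = strchr(p, ':');
        size_t len;
        int used = 0, tsize;

        if (colon == NULL)
            return -1;
        len = (size_t)(colon - p);
        m->anonymous = (len == 0);
        if (len >= MAX_NAME)
            len = MAX_NAME - 1;            /* truncate long names */
        memcpy(m->name, p, len);
        m->name[len] = '\0';

        /* Only a plain typenumber reference is accepted here. */
        if (sscanf(colon + 1, "(%d,%d),%d,%d;%n", &m->file, &m->num,
                   &m->bitoff, &m->bits, &used) != 4 || used == 0)
            return -1;

        tsize = type_size_bits(m->file, m->num);
        m->bitfield = (tsize > 0 && m->bits != tsize);
        p = colon + 1 + used;
        count++;
    }
    return count;
}
```

For "a:(0,1),0,3;b:(0,1),3,5;;" this yields two int bitfields of 3 and 5 bits at bit offsets 0 and 3.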
9, Mapping an assembly address back to a source line is easy
Because the N_SLINE stabs for a function appear contiguously, we can treat the first stab of a different type as the end of that function's line information. Also, N_SLINE stabs are independent of other concepts such as block structure, so we don't need to worry much about those when performing the back trace.
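A sketch of this back trace. Assumptions: the stabs are decoded into a hypothetical `struct stab_entry` in the order gcc emitted them, the line number is in n_desc, and, as with gcc's stabs-in-ELF, an N_SLINE value is an offset from the start of the enclosing N_FUN rather than an absolute address. gcc also emits N_FUN stabs with an empty string to mark the end of a function; those are skipped.

```c
#include <stddef.h>
#include <stdint.h>

#define N_FUN   0x24  /* stab type codes from <stab.h> */
#define N_SLINE 0x44

struct stab_entry {
    uint8_t     type;
    uint16_t    desc;   /* N_SLINE: source line number */
    uint32_t    value;  /* N_FUN: start address; N_SLINE: offset in function */
    const char *str;    /* already fetched from .stabstr */
};

/* Maps an instruction address to a source line, or returns -1. */
static int addr_to_line(const struct stab_entry *st, size_t n, uint32_t addr)
{
    size_t fun = n, i;
    int line = -1;

    /* 1. The function with the highest start address <= addr. */
    for (i = 0; i < n; i++)
        if (st[i].type == N_FUN && st[i].str[0] != '\0'
            && st[i].value <= addr
            && (fun == n || st[i].value > st[fun].value))
            fun = i;
    if (fun == n)
        return -1;

    /* 2. Skip parameters etc. up to the first N_SLINE of this function. */
    i = fun + 1;
    while (i < n && st[i].type != N_SLINE && st[i].type != N_FUN)
        i++;

    /* 3. Walk the contiguous N_SLINE run; any other stab type ends it. */
    for (; i < n && st[i].type == N_SLINE; i++) {
        if (st[fun].value + st[i].value > addr)
            break;
        line = st[i].desc;
    }
    return line;
}
```

The sketch does not check the function's size, so an address past the end of the last function still maps to that function's last line.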
10, How to handle the N_EXCL stab
When an N_EXCL stab is detected, it is the debugger writer's job to find out where this header file appeared for the first time (its N_BINCL stab), and then to relocate all references to typenumbers from this excluded header to the original header.
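One way to do this, as a sketch: keep a global table of headers keyed by name and n_value (as far as I know, GNU ld stores a checksum there, and gdb matches an N_EXCL to its N_BINCL on both). It also assumes that, within one compilation unit, file number 1, 2, ... in a typenumber (f,n) refers to the N_BINCL or N_EXCL stabs in order of appearance, with 0 meaning the main source file. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define N_BINCL 0x82  /* stab type codes from <stab.h> */
#define N_EXCL  0xc2
#define MAX_HEADERS  64
#define MAX_FILENUMS 32

struct header { const char *name; uint32_t value; };

/* All headers seen so far, across compilation units. */
struct header_table { struct header hdr[MAX_HEADERS]; int nhdr; };

/* Per compilation unit: file number -> index into header_table. */
struct unit_map { int header_of[MAX_FILENUMS]; int next_filenum; };

static void unit_init(struct unit_map *u)
{
    memset(u, 0, sizeof *u);
    u->next_filenum = 1;  /* 0 is the main source file */
}

static int find_header(const struct header_table *t, const char *name,
                       uint32_t value)
{
    for (int i = 0; i < t->nhdr; i++)
        if (t->hdr[i].value == value && strcmp(t->hdr[i].name, name) == 0)
            return i;
    return -1;
}

/* Call for every N_BINCL / N_EXCL stab of a unit, in order. Returns the
   canonical header index, or -1 if an N_EXCL has no matching N_BINCL. */
static int note_include(struct header_table *t, struct unit_map *u,
                        uint8_t type, const char *name, uint32_t value)
{
    int h;

    if (type != N_BINCL && type != N_EXCL)
        return -1;
    /* For N_EXCL the linker dropped this header's stabs because an
       identical copy appeared earlier; reuse that copy. */
    h = find_header(t, name, value);
    if (h < 0 && type == N_BINCL && t->nhdr < MAX_HEADERS) {
        t->hdr[t->nhdr] = (struct header){ name, value };
        h = t->nhdr++;
    }
    /* Each N_BINCL/N_EXCL consumes a file number, even if unresolved. */
    if (u->next_filenum < MAX_FILENUMS)
        u->header_of[u->next_filenum++] = h;
    return h;
}

/* Relocates the file number of a typenumber (f,n) to a unit-independent
   header index, so units sharing a header share its types. */
static int canonical_header(const struct unit_map *u, int filenum)
{
    if (filenum <= 0 || filenum >= u->next_filenum)
        return -1;  /* main source file or unknown */
    return u->header_of[filenum];
}
```

A type reference (f,n) in a unit can then be keyed as (canonical_header(u, f), n) when f is non-zero.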
11, When is a struct (or union) member's typenumber defined in nested form?
For example, for "struct foo{struct{int a;}content;};", the typenumber of the member <content> is defined in nested form, like "foo:T(0,20)=s4content:(0,21)=s4a:(0,1),0,32;;,0,32;;". But if you give the inner structure a type name, as in "struct foo{struct foo1{int a;}content;};", then gcc has to use two stabs (namely two string fields) to describe the structure, because the nested definition format cannot encode the type's name (it has no room to store the symbol 'foo1'). The two string fields then look like "foo1:T(0,20)=s4a:(0,1),0,32;;" and "foo:T(0,21)=s4content:(0,20),0,32;;". This tip may seem useless; it's just an observation.
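Even as an observation, it matters for the parser: a member's type may carry a complete nested definition, so the string field has to be parsed recursively. Below is a small recursive-descent sketch for just this subset: references '(f,n)', struct/union definitions 's'/'u', and aliases '=(f,n)'. It records each typenumber definition with its nesting depth. Other type descriptors (pointers, arrays, ranges, enums) make it fail, and all names are my own.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEFS 32

struct typedef_rec { int file, num, depth; };

struct parser {
    const char *p;                      /* current position */
    struct typedef_rec defs[MAX_DEFS];  /* "(f,n)=" definitions seen */
    int ndefs;
};

static long parse_int(struct parser *ps)
{
    char *end;
    long v = strtol(ps->p, &end, 10);
    ps->p = end;
    return v;
}

static int expect(struct parser *ps, char c)
{
    if (*ps->p != c)
        return 0;
    ps->p++;
    return 1;
}

static int parse_type(struct parser *ps, int depth);

/* members: (name ':' type ',' bitoff ',' bits ';')* ';' */
static int parse_members(struct parser *ps, int depth)
{
    while (*ps->p != '\0' && *ps->p != ';') {
        const char *colon = strchr(ps->p, ':');
        if (colon == NULL)
            return 0;
        ps->p = colon + 1;                 /* skip the member name */
        if (!parse_type(ps, depth) || !expect(ps, ','))
            return 0;
        parse_int(ps);                     /* bit offset */
        if (!expect(ps, ','))
            return 0;
        parse_int(ps);                     /* bit size */
        if (!expect(ps, ';'))
            return 0;
    }
    return expect(ps, ';');                /* end of the member list */
}

/* type: '(' f ',' n ')' [ '=' ( ('s'|'u') size members | type ) ] */
static int parse_type(struct parser *ps, int depth)
{
    int file, num;

    if (!expect(ps, '('))
        return 0;
    file = (int)parse_int(ps);
    if (!expect(ps, ','))
        return 0;
    num = (int)parse_int(ps);
    if (!expect(ps, ')'))
        return 0;
    if (!expect(ps, '='))
        return 1;                          /* plain reference */

    if (ps->ndefs < MAX_DEFS)
        ps->defs[ps->ndefs++] = (struct typedef_rec){ file, num, depth };
    if (*ps->p == 's' || *ps->p == 'u') {  /* struct or union */
        ps->p++;
        parse_int(ps);                     /* size in bytes */
        return parse_members(ps, depth + 1);
    }
    return parse_type(ps, depth + 1);      /* alias, e.g. "=(0,18)" */
}

/* Parses "name:[descriptor]type" and requires the whole string to match. */
static int parse_stab_string(struct parser *ps, const char *s)
{
    const char *colon = strchr(s, ':');

    ps->ndefs = 0;
    if (colon == NULL)
        return 0;
    ps->p = colon + 1;
    if (isalpha((unsigned char)*ps->p))
        ps->p++;                           /* e.g. 'T', 't', 'G' */
    return parse_type(ps, 0) && *ps->p == '\0';
}
```

For the unnamed inner structure, (0,21) is recorded at depth 1 inside (0,20); for the 'foo1' variant, each of the two stabs defines a single typenumber at depth 0.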
my gcc: 4.7.2 ubuntu/linaro
my ld: gnu ld 2.22.90