Re: please help AHHHH ld and gcc problem ?

Posted: Fri Jan 16, 2009 2:59 pm
by ru2aqare
Sam111 wrote:
Assuming I have a PE file and search it to find that .text and .data begin at

Code:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00008730  000010d0  000010d0  000008d0  2**4
                  CONTENTS, ALLOC, LOAD, CODE
  1 .data         00000e00  00009800  00009800  00009000  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00003400  0000a600  0000a600  00000000  2**2
                  ALLOC
  3 .comment      00000010  0000da00  0000da00  00009e00  2**2
                  CONTENTS, DEBUGGING
So .text begins at 000010d0.
No, it doesn't. In the file, it starts at file offset 0x8D0 and is 0x8730 bytes in length. When the loader loads this PE file, it allocates a memory range large enough to hold the entire image (all sections together). Let's assume this memory range starts at address 1M. It loads the sections one by one relative to this address (actually, Windows places the PE headers first at this address, so the first section won't be loaded at 1M, but at 1M+256 bytes or whatever. But let's ignore this). This address is the base address. The sections will be loaded at base address + LMA, and if paging is used, they will appear at base address + VMA.
If you mean its virtual address (the address the application "sees"), then it starts at load address + 0x10d0.

However, if you consider an object file (which is NOT a PE file, contrary to what you wrote), the linker combines all code, data and rdata sections into one big code, data and rdata section respectively (all code segments from all object files are combined into one large code segment, all data segments into one large data segment, and so on). Then the linker calculates the addresses of the symbols and resolves references to them. But I already wrote that.
Sam111 wrote: Also, if I want to load it at some other memory address, say 00003456, I would first have to find the entry point symbol in the symbol table, find its size in memory, and update it with 00003456. Then the next symbol that needs updating would be at 00003456 + size of the entry point symbol.
You set the load address to 0x3456 and recalculate the base addresses of each section, then copy the sections there and perform base relocations (this is actually the step that gathers most of the hate towards PE). If that wasn't the answer to your question, then I didn't understand what you wrote.
Sam111 wrote: Don't know what all this crap means below

Code:

[731](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 1)... important stuff like start address symbol name...
I would guess symbol 731 is defined in section 1 (-1 means external if I remember correctly), which has flags 0, is at the specified address and has the specified name.

Re: please help AHHHH ld and gcc problem ?

Posted: Sat Jan 17, 2009 7:49 pm
by Sam111
ru2aqare wrote: No, it doesn't. In the file, it starts at file offset 0x8D0 and is 0x8730 bytes in length. When the loader loads this PE file, it allocates a memory range large enough to hold the entire image (all sections together). Let's assume this memory range starts at address 1M. It loads the sections one by one relative to this address (actually, Windows places the PE headers first at this address, so the first section won't be loaded at 1M, but at 1M+256 bytes or whatever. But let's ignore this). This address is the base address. The sections will be loaded at base address + LMA, and if paging is used, they will appear at base address + VMA.
If you mean its virtual address (the address the application "sees"), then it starts at load address + 0x10d0.

However, if you consider an object file (which is NOT a PE file, contrary to what you wrote), the linker combines all code, data and rdata sections into one big code, data and rdata section respectively (all code segments from all object files are combined into one large code segment, all data segments into one large data segment, and so on). Then the linker calculates the addresses of the symbols and resolves references to them. But I already wrote that.

OK, I guess I don't fully understand; maybe it's me.
But when I disassemble the code, the top line is 000010d0 <start>, so the address is 000010d0.
So I assume you need to place the <start> code at 000010d0.

It is the same question as having the exact same assembly code twice, with org 0 for one copy and org 0x7C00 for the other, and loading the code at 0000:7C00. The copy with org 0 won't work, because its addresses are calculated relative to 0000:0000 (as if the first instruction in the code were located there) rather than 0000:7C00. So if I had
0000:7C00 jmp address
0000:7C02 myvariable db 10
address: mov ax, myvariable
Try creating a COM file with an org other than 100h; you won't get it to work. At least I know that whenever I write a bootloader I have to use org 0x7C00 to get it to work. Without it, when I need to display a string using int 13h, I have to point [ds:dx] at myvariable by hand with mov dx, 0x7c02 before calling int 13h. If I use org 0x7C00 I don't have this problem and can simply write mov dx, myvariable, because when NASM assembles the code it assumes that variable and code addresses start at the value given in the org directive.

Now back to turning a PE into a bin. I copy the .text and .data sections out of the PE. But start is at 000010d0 when disassembled, which I would think means it is like org 000010d0. So I would have to load it at that memory address and then jmp 000010d0.
ru2aqare wrote: It starts at file offset 0x8D0 and is 0x8730 bytes in length
Yes, I know this, but that is just where it is located in the PE.
When the loader loads it, I would think it has to be placed exactly at the LMA. Otherwise you run into the same org problem I had with my bootloader program. Unless, of course, there is an easy way to shift all the addresses accordingly?

What I don't get is how you recalculate the addresses from the symbol table. Since the .text section is just numbers, how do you know from the updated symbol table what needs to be updated? Say you have call 0x5678 in the .text section: how do you know whether that 0x5678 came from the symbol table, or is just a fixed address that is not in the symbol table but still needs to be updated to 0x5678 + (new memory address - old starting memory address)?

I guess I just don't know how to traverse the .text section (code) and update it with the correct info from the symbol table. I get how to recalculate the symbols in the symbol table; I just don't get how you update the code accordingly.

In theory, if I load the code section followed by the data section into memory at 000010d0 and jump to it, it would work. I believe so, but I am not sure. If I wanted to load the extracted code and data sections at a memory address other than 000010d0,
I would equivalently be changing org 000010d0 to a new starting address.

I hope you get what I am getting at. Basically, how do you change code that starts at one address into code that starts at another address, without screwing up the code inside the functions, etc.?
You would somehow have to update everything to the original address + the difference between the new address and the original address.
I don't get how all the symbols can be used to update the .text (code). Would you just look for the address of each symbol in the .text section and replace it with the updated version from the symbol table? I am still unsure: if you had a
jmp <functionstartaddress + 25> in the function, you would also have to update this instruction, but it isn't a symbol in the symbol table.


Re: please help AHHHH ld and gcc problem ?

Posted: Sun Jan 18, 2009 1:59 am
by ru2aqare
Sam111 wrote: OK, I guess I don't fully understand; maybe it's me.
But when I disassemble the code, the top line is 000010d0 <start>, so the address is 000010d0.
So I assume you need to place the <start> code at 000010d0.
Oh, sorry, it seems I misunderstood the question.
Sam111 wrote: Now back to turning a PE into a bin. I copy the .text and .data sections out of the PE. But start is at 000010d0 when disassembled, which I would think means it is like org 000010d0. So I would have to load it at that memory address and then jmp 000010d0.
ru2aqare wrote: It starts at file offset 0x8D0 and is 0x8730 bytes in length
Yes, I know this, but that is just where it is located in the PE.
When the loader loads it, I would think it has to be placed exactly at the LMA. Otherwise you run into the same org problem I had with my bootloader program. Unless, of course, there is an easy way to shift all the addresses accordingly?
Sam111 wrote: I hope you get what I am getting at. Basically, how do you change code that starts at one address into code that starts at another address, without screwing up the code inside the functions, etc.?
There is an easy way to do that. Just load the PE file and perform base relocation. That is, take the address you loaded the file at (0x10d0 or whatever) and get the preferred load address from the PE file header (let's assume it says 0x1000, although it is almost always 64K aligned). The difference between the two (0xd0 here) is the amount you have to add to the value at every base relocation to get the executable to run correctly. The thing is, every time you write

Code:

mov eax, offset some_variable
the assembler emits a relocation into the object file, saying that "this line references the address of some_variable with a displacement of zero". If you wrote

Code:

mov eax, offset some_variable + 1234h
then the assembler would emit a relocation into the object file, saying that "this line references the address of some_variable with a displacement of 0x1234". When you link the object files together to get the PE file, the linker resolves these references and emits a base relocation into the base relocation table, saying "there is an instruction at address X that references a variable relative to the load address". The instruction itself holds the address the variable would have if the image were loaded at the preferred load address, so the loader only has to add the difference between the actual load address and the preferred load address to make the instruction refer to the correct memory location. The advantage of this system is that if you can load the PE file at its preferred load address in virtual memory, there is no need to perform relocations. The disadvantage is that if you somehow can't load it at the preferred load address, there may be a lot of relocations to perform.

So, what you have to do is this:

Code:

// Load the PE file into virtual memory at some address, honoring section alignment.
uint8* load_address = <address where you start loading the entire PE file at>;
uint preferred_load_address = <ImageBase, taken from the PE optional header>;
uint delta = (uint)load_address - preferred_load_address;
uint base_reloc_table_start = <RVA from sixth data directory entry (index 5)>;
uint base_reloc_table_length = <length from sixth data directory entry>;
uint8* p = load_address + base_reloc_table_start;
uint8* q = p + base_reloc_table_length;
while (p < q)
{
  uint block_page_address = *(uint*)p;   // page RVA this block applies to
  uint block_length = *(uint*)(p + 4);   // size of the block in bytes, header included
  if (block_length < 8) error();         // malformed block, would never advance
  uint16* reloc = (uint16*)(p + 8);      // first entry follows the 8-byte header
  uint16* reloc_end = (uint16*)(p + block_length);
  p += block_length;                     // p now points at the next block
  // process relocations for a page
  for (; reloc < reloc_end; reloc++)
  {
    uint16 type = *reloc >> 12;          // high 4 bits of reloc
    uint16 offset = *reloc & 0x0FFF;     // low 12 bits of reloc
    if (type == 3) // may be 10 for x64, which patches a 64-bit value instead
    {
      uint* address = (uint*)(load_address + block_page_address + offset);
      *address += delta; // +=, to keep whatever displacement the instruction is using.
    }
    else if (type == 0)
    {
      // padding entry, ignore this
    }
    else error();
  }
}
One thing to watch out for is that nearly all addresses stored in the PE headers and tables are RVAs, Relative Virtual Addresses (the only exception is the resource table, but it can be ignored here). Basically, the RVA is an offset from whatever address the file is loaded at, taking section alignment into consideration. Let's say you have a file with three sections.

Code:

               size    offset in file         RVA in memory
headers        0xd0    0x0                    0x0
section 1      0x10    0x0 + 0xd0 = 0xd0      0x0 + page_align(0xd0) = 0x1000
section 2      0x20    0xd0 + 0x10 = 0xe0     0x1000 + page_align(0x10) = 0x2000
section 3      0x18    0xe0 + 0x20 = 0x100    0x2000 + page_align(0x20) = 0x3000
If some variable is located in section 2, it has an RVA of <0x2000 + offset into section 2>, although it resides at offset <0xe0 + offset into section 2> in the file. So if you load the PE file at load_address 0x1000, you have to copy section 1 to load_address+0x1000, section 2 to load_address+0x2000, and so on.
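The mapping between RVAs and file offsets described above can be sketched in C as follows. This is only an illustration: the Section record, PE_PAGE_SIZE, page_align and rva_to_file_offset are hypothetical names, the record is a simplified stand-in for the real section header, and the page size is assumed to be 4K.

```c
#include <stddef.h>
#include <stdint.h>

#define PE_PAGE_SIZE 0x1000u  /* assumed section alignment (4K) */

/* Round a size up to the next multiple of the page size. */
uint32_t page_align(uint32_t size) {
    return (size + PE_PAGE_SIZE - 1) & ~(PE_PAGE_SIZE - 1);
}

/* Hypothetical, simplified section record (not the on-disk header layout). */
typedef struct {
    uint32_t rva;         /* where the section starts in the loaded image */
    uint32_t file_offset; /* where its bytes start in the file */
    uint32_t size;        /* number of bytes stored in the file */
} Section;

/* Translate an RVA to a file offset.
   Returns UINT32_MAX if no section contains the RVA. */
uint32_t rva_to_file_offset(const Section *s, size_t count, uint32_t rva) {
    for (size_t i = 0; i < count; i++) {
        if (rva >= s[i].rva && rva - s[i].rva < s[i].size)
            return s[i].file_offset + (rva - s[i].rva);
    }
    return UINT32_MAX;
}
```

With the three example sections (section 2 at RVA 0x2000 and file offset 0xe0), a variable at RVA 0x2004 maps to file offset 0xe4.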
Sam111 wrote: What I don't get is how you recalculate the addresses from the symbol table. Since the .text section is just numbers, how do you know from the updated symbol table what needs to be updated? Say you have call 0x5678 in the .text section: how do you know whether that 0x5678 came from the symbol table, or is just a fixed address that is not in the symbol table but still needs to be updated to 0x5678 + (new memory address - old starting memory address)?
While theoretically a PE file may have a symbol table, it often is missing (ld on Cygwin sometimes leaves the symbol table in the PE file, but other linkers remove it from the image). So you can't use the symbol table to update the relocations.

This is all done by performing base relocation.
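For readers who want something compilable, the pseudocode loop earlier in this post can be written in C roughly as follows. This is a minimal sketch and not ru2aqare's kernel code. It assumes the image has already been copied into a buffer laid out by RVA, it handles only type 0 and type 3 entries, and the names apply_base_relocations, rd32 and wr32 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read and write 32-bit little-endian values byte by byte, so the code
   does not depend on host alignment or endianness. */
uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

void wr32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Apply every type 3 fixup in the base relocation table found at
   reloc_rva. delta is actual load address minus preferred load address.
   Returns the number of patched doublewords, or -1 if the table is
   malformed or contains a type this sketch does not handle. */
int apply_base_relocations(uint8_t *image, size_t image_size,
                           uint32_t reloc_rva, uint32_t reloc_size,
                           uint32_t delta) {
    int patched = 0;
    if (reloc_rva > image_size || reloc_size > image_size - reloc_rva)
        return -1;
    size_t pos = reloc_rva;
    size_t end = (size_t)reloc_rva + reloc_size;
    while (end - pos >= 8) {
        uint32_t page_rva = rd32(image + pos);      /* block header */
        uint32_t block_len = rd32(image + pos + 4); /* includes header */
        if (block_len < 8 || block_len > end - pos)
            return -1;
        for (size_t e = pos + 8; e + 2 <= pos + block_len; e += 2) {
            uint16_t entry = (uint16_t)(image[e] | (image[e + 1] << 8));
            unsigned type = entry >> 12;        /* high 4 bits */
            unsigned offset = entry & 0x0FFFu;  /* low 12 bits */
            if (type == 0)
                continue;                       /* padding entry */
            if (type != 3)
                return -1;                      /* unsupported type */
            size_t target = (size_t)page_rva + offset;
            if (target > image_size || image_size - target < 4)
                return -1;
            wr32(image + target, rd32(image + target) + delta);
            patched++;
        }
        pos += block_len;
    }
    return patched;
}
```

For instance, a table holding a single block for page 0x1000 with one type 3 entry at offset 0x002 causes exactly one doubleword, the one at RVA 0x1002, to be patched.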

Re: please help AHHHH ld and gcc problem ?

Posted: Sun Jan 18, 2009 2:18 pm
by Sam111
OK, I understand this a little better, but I still have a few questions about base relocation.

Basically the structure of a PE is this

Code:

public class DOS_HEADER {
                                  // DOS .EXE header
    public short e_magic;         // Magic number
    public short e_cblp;          // Bytes on last page of file
    public short e_cp;            // Pages in file
    public short e_crlc;          // Relocations
    public short e_cparhdr;       // Size of header in paragraphs
    public short e_minalloc;      // Minimum extra paragraphs needed
    public short e_maxalloc;      // Maximum extra paragraphs needed
    public short e_ss;            // Initial (relative) SS value
    public short e_sp;            // Initial SP value
    public short e_csum;          // Checksum
    public short e_ip;            // Initial IP value
    public short e_cs;            // Initial (relative) CS value
    public short e_lfarlc;        // File address of relocation table
    public short e_ovno;          // Overlay number
    public short e_res[4];        // Reserved words
    public short e_oemid;         // OEM identifier (for e_oeminfo)
    public short e_oeminfo;       // OEM information; e_oemid specific
    public short e_res2[10];      // Reserved words
    public long  e_lfanew;        // File address of new exe header
}

public class PE_HEADER {
    public long   pe_magic;           // Magic number
    public short  Machine;           // Machine number
    public short  NumberOfSections;  // 
    public long   TimeDateStamp;     //
    public long   PointerToSymbolTable;
    public long   NumberOfSymbols;
    public short  SizeOfOptionalHeader;
    public short  Characteristics;
}


public class PE_OPTIONAL_HEADER {

    // Standard fields.
    //
    public short  Magic;  
    public byte   MajorLinkerVersion;
    public byte   MinorLinkerVersion;
    public long   SizeOfCode;
    public long   SizeOfInitializedData;
    public long   SizeOfUninitializedData;
    public long   AddressOfEntryPoint;
    public long   BaseOfCode;
    public long   BaseOfData;
    //
    // NT additional fields.
    //
    public long   ImageBase;
    public long   SectionAlignment;
    public long   FileAlignment;
    public short  MajorOperatingSystemVersion;
    public short  MinorOperatingSystemVersion;
    public short  MajorImageVersion;
    public short  MinorImageVersion;
    public short  MajorSubsystemVersion;
    public short  MinorSubsystemVersion;
    public long   Reserved1;
    public long   SizeOfImage;
    public long   SizeOfHeaders;
    public long   CheckSum;
    public short  Subsystem;
    public short  DllCharacteristics;
    public long   SizeOfStackReserve;
    public long   SizeOfStackCommit;
    public long   SizeOfHeapReserve;
    public long   SizeOfHeapCommit;
    public long   LoaderFlags;
    public long   NumberOfRvaAndSizes;
    DATA_DIRECTORY DataDirectory[NUMBEROF_DIRECTORY_ENTRIES];
}


public class DATA_DIRECTORY {
    ULONG   VirtualAddress;
    ULONG   Size;
} 



public class SECTION_HEADER {

    UCHAR   Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            ULONG   PhysicalAddress;
            ULONG   VirtualSize;
    } Misc;
    ULONG   VirtualAddress;
    ULONG   SizeOfRawData;
    ULONG   PointerToRawData;
    ULONG   PointerToRelocations;
    ULONG   PointerToLinenumbers;
    USHORT  NumberOfRelocations;
    USHORT  NumberOfLinenumbers;
    ULONG   Characteristics;
} 

Then come the sections, and that's it. (I don't really get where the relocation table goes.)
My question is: where is the relocation table located (i.e. .text, .data, etc.) and what is its structure?
I would think the relocation table can't be in the raw data sections like .text or .data, so what section is it usually in?
We have the structure of a PE as

Code:

MS-DOS header
MS-DOS stub program
PE header
PE optional header (which contains pointers to the data directories)
Section headers (which contain pointers to the sections and, I think, a pointer to the relocation table?)
Sections (.text, .data, .idata, .edata, .rsrc, etc.; I think there are 9 predefined types, but whatever)

I am confused about where the relocation table is located. I am also confused about where the data directories are located. I thought the sections contain only the raw data, not data directories, etc.? I know how to get the data directories from the PE optional header.

Also, what is the structure of this relocation table? It must have an entry for every line in the .text section that uses a variable, like your example mov eax, offset some_variable. What would the relocation table entry for some_variable look like?

In the section header we have these fields, which I would think can be used to find the start of the raw data and of the relocation table:
ULONG SizeOfRawData;
ULONG PointerToRawData;
ULONG PointerToRelocations;
ULONG PointerToLinenumbers;
USHORT NumberOfRelocations;
USHORT NumberOfLinenumbers;

This implies that .text doesn't contain only the raw data but also the relocation table for .text?
I was under the impression that the .text section was just the raw data, unless of course the pointer to relocations points to a different section than .text?

But I am still confused about what the virtual address, logical address, and relative virtual address are. What are the differences, and why do you need them all?

And does anybody know what the pointer to line numbers entry is for?

Maybe the relocation tables for every section (.data, .text, etc.) go into their own section called .reloc?
But the structure is still unclear.

And when we talk about the base address, this is the memory address at which the first bytes (MZ) are loaded.
There is also a field ImageBase, etc. I don't get why we need so many different addresses.

Re: please help AHHHH ld and gcc problem ?

Posted: Sun Jan 18, 2009 3:01 pm
by ru2aqare
Sam111 wrote: OK, I understand this a little better, but I still have a few questions about base relocation.

Basically the structure of a PE is this

Code:

snip snip snip
public class SECTION_HEADER {

    UCHAR   Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            ULONG   PhysicalAddress;
            ULONG   VirtualSize;
    } Misc;
snip snip snip
}
I think that's incorrect. Here is a relevant definition from my kernel:

Code:

/// COFF section header type.
typedef struct SectionHeader
{
    /// Name of section.
    CharA   SectionName[8];
    
    /// Size of section in memory.
    ///
    /// Should not be less than RawSize for all except uninitialized sections.
    uint32  LoadSize;
    
    /// RVA of section in memory.
    uint32  LoadAddress;
    
    /// Size of section in file.
    uint32  RawSize;
    
    /// File offset of section data.
    uint32  RawAddress;
    
    /// File offset of COFF relocations for section (object files only).
    uint32  AddrRelocations;
    
    /// File offset of COFF line numbers for section.
    uint32  AddrLineNumbers;
    
    /// Count of relocation records.
    uint16  NumRelocations;
    /// Count of line number records.
    uint16  NumLineNumbers;
    
    /// Section flags.
    CoffSectionFlags Flags;
    
} SectionHeader, *pSectionHeader;
typedef const SectionHeader *pcSectionHeader;
#define SectionHeader_ (sizeof(SectionHeader))
Sam111 wrote: My question is: where is the relocation table located (i.e. .text, .data, etc.) and what is its structure?
I would think the relocation table can't be in the raw data sections like .text or .data, so what section is it usually in?
It is usually - but not necessarily - in the '.reloc' section. The specification states that to find the base relocation table, you have to use the appropriate data directory entry.
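As an illustration of "use the appropriate data directory entry", here is a hedged C sketch that walks from the MZ header to the sixth data directory entry (index 5). The function name find_base_reloc_dir and the helpers rd16 and rd32 are hypothetical. The offsets used (e_lfanew at 0x3C, a 4-byte signature plus a 20-byte COFF header, data directories at offset 96 of a PE32 optional header or 112 of a PE32+ one) follow the published PE/COFF layout.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find the RVA and size of the base relocation table in a raw PE file.
   Returns 0 on success, -1 if the headers do not look valid. */
int find_base_reloc_dir(const uint8_t *file, size_t len,
                        uint32_t *rva, uint32_t *size) {
    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;
    uint32_t pe = rd32(file + 0x3C);           /* e_lfanew */
    if (pe > len || len - pe < 26)
        return -1;
    if (rd32(file + pe) != 0x00004550u)        /* "PE\0\0" signature */
        return -1;
    size_t opt = (size_t)pe + 24;              /* optional header start */
    uint16_t magic = rd16(file + opt);
    size_t dirs;                               /* DataDirectory[0] */
    if (magic == 0x10B)
        dirs = opt + 96;                       /* PE32 */
    else if (magic == 0x20B)
        dirs = opt + 112;                      /* PE32+ */
    else
        return -1;
    if (dirs + 6 * 8 > len)
        return -1;
    if (rd32(file + dirs - 4) < 6)             /* NumberOfRvaAndSizes */
        return -1;
    *rva = rd32(file + dirs + 5 * 8);          /* entry 5: VirtualAddress */
    *size = rd32(file + dirs + 5 * 8 + 4);     /* entry 5: Size */
    return 0;
}
```

The returned RVA still has to be translated through the section headers if you are reading the table from the file rather than from a loaded image.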
Sam111 wrote: I am confused about where the relocation table is located. I am also confused about where the data directories are located. I thought the sections contain only the raw data, not data directories, etc.? I know how to get the data directories from the PE optional header.
Check your definitions. Hint:

Code:

public class PE_OPTIONAL_HEADER
{
snip snip snip
    DATA_DIRECTORY DataDirectory[NUMBEROF_DIRECTORY_ENTRIES];
}
Sam111 wrote: Also, what is the structure of this relocation table? It must have an entry for every line in the .text section that uses a variable, like your example mov eax, offset some_variable. What would the relocation table entry for some_variable look like?
It doesn't have a fixed structure. It uses variable-length arrays, that's why most header files don't give a definition for it.
In pseudocode it could be defined as

Code:

typedef struct BaseRelocationTableForPage
{
    uint32 PageRVA;
    uint32 StructureLength; // implies count of items in the array.
    struct Entry
    {
        uint16 Offset:12;
        uint16 Type:4;
    } Entries[as many as you need];
} BaseRelocationTableForPage, *pBaseRelocationTableForPage;

typedef struct BaseRelocationTable
{
  BaseRelocationTableForPage Pages[as many as you need];
} BaseRelocationTable, *pBaseRelocationTable;
Sam111 wrote: In the section header we have these fields, which I would think can be used to find the start of the raw data and of the relocation table:
ULONG SizeOfRawData;
ULONG PointerToRawData;
ULONG PointerToRelocations;
ULONG PointerToLinenumbers;
USHORT NumberOfRelocations;
USHORT NumberOfLinenumbers;

This implies that .text doesn't contain only the raw data but also the relocation table for .text?
I was under the impression that the .text section was just the raw data, unless of course the pointer to relocations points to a different section than .text?
Nope, the relocation and line number fields (the last four in that list) are used in COFF object files only. I haven't seen any PE image that made use of these fields. They would point to the COFF relocation table and the line number table. Since most linkers strip the symbol table out of the final PE image, these fields are set to zero.
Sam111 wrote: But I am still confused about what the virtual address, logical address, and relative virtual address are. What are the differences, and why do you need them all?
Virtual address: an address into the virtual memory (duh).
Logical address: I haven't encountered this term in the specification. Probably it means the virtual address.
RVA: The difference of a virtual address and the load address.
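The relationship between the two can be written down as a pair of one-line C functions. This is only a sketch with hypothetical names (rva_to_va, va_to_rva), and it assumes 32-bit addresses.

```c
#include <stdint.h>

/* Virtual address = load address + RVA. */
uint32_t rva_to_va(uint32_t load_address, uint32_t rva) {
    return load_address + rva;
}

/* RVA = virtual address - load address. */
uint32_t va_to_rva(uint32_t load_address, uint32_t va) {
    return va - load_address;
}
```

Using numbers from earlier in the thread: with a load address of 1M (0x100000), the .text section at RVA 0x10d0 ends up at virtual address 0x1010d0.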

Re: please help AHHHH ld and gcc problem ?

Posted: Sun Jan 18, 2009 7:30 pm
by Sam111
OK, I can find the relocation table from the optional header,
and the table is just an array of structures (your BaseRelocationTableForPage structures).

Then, if I needed to do relocation, what would I change in this table, and how do you know which location in the .text section an entry in the table corresponds to?

Say you had a statement in the .text section
like this

Code:

mov ax , myvar
How would it be stored in the relocation table, and how can you figure out from this info which line in the .text section the entry is associated with? I don't see any way to know from the relocation table exactly which instruction in the .text section needs to be changed.

Code:

typedef struct BaseRelocationTableForPage
{
    uint32 PageRVA;
    uint32 StructureLength; // implies count of items in the array.
    struct Entry
    {
        uint16 Offset:12;
        uint16 Type:4;
    } Entries[as many as you need];
} BaseRelocationTableForPage, *pBaseRelocationTableForPage;
//How the hell did you know this? It's not in any specs I have seen.
//And for the inner Entry structure, what are Type:4 and Offset:12?
//What does a page correspond to? Does the PageRVA have anything to do with finding the instruction that needs
//to be changed?

Re: please help AHHHH ld and gcc problem ?

Posted: Sun Jan 18, 2009 7:55 pm
by ru2aqare
Sam111 wrote: Then, if I needed to do relocation, what would I change in this table, and how do you know which location in the .text section an entry in the table corresponds to?
You don't need to change anything in the base relocation table. It just contains RVAs into the loaded image where relocations need to be performed. And you don't need to know what section the relocation is in, either. You could do a reverse lookup with the help of the section headers, but it's not relevant information.
Sam111 wrote: Say you had a statement in the .text section
like this

Code:

mov ax , myvar
How would it be stored in the relocation table, and how can you figure out from this info which line in the .text section the entry is associated with? I don't see any way to know from the relocation table exactly which instruction in the .text section needs to be changed.
Nope, you know everything you need to perform the relocation. First, let's assume you have one section (the .text section) in your PE file. Let's further assume that the linker has assigned an RVA of 0x1000 to this section, and that its file address is 0x200. Let's assume this is the only instruction in your source file. Therefore, when you compile the assembly file, this instruction is at offset zero, which becomes RVA 0x1000 (or offset 0x1000) in the loaded image (and offset 0x200 in the file). Now, this instruction looks something like this:

Code:

66 B8 00 00
The 66 is an operand size override prefix (you are loading data into a 16-bit register in 32-bit mode). The B8 is the opcode for "mov ax, immediate" or "mov eax, immediate", depending on the processor mode. The double zeroes are the offset of myvar. They don't need to be zeroes, they could be anything. We don't care what the actual value is anyway: the assembler will put the offset of the variable there (measured from the start of the current segment, as if the current segment were the only thing in the PE file). So the assembler will emit a relocation into the object file which says "there is a relocation at offset 0x02". It's 0x02 rather than 0 because the relocation points at the operand bytes, not at the start of the instruction; the loader doesn't know about instructions and whatnot. The linker will transform this into a base relocation record, modify the value found at the relocation (add the RVA of the section and the preferred load address to the value) and then build the base relocation table, which will look like this:

Code:

00 10 00 00   (the value 0x00001000)
0C 00 00 00   (the value 12)
02 30
00 00
The first doubleword is 0x1000, which is the page RVA to which this relocation block applies. This happens to be the RVA of the .text section, which is no wonder, since that section has a relocation. The next value is 12, which is 8+4: 8 is the size of the block header and 4 is the size of the two 2-byte entries. The first entry is 0x3002, which can be separated into type 3 (upper 4 bits) and offset 0x0002 (lower 12 bits). Adding the offset to the page RVA yields 0x1002, which is the RVA of the doubleword you need to modify. (The file address would be 0x0202 using similar logic: 0x0002 + 0x0200.) The second entry is zero: type zero, offset zero. It is padding that keeps the block size divisible by four, and it can safely be ignored.

So the assembler outputs offset_of_variable_from_start_of_section. The linker adds offset_of_section_from_load_address and the preferred load address. Then you (or the loader) add the difference between the actual load address and the preferred load address. The result is the virtual address of the variable. I don't know how to explain it further.

To perform the relocation, you need to take the difference of the actual load address and the preferred address, and add it to the doubleword found at the address (0x1002) which was computed previously. Since this instruction has a 16-bit immediate, but the relocation is always performed on doubleword values, this can easily overwrite part of the next instruction, but that's another problem.
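To make the overwrite problem concrete, here is a small C sketch of what a type 3 fixup does to the bytes. The function name fixup_dword is hypothetical, and the byte values used in the note below (an immediate of 0x1234 followed by two NOP bytes) are made up for illustration.

```c
#include <stdint.h>

/* Add delta to the 32-bit little-endian value stored at p.
   This is all a type 3 fixup does; it knows nothing about where
   the instruction that owns these bytes ends. */
void fixup_dword(uint8_t *p, uint32_t delta) {
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    v += delta;
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

Take the bytes 66 B8 34 12 90 90 and apply the fixup at offset 2 with a delta of 0x10000. The 16-bit immediate stays 0x1234, so the instruction is not relocated at all, while the byte after it changes from 0x90 to 0x91, which corrupts the following instruction.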
Sam111 wrote: How the hell did you know this? It's not in any specs I have seen.
And for the inner Entry structure, what are Type:4 and Offset:12?
It is actually based on the specification, though it does not appear anywhere in this form (my kernel sources have something similar though). This is why I said that since the relocation table uses variable-length arrays, most C sources don't bother creating a type for it, but just use pointer magic.
The :4 and :12 parts in the C language specify that from a 16-bit value, you allocate 4 and 12 bits, respectively.
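Since the order in which a C compiler packs bit-fields into a word is implementation-defined, a portable alternative is to split the 16-bit entry with a shift and a mask. The sketch below uses the hypothetical names reloc_type and reloc_offset.

```c
#include <stdint.h>

/* Upper 4 bits of a base relocation entry: the relocation type. */
unsigned reloc_type(uint16_t entry) {
    return entry >> 12;
}

/* Lower 12 bits of a base relocation entry: offset within the page. */
unsigned reloc_offset(uint16_t entry) {
    return entry & 0x0FFFu;
}
```

For the entry 0x3002 from the example above, reloc_type gives 3 and reloc_offset gives 2.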

Re: please help AHHHH ld and gcc problem ?

Posted: Mon Jan 19, 2009 1:44 am
by Sam111
OK, I think I get how it works.
For base relocation you do (actual base address - preferred base address) + RVA of section + offset in section.
If the actual base address equals the preferred base address, no relocations need to be done. Otherwise the difference must be added to every address that corresponds to an entry in the relocation table.

Last questions:
The higher 4 bits indicate the type of relocation, and the lower 12 bits are the offset of the fixup location within the 4K page. So each relocation table entry is only good for 4K. So if you had a .text section that was 8K long, you would need 2 table entries to cover it? Assuming the first table entry's RVA is 0x1000 (the start of the .text section), the other entry would have an RVA of 0x1000 + 4K. So the RVA of a relocation table entry does not have to be the start of a section; it could be completely arbitrary, depending on where the first relocation needs to be done. I don't think there has to be any order to the entries in the table either.

Anyway, what are the different types of relocation? The 4 bits suggest that there are as many as 16 different types.
One of them (0000) must be for padding, like you explained?

OK, so if I did extract the .text and .data sections from a PE, then all I would have to do is use the relocation table to patch the instructions in the .text, and I could use the result as a regular bin file (assuming, of course, that we had no import libraries, etc.).
For example, if the preferred load address of the PE was 0x1000, but I extracted the .text and .data sections and wanted my bootloader to load this bin at memory address 0x2000, then all I would have to do is perform relocation using the relocation table of the PE that I extracted the sections from. Then I would be all set, correct?

ru2aqare wrote: Since this instruction has a 16-bit immediate, but the relocation is always performed on doubleword values, this can easily overwrite part of the next instruction, but that's another problem.
So myvar is assumed to be a 32-bit address. But I would think the problem in your quote would only occur if you were mixing 16-bit and 32-bit addresses. But maybe there is more to it. Either way, how do you fix this, and can you give me an example of when it would occur?

Re: please help AHHHH ld and gcc problem ?

Posted: Mon Jan 19, 2009 2:44 am
by ru2aqare
Sam111 wrote: OK, I think I got how it works.
For base relocation you do (actual base address - preferred base address) + RVA of section + offset in section.
If actual = preferred base address then no relocations need to be done. Otherwise the difference must be added to every address corresponding to the entries in the relocation table.
That's correct.
Sam111 wrote: Last questions:
The higher 4 bits indicate the type of relocation, and the lower 12 bits are the offset of the fixup location within the 4K page. So each relocation table entry is only good for 4K. So if you had a .text section that was 8K long, you would need 2 table entries to cover it? Assuming the first table entry's RVA is 0x1000 (start of the .text section), the other entry would have an RVA of 0x1000 + 4K. So the RVA for an entry of the relocation table does not have to be at the start of a section; it could be completely arbitrary depending on where the first relocation needs to be done. I don't think there has to be any order to the entries in the table either.
Yes, if one block of entries is not sufficient, the linker will generate additional blocks. Each block can span four kilobytes of memory. I don't know whether entries in a block must be in ascending order; however, all linkers will sort the entries into ascending order. I also don't remember whether the RVA for the entire block must be page aligned or not, but then again, all linkers generate one block of relocation entries for each page that has relocations (in other words, the RVA doubleword will always be page aligned).
Sam111 wrote: Anyway, what are the different types of relocations? The 4 bits suggest that there are as many as 16 different types.
(First 4 bits.) One of them, 0000, must be for padding like you explained?
Yes, there are 15+1 different possible relocation types. However, on x86 only type 3 (HIGHLOW) and on x64 only type 10 (DIR64) is used. Other architectures have more relocation types. Check the PE/COFF specification if you want to know more.
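A minimal sketch of how such blocks could be walked, assuming the image has already been laid out in memory order so that it can be indexed by RVA. The names `apply_base_relocations`, `image`, `reloc_rva` and `reloc_size` are made up for illustration, and only type 0 (padding) and type 3 (32-bit HIGHLOW) are handled:

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3    # add the full 32-bit delta

def apply_base_relocations(image, reloc_rva, reloc_size, preferred_base, actual_base):
    """Patch `image` (a bytearray indexed by RVA) in place."""
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    if delta == 0:
        return  # loaded at the preferred base, nothing to fix up
    pos = reloc_rva
    end = reloc_rva + reloc_size
    while pos + 8 <= end:
        # Block header: page RVA, then block size in bytes (header included).
        page_rva, block_size = struct.unpack_from("<II", image, pos)
        if block_size < 8:
            break  # malformed block, stop instead of looping forever
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", image, pos + 8 + 2 * i)
            rel_type = entry >> 12      # upper 4 bits
            offset = entry & 0x0FFF     # lower 12 bits, offset within the page
            if rel_type == IMAGE_REL_BASED_ABSOLUTE:
                continue
            if rel_type != IMAGE_REL_BASED_HIGHLOW:
                raise ValueError("unsupported relocation type %d" % rel_type)
            where = page_rva + offset
            (value,) = struct.unpack_from("<I", image, where)
            struct.pack_into("<I", image, where, (value + delta) & 0xFFFFFFFF)
        pos += block_size
```

Note that the delta is added to the 32-bit value already stored at the fixup location; the entry itself only says where that value lives.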
Sam111 wrote: OK, so if I did extract the .text and .data sections from a PE, then all I would have to do is use the relocation table to patch the instructions in the .text and I could use this as a regular bin file (assuming of course we had no import libs, etc.).
For example, if the preferred load address for the PE was 0x1000 but I extracted the .text and .data sections and wanted to have my bootloader load this bin into memory at 0x2000, then all I would have to do is relocate using the relocation table in the PE that I extracted the sections from. Then I would be all set? Correct?
Yes. Also take padding between sections into account. For example, if you have two sections (a .text and a .data section), and the first section is only 1 byte long, the linker will align the start of the next section on a page boundary. There is a field in the PE headers that tells you this alignment value (SectionAlignment). You can't just copy 1 byte of .text and append the .data section; you have to maintain this alignment.
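A minimal sketch of building such a flat binary while keeping the spacing between sections. `build_flat_image` is a hypothetical helper, and the sections are assumed to have been read out already as (VirtualAddress, raw bytes) pairs:

```python
def build_flat_image(sections):
    """sections: list of (virtual_address, raw_bytes) pairs taken from the
    section headers. Returns one flat blob that keeps the in-memory spacing."""
    sections = sorted(sections, key=lambda s: s[0])
    base_rva = sections[0][0]
    end_rva = max(rva + len(data) for rva, data in sections)
    blob = bytearray(end_rva - base_rva)   # gaps between sections stay zero-filled
    for rva, data in sections:
        start = rva - base_rva
        blob[start:start + len(data)] = data
    return bytes(blob)
```

With a 1-byte .text at RVA 0x1000 and a .data at RVA 0x2000, the .data bytes end up 0x1000 bytes into the blob, not directly after the single .text byte.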
Sam111 wrote: So myvar is assumed to be a 32-bit address. But I would think your quote would only become a problem if you were mixing 16-bit and 32-bit addresses. But maybe there is more. Either way, how do you fix this, and can you give me an example of when this would occur?
Since neither the MS linker nor the GNU ld linker supports 16-bit relocations, this shouldn't happen frequently. However, when I was writing my bootloader, I had several assembly files that referenced each other, and the linkers would always complain that 16-bit relocations were not supported (I wasn't able to build the boot loader until I wrote my own linker, which supports 16-bit relocations, of course). If you have one big assembly file, this won't be much of a problem, since the assembler resolves relocations within the current segment.

Re: please help AHHHH ld and gcc problem ?

Posted: Mon Jan 19, 2009 1:27 pm
by Sam111
OK great, I get it. I only have a few last-minute questions.

Code: Select all

#include <stdint.h>

#define NUMBEROF_DIRECTORY_ENTRIES 16
#define IMAGE_SIZEOF_SHORT_NAME    8

typedef struct {
    uint32_t VirtualAddress;   // RVA of the table
    uint32_t Size;             // size of the table in bytes
} DATA_DIRECTORY;

// PE32 (32-bit) optional header. All fields are little-endian.
typedef struct {
    //
    // Standard fields.
    //
    uint16_t Magic;
    uint8_t  MajorLinkerVersion;
    uint8_t  MinorLinkerVersion;
    uint32_t SizeOfCode;
    uint32_t SizeOfInitializedData;
    uint32_t SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint;
    uint32_t BaseOfCode;
    uint32_t BaseOfData;
    //
    // NT additional fields.
    //
    uint32_t ImageBase;
    uint32_t SectionAlignment;
    uint32_t FileAlignment;
    uint16_t MajorOperatingSystemVersion;
    uint16_t MinorOperatingSystemVersion;
    uint16_t MajorImageVersion;
    uint16_t MinorImageVersion;
    uint16_t MajorSubsystemVersion;
    uint16_t MinorSubsystemVersion;
    uint32_t Reserved1;
    uint32_t SizeOfImage;
    uint32_t SizeOfHeaders;
    uint32_t CheckSum;
    uint16_t Subsystem;
    uint16_t DllCharacteristics;
    uint32_t SizeOfStackReserve;
    uint32_t SizeOfStackCommit;
    uint32_t SizeOfHeapReserve;
    uint32_t SizeOfHeapCommit;
    uint32_t LoaderFlags;
    uint32_t NumberOfRvaAndSizes;
    DATA_DIRECTORY DataDirectory[NUMBEROF_DIRECTORY_ENTRIES];
} PE_OPTIONAL_HEADER;

// One 40-byte entry of the section table.
typedef struct {
    uint8_t  Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        uint32_t PhysicalAddress;
        uint32_t VirtualSize;
    } Misc;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} SECTION_HEADER;



For AddressOfEntryPoint, is this the RVA of the starting execution point?
So if we loaded the PE at 0x50000, then the starting point is 0x50000 + AddressOfEntryPoint, or is
AddressOfEntryPoint the offset from the start of the file?

Correct me if I am wrong with the stuff I am about to say.
ImageBase is the preferred address to load the file in memory. If you don't load it at ImageBase then base relocation must be done. Usually the default is ImageBase = 0x00400000.

SizeOfImage: is this the size of the whole file? So if you load the PE at its preferred address, then ImageBase + SizeOfImage
would be the end of the PE file in memory.

I am unsure about the fields SectionAlignment and FileAlignment. Basically, is the padding already in the file to align correctly, or do these fields tell the loader to insert padding between sections? Because if it is the latter then you would have to account for this in SizeOfImage.

The field BaseOfCode: this is an RVA, so if you had ImageBase at 0x00400000 then ImageBase + BaseOfCode = start of the code section in memory? If this is true, then the BaseOfCode address can be used as the file offset of the code section.
Similarly with BaseOfData: you just don't add the ImageBase to it and you get the file offset?

The field PointerToLinenumbers: is this even used, and for what?
Same question for the symbol table: what was this ever used for? Was this the old version of what is now called the relocation table? Or was the symbol table for something completely different?

And last but not least:
If I wanted to inject code into a PE file and have it execute first, then all I would have to do is change AddressOfEntryPoint to point to my code, then do a jmp back to the original AddressOfEntryPoint address. But in doing this I would think I would have to add my instructions that depend on variables to the relocation table, then adjust the sizes of the code, data, etc. sections, and make the pointer to the data section point to the old file address plus the size of my code, etc. Then I would have a program that executed my code before the original code.

Also, there are like 9 predefined headers (.text, .data, .idata, etc.), but could you just create a new user-defined section that jmps to the .text section, or are there some security permissions stopping you from this? I know .text is executable, but .rdata is only readable, etc.

This last question was about how you would go about binding a function into a PE and having it execute first.
I would think a lot would have to be modified: the relocation table must be updated because of the added code in the .text section, as well as the addresses and pointers in the headers. But after you did this it should work, provided we didn't have any import libraries.

The only other thing I would maybe need help with is when you have imports, exports, idata, edata, pdata.
But I will start another topic if I ever need help with this.

Thanks again

Re: please help AHHHH ld and gcc problem ?

Posted: Mon Jan 19, 2009 2:41 pm
by ru2aqare
Sam111 wrote: OK great, I get it. I only have a few last-minute questions.
For AddressOfEntryPoint, is this the RVA of the starting execution point?
So if we loaded the PE at 0x50000, then the starting point is 0x50000 + AddressOfEntryPoint, or is
AddressOfEntryPoint the offset from the start of the file?
It is the RVA of the entry point.
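A minimal sketch of that computation, assuming a PE32 (32-bit) optional header has already been read into a bytes object. The function name `entry_point_address` is made up; the offsets 16 and 28 follow from the field order in the structure listed in the question:

```python
import struct

PE32_MAGIC = 0x10B  # Magic value of a 32-bit optional header

def entry_point_address(optional_header, load_base=None):
    """Return the virtual address where execution starts.

    If load_base is None, the preferred ImageBase from the header is used.
    """
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic != PE32_MAGIC:
        raise ValueError("not a PE32 optional header")
    (entry_rva,) = struct.unpack_from("<I", optional_header, 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", optional_header, 28)  # ImageBase
    base = image_base if load_base is None else load_base
    return base + entry_rva
```

So for an image loaded at 0x50000, the entry point is simply 0x50000 + AddressOfEntryPoint, regardless of where the code sits in the file.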
Sam111 wrote: Correct me if I am wrong with the stuff I am about to say.
ImageBase is the preferred address to load the file in memory. If you don't load it at ImageBase then base relocation must be done. Usually the default is ImageBase = 0x00400000.
Yes, with the addition that linkers usually set 0x01000000 or 0x10000000 as preferred load address for DLLs. DLLs shipped with Windows have carefully computed base addresses to prevent base relocation from taking place under normal conditions.
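Sam111 wrote: SizeOfImage: is this the size of the whole file? So if you load the PE at its preferred address, then ImageBase + SizeOfImage
would be the end of the PE file in memory.
Yes, with one caveat: SizeOfImage is the size of the image as mapped in memory (headers plus all sections, rounded up to SectionAlignment), which usually differs from the size of the file on disk.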
Sam111 wrote: I am unsure about the fields SectionAlignment and FileAlignment. Basically, is the padding already in the file to align correctly, or do these fields tell the loader to insert padding between sections? Because if it is the latter then you would have to account for this in SizeOfImage.
SizeOfImage already includes any padding. Sections in the file are aligned to FileAlignment boundary (usually 512), while in memory they are aligned to SectionAlignment boundary (usually 4096).
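A minimal sketch of the rounding involved. `align_up` and `size_of_image` are illustrative names, and the sketch assumes the sections follow the headers back to back in memory, which is the usual layout but not something the format guarantees:

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def size_of_image(size_of_headers, section_virtual_sizes, section_alignment=0x1000):
    """In-memory size: the headers plus every section, each rounded up to
    SectionAlignment. Assumes sections are contiguous after the headers."""
    total = align_up(size_of_headers, section_alignment)
    for vsize in section_virtual_sizes:
        total += align_up(vsize, section_alignment)
    return total
```

For example, a 1-byte section occupies 512 bytes in the file with FileAlignment = 512, and a full 4096-byte page in memory with SectionAlignment = 4096.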
Sam111 wrote: The field BaseOfCode: this is an RVA, so if you had ImageBase at 0x00400000 then ImageBase + BaseOfCode = start of the code section in memory? If this is true, then the BaseOfCode address can be used as the file offset of the code section.
Similarly with BaseOfData: you just don't add the ImageBase to it and you get the file offset?
I never used these fields, but I would expect them to contain an RVA, and so ImageBase + BaseOfCode would point to the start of code in memory. An RVA is not a file offset, though; to find the code in the file, look up PointerToRawData in the matching section header.
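A minimal sketch of translating an RVA into a file offset through the section headers. `rva_to_file_offset` is a hypothetical helper, and the sections are assumed to be given as (VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData) tuples:

```python
def rva_to_file_offset(rva, sections):
    """Return the file offset that backs `rva`, or None if the RVA is not
    inside any section or has no bytes in the file (e.g. .bss)."""
    for vaddr, vsize, raw_ptr, raw_size in sections:
        if vaddr <= rva < vaddr + max(vsize, raw_size):
            delta = rva - vaddr
            if delta >= raw_size:
                return None   # exists only in memory, zero-filled by the loader
            return raw_ptr + delta
    return None
```

The difference between the two numbers comes from the alignment fields: the same section typically starts at a multiple of FileAlignment in the file and at a multiple of SectionAlignment in memory.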
Sam111 wrote: The field PointerToLinenumbers: is this even used, and for what?
Same question for the symbol table: what was this ever used for? Was this the old version of what is now called the relocation table? Or was the symbol table for something completely different?
These fields are valid if the ImageFlags value has RelocationsStripped (bit 0) and LineNumbersStripped (bit 2) cleared, and the fields themselves are nonzero. In practice all linkers clear these fields. If they were valid, I would expect PointerToLineNumbers to hold the RVA of the line numbers table. For the exact structures used in the line numbers and symbol tables, see the COFF specification. These have nothing to do with the base relocation table.
However, in object files these fields are valid, and specify the file offsets of the line numbers and symbol tables, respectively.
Sam111 wrote: If I wanted to inject code into a PE file and have it execute first, then all I would have to do is change AddressOfEntryPoint to point to my code, then do a jmp back to the original AddressOfEntryPoint address. But in doing this I would think I would have to add my instructions that depend on variables to the relocation table, then adjust the sizes of the code, data, etc. sections, and make the pointer to the data section point to the old file address plus the size of my code, etc. Then I would have a program that executed my code before the original code.
Welcome to the world of Win32 viruses :)
Sam111 wrote: Also, there are like 9 predefined headers (.text, .data, .idata, etc.), but could you just create a new user-defined section that jmps to the .text section, or are there some security permissions stopping you from this? I know .text is executable, but .rdata is only readable, etc.
I'm not sure what you are getting at here.
Sam111 wrote: This last question was about how you would go about binding a function into a PE and having it execute first.
I would think a lot would have to be modified: the relocation table must be updated because of the added code in the .text section, as well as the addresses and pointers in the headers. But after you did this it should work, provided we didn't have any import libraries.
I would add a new .text section (you *can* have multiple code sections, even have multiple ".text" sections, the loader doesn't check the section name - for example the NT kernel file ntoskrnl.exe has lots of code segments, most Windows drivers have at least two), then inject position-independent code in this section (to prevent me from having to mess with the base relocation table), and then update the AddressOfEntryPoint. If the function was really huge, I would compile it into a separate DLL and force this DLL to be loaded (DLL injection stuff). Since the DLL's initialization routine runs before the main executable takes control, I can do whatever I want to the main executable. I haven't been able to get this technique working on Vista yet - Vista just ignores my DLL while in XP it works perfectly.

Re: please help AHHHH ld and gcc problem ?

Posted: Mon Jan 19, 2009 9:31 pm
by Sam111
OK, gotcha.
But if you are injecting position-independent code, you will still have to change some of the header fields in the PE and optional headers, and add another section header that points to your injected code segment.

I think the only things you would have to change are:
In the PE header

Code: Select all

NumberOfSections     ( original + one more for your injection header)
SizeOfOptionalHeader ( original plus 40 bytes which is one more entry to contain your injection header )
In the optional header

Code: Select all

AddressOfEntryPoint (your injection code's RVA)
SizeOfHeaders  ( original + 40 bytes because of your injection code section )
SizeOfImage ( original + 40 bytes + whatever the size of your injection code is; don't know if any padding or file alignment would need to be factored in ? )
Then create the 40-byte header entry in the optional header that points to the injected code.

And you're done!

I don't know if you have to update these as well?

Code: Select all

ULONG   BaseOfCode;
ULONG   BaseOfData;
ULONG   SizeOfCode;
But I would assume the BaseOf... ones could stay the way they were, provided the injection section was put after the .text and .data sections. But I think SizeOfCode is the sum of all the code sections.
So if there were 2 code sections we would have to add them together. So SizeOfCode = old value + size of injection code section.


I think that would be it for what you would have to modify. Though I am a little shaky on whether I ever have to update file alignment and section alignment, and whether I have to factor them in when computing the image size, etc.

Now, if I am not missing anything, assume we need injection code that is position dependent,
i.e. you're using variables, etc.

Is the only thing that has to be added an update to the relocation table with the entries for your injection code section?

So then the procedure would be: do everything you did for the independent injection code.
But now I add entries to the relocation table for the dependent stuff in the injection code.
Then set

Code: Select all

PointerToRelocations;
NumberOfRelocations;
In the injection code section header

I would also think you have to set

Code: Select all

SizeOfImage ( original + 40 bytes + whatever the size of your injection code is; don't know if any padding or file alignment would need to be factored in ? )
PLUS size of relocation table for injection code.

But after that was done, it should work as well.
Please let me know if I am missing something. I think I got everything, except maybe the alignment issues?

REMEMBER WAY BACK IN THIS POST WHEN I USED OBJDUMP ---->
I do remember trying to create a second code section, but it didn't work, and when I named it .text it worked?
Maybe this was because in my asm I had the section called code, and maybe it needs the . in front to be recognized as a valid section name in the section header. But if you can have two separate .text sections named differently, it certainly needs a dot. Maybe not, but on my machine I got my asm code to work only when the code section/segment was named .text. Though I haven't tried it with .code. But certainly code without the dot doesn't work.

Re: please help AHHHH ld and gcc problem ?

Posted: Tue Jan 20, 2009 1:59 am
by ru2aqare
Sam111 wrote: OK, gotcha.
But if you are injecting position-independent code, you will still have to change some of the header fields in the PE and optional headers, and add another section header that points to your injected code segment.

I think the only things you would have to change are:
In the PE header

Code: Select all

NumberOfSections     ( original + one more for your injection header)
SizeOfOptionalHeader ( original plus 40 bytes which is one more entry to contain your injection header )
In the optional header

Code: Select all

AddressOfEntryPoint (your injection code's RVA)
SizeOfHeaders  ( original + 40 bytes because of your injection code section )
SizeOfImage ( original + 40 bytes + whatever the size of your injection code is; don't know if any padding or file alignment would need to be factored in ? )
Then create the 40-byte header entry in the optional header that points to the injected code.

And you're done!
Well, if you want to inject code, then yes, with two corrections: SizeOfOptionalHeader stays unchanged, because the 40-byte section header goes into the section table that follows the optional header, and SizeOfImage has to remain a multiple of SectionAlignment. Another thing you can do to eliminate the need to modify the base relocation table is to allocate your variables on the stack (all variables in one large structure) and carry a pointer to this structure around.
Sam111 wrote: I don't know if you have to update these as well?

Code: Select all

ULONG   BaseOfCode;
ULONG   BaseOfData;
ULONG   SizeOfCode;
But I would assume the BaseOf... ones could stay the way they were, provided the injection section was put after the .text and .data sections. But I think SizeOfCode is the sum of all the code sections.
So if there were 2 code sections we would have to add them together. So SizeOfCode = old value + size of injection code section.
I don't know whether the loader actually uses any of these values, or whether they are just informational. But it's better to be safe than sorry, and I would update SizeOfCode (and SizeOfInitializedData, if you add data).
Sam111 wrote: I do remember trying to create a second code section, but it didn't work, and when I named it .text it worked?
Maybe this was because in my asm I had the section called code, and maybe it needs the . in front to be recognized as a valid section name in the section header. But if you can have two separate .text sections named differently, it certainly needs a dot. Maybe not, but on my machine I got my asm code to work only when the code section/segment was named .text. Though I haven't tried it with .code. But certainly code without the dot doesn't work.
It fails not because the section isn't named '.text', but because the section flags don't contain the Executable bit (bit 29) and/or the IsCode bit (bit 5). The loader doesn't care about section names. For example, many executable compressors rename the sections (to UPX0, UPX1, UPX2 etc. in the case of UPX), and yet the compressed executables can still be loaded and run.
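A minimal sketch of checking those two bits in a section header's Characteristics field. The constant names follow the PE/COFF specification; the helper `section_is_executable_code` is made up for illustration:

```python
IMAGE_SCN_CNT_CODE = 0x00000020      # bit 5: section contains code
IMAGE_SCN_MEM_EXECUTE = 0x20000000   # bit 29: section may be executed

def section_is_executable_code(characteristics):
    """True if both the code-content and the execute-permission bits are set."""
    required = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE
    return (characteristics & required) == required
```

A typical .text section carries 0x60000020 (code, execute, read) and passes this check whatever it is called, while a typical .data section carries 0xC0000040 (initialized data, read, write) and does not.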

Re: please help AHHHH ld and gcc problem ?

Posted: Tue Jan 20, 2009 4:45 pm
by Sam111
Got you on all of that, but

Code: Select all

ULONG   SizeOfCode;

So this would imply an exe can't have a section larger than 4 gigs.

So it's like a 4 gig barrier.
Either way, 4 gigs would be too much for a 32-bit system anyway.
But cool to note.

Re: please help AHHHH ld and gcc problem ?

Posted: Tue Jan 20, 2009 5:02 pm
by ru2aqare
Sam111 wrote: Got you on all of that, but

Code: Select all

ULONG   SizeOfCode;

So this would imply an exe can't have a section larger than 4 gigs.

So it's like a 4 gig barrier.
Either way, 4 gigs would be too much for a 32-bit system anyway.
But cool to note.
I'm not surprised by this. Under x86, you have a four gigabyte virtual address space. The operating system kernel takes a chunk out of that, so you are usually left with three gigabytes. So you can't even load the maximum you can specify in that field.

If you are referring to PE32+ (the x64 extension of the PE file format), then yes, the size of the image is still limited to four gigabytes (but you have a much larger virtual address space). While it would be nice to be able to load five or more gigabyte executables into memory, projects that generate that much code are divided into hundreds of smaller DLLs.