compilers

Posted: Fri Feb 28, 2014 1:49 am
by icealys
I have a lot to learn about compilers... I heard the dragon book was good, so when I get a chance to read it I will. Something that came to mind was the different sections that the compiler generates inside the PE header. There is a data section and a code section, among others... So my question is on a pretty low level. I'm curious as to how the whole code section and data section are implemented when it comes down to the CPU fetching the instructions and data from memory...

Re: compilers

Posted: Fri Feb 28, 2014 2:35 am
by Brendan
Hi,
icealys wrote:I have a lot to learn about compilers... I heard the dragon book was good, so when I get a chance to read it I will. Something that came to mind was the different sections that the compiler generates inside the PE header. There is a data section and a code section, among others... So my question is on a pretty low level. I'm curious as to how the whole code section and data section are implemented when it comes down to the CPU fetching the instructions and data from memory...
For code, the CPU typically has some sort of pointer to the next instruction (often called an instruction pointer!), and fetches instructions from wherever the instruction pointer points. Those instructions may reference data in memory, so the CPU fetches data whenever the instructions tell it to. The CPU does not know anything about "sections" - it doesn't care if it happens to be fetching instructions from the ".data" section, or writing data to the ".text" section, or anything else.

However, most CPUs have some way of limiting what sorts of accesses are allowed (e.g. some sort of MMU, or Memory Management Unit). If the CPU does support something like this, then the OS can tell the CPU which areas can be read, which areas can be written to, and possibly which areas can be executed. To do this, the OS would look at the executable file, figure out the access permissions for the different sections and how those sections will correspond to memory areas, and then tell the CPU what types of accesses are allowed for those memory areas.
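As a rough illustration of that idea, here is a minimal sketch in C of a table of memory areas with access permissions and the kind of check the hardware would perform. The addresses, sizes, and names (`Region`, `regions`, `access_allowed`) are made up for the example; a real OS programs page tables rather than scanning a list like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access types, combined as bit flags. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

typedef struct {
    const char *name;   /* section the area was loaded from */
    uint32_t    start;  /* first address of the area */
    uint32_t    size;   /* size of the area in bytes */
    unsigned    perms;  /* allowed access types */
} Region;

/* Hypothetical layout a loader might derive from an executable file. */
const Region regions[] = {
    { ".text", 0x1000, 0x1000, PERM_R | PERM_X },
    { ".data", 0x2000, 0x1000, PERM_R | PERM_W },
    { ".bss",  0x3000, 0x1000, PERM_R | PERM_W },
};

/* Returns 1 if the access is allowed, 0 if the CPU would raise a fault. */
int access_allowed(uint32_t addr, unsigned wanted) {
    assert(wanted != 0);  /* callers must ask for at least one access type */
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        const Region *r = &regions[i];
        if (addr >= r->start && addr - r->start < r->size)
            return (r->perms & wanted) == wanted;
    }
    return 0;  /* address is not mapped at all */
}
```

With this table, executing from 0x1000 is allowed but writing there is not, and the reverse holds for 0x2004 in the made-up ".data" area.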

Also note that a process typically runs in a virtual (pretend) address space (and not the real/physical address space), and this allows an OS to do some strange and tricky things, which mostly involve deceiving either the executable/process or the CPU or both (for some sort of benefit).

For a simple example, the process might have a ".bss" section that is 1234 MiB, and the OS might lie to the process and pretend this memory area does exist and was allocated, and then tell the CPU not to allow writes to this area. Then, if the process tries to write to the area, the CPU will complain and the OS can allocate the memory at that point. That way, if a process only uses 3 bytes of that memory, you don't end up wasting 1234 MiB of actual RAM for nothing (and may only end up allocating the 4 KiB of RAM that was actually used).
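The allocate-on-first-write trick can be imitated in ordinary C. The sketch below is only a user-space simulation, assuming 4 KiB pages; the names (`LazyRegion`, `lazy_write`, and so on) are invented for the example, and a real OS does this in its page fault handler instead of in explicit function calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size */

typedef struct {
    size_t    page_count;  /* pages the process believes it owns */
    uint8_t **pages;       /* NULL until a page is first written */
    size_t    allocated;   /* pages actually backed by memory */
} LazyRegion;

/* Reserve the area without backing it. Returns 0 on success, -1 on failure. */
int lazy_init(LazyRegion *r, size_t size_bytes) {
    r->page_count = (size_bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
    r->pages = calloc(r->page_count, sizeof *r->pages);
    r->allocated = 0;
    return r->pages ? 0 : -1;
}

/* Untouched pages read as zero, which is what .bss must look like. */
uint8_t lazy_read(const LazyRegion *r, size_t offset) {
    assert(offset / TOY_PAGE_SIZE < r->page_count);
    const uint8_t *page = r->pages[offset / TOY_PAGE_SIZE];
    return page ? page[offset % TOY_PAGE_SIZE] : 0;
}

/* The first write to a page plays the role of the page fault. */
int lazy_write(LazyRegion *r, size_t offset, uint8_t value) {
    size_t index = offset / TOY_PAGE_SIZE;
    assert(index < r->page_count);
    if (r->pages[index] == NULL) {
        r->pages[index] = calloc(TOY_PAGE_SIZE, 1);  /* zero-filled page */
        if (r->pages[index] == NULL)
            return -1;
        r->allocated++;
    }
    r->pages[index][offset % TOY_PAGE_SIZE] = value;
    return 0;
}

void lazy_free(LazyRegion *r) {
    for (size_t i = 0; i < r->page_count; i++)
        free(r->pages[i]);
    free(r->pages);
}
```

Reserving 1234 MiB this way and writing 3 bytes near the start leaves `allocated` at 1, i.e. one 4 KiB page. The pointer table itself still costs some memory, much as real page tables do.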


Cheers,

Brendan

Re: compilers

Posted: Fri Feb 28, 2014 2:50 am
by bwat
icealys wrote:I have a lot to learn about compilers... I heard the dragon book was good, so when I get a chance to read it I will.
The dragon book is ok if you're used to the sort of symbol shunting you find in logic/abstract algebra. Otherwise it could be heavy going but well worth the effort. If you haven't got that sort of background and you don't fancy putting in all that extra effort, then there are plenty of good compiler books that might suit you better.

Re: compilers

Posted: Fri Feb 28, 2014 3:58 am
by Antti
icealys wrote:I heard the dragon book was good
In general, I think the book is a little bit overrated. It goes without saying that I appreciate the research done by the authors. If those achievements are put aside and we look at the book itself, today, I think it would be possible to teach those things in a different way. I am not saying I could write a better book, but it seems that some things look more difficult in that book than they really are.
bwat wrote:there are plenty of good compiler books that might suit you better
What is your recommendation?

Re: compilers

Posted: Fri Feb 28, 2014 5:10 am
by bwat
Antti wrote:
bwat wrote:there are plenty of good compiler books that might suit you better
What is your recommendation?
I like books that go from interpreters to compilers. The connection between the two is important.

The one book which I've used to write a compiler which I've then properly used is Lisp in Small Pieces by Christian Queinnec. If you know Scheme/Lisp then it'll get you writing a compiler in no time. The chapters are:
1) Writing an interpreter for Scheme/Lisp.
2) Name spaces and recursion.
3) Exceptions and call-with-current-continuation.
4) Assignments and quotations.
5) Denotational semantics. This is the theory part but it's explained nicely.
6) Making the interpreter faster.
7) Compiling to byte code.
8) Reflection. This chapter covers things most programmers don't know are possible. The reflective interpreter is cool.
9) Macros.
10) Compiling to C. I never bothered implementing this part as the bytecode compiled system is fast enough.
11) An object system. I never bothered with implementing this part either.

A similar book is Programming Languages, An Interpreter-Based Approach by Samuel N. Kamin. Basically, the book gives Pascal code for interpreters for 8 different languages. The languages are the ones that were hot in the late 80s and early 90s (it's an old book). The languages in the book are much simpler than the real languages, but you get a taste of what the real ones are like. The chapters are:
1) A basic evaluator, i.e. an interpreter that deals with integers and arithmetic and boolean operations. Simple function definition is possible.
2) LISP. This is Chapter 1 with lists.
3) APL. This is Chapter 1 with arrays.
4) Scheme. This is Chapter 2 with functions as first class values.
5) SASL. A lazy version of Chapter 4.
6) CLU. This is Chapter 1 with clusters (a kind of data type definition).
7) Smalltalk. This is Chapter 7 with classes.
8) Prolog. This is the most different chapter, basically simple SLD resolution.
9) Compilation to bytecode. Translation from the languages of Chapters 1 and 4 to a stack machine is shown.
10) Memory Management. This covers simple garbage collection (mark-scan, semi-space, reference counting).

The book Abstract Computing Machines, A Lambda Calculus Perspective by Werner Kluge is another great book but as the name suggests it uses the lambda calculus as the programming language to be interpreted/compiled and that might put people off. Personally, I loved the book.

Re: compilers

Posted: Fri Feb 28, 2014 1:12 pm
by icealys
So the different sections would be located in different areas of memory? If the CPU fetches an instruction, would it look in another location for the data? Let's say you had an instruction like add eax,4: would it get the 4 from the data section at some other address? Or would that count as 1 instruction at a single address?

Re: compilers

Posted: Fri Feb 28, 2014 2:01 pm
by Antti
The instruction "add eax, 4" contains the number 4 within the instruction itself. It is an immediate value (you will find this term in CPU manuals). Apart from being fetched, the instruction does not access any memory location when executed. A better example would be "add eax, [4]". This accesses memory location 4 and adds the value found at that location to the register eax.

Code:

Memory address		Content
0x00000000:		0x00000000
0x00000004:		0x12345678
0x00000008:		0x00000000
0x0000000C:		0x00000000
...
...
0x00001000:		"add eax, [4]"
If a CPU is about to start executing with the example memory layout, its instruction pointer contains 0x00001000. Nothing has been executed yet. The next instruction the CPU is going to execute is at 0x00001000 ("add eax, [4]"), and it fetches it. This fetch accesses memory, so the address 0x00001000 needs to be accessible (read & execute). Then the "add" instruction itself accesses the memory address 0x00000004, because it needs to get the value from there (0x12345678) and add it to the register eax. The memory address 0x00000004 needs to be accessible (read).
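The same walkthrough can be written as a toy model in C. This is only a sketch: `ToyCpu`, `step`, and `run_example` are names invented here, the model understands a single instruction, and it assumes the 32-bit x86 encoding of "add eax, [4]" (bytes 03 05 04 00 00 00) stored little-endian in one flat memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x2000u  /* toy flat memory, large enough for the example */

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t eip;  /* instruction pointer */
    uint32_t eax;
} ToyCpu;

/* Read a little-endian 32-bit value from memory. */
uint32_t read32(const ToyCpu *cpu, uint32_t addr) {
    assert(addr + 4 <= MEM_SIZE);
    return (uint32_t)cpu->mem[addr]
         | (uint32_t)cpu->mem[addr + 1] << 8
         | (uint32_t)cpu->mem[addr + 2] << 16
         | (uint32_t)cpu->mem[addr + 3] << 24;
}

/* Store a 32-bit value in little-endian byte order. */
void write32(ToyCpu *cpu, uint32_t addr, uint32_t value) {
    assert(addr + 4 <= MEM_SIZE);
    for (uint32_t i = 0; i < 4; i++)
        cpu->mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* Execute one instruction. Returns 0 on success, -1 for an unknown opcode. */
int step(ToyCpu *cpu) {
    assert(cpu->eip + 6 <= MEM_SIZE);
    /* Fetch: read the instruction bytes from wherever eip points. */
    uint8_t opcode = cpu->mem[cpu->eip];
    uint8_t modrm  = cpu->mem[cpu->eip + 1];
    if (opcode == 0x03 && modrm == 0x05) {          /* add eax, [disp32] */
        uint32_t addr = read32(cpu, cpu->eip + 2);  /* address is inside the instruction */
        cpu->eax += read32(cpu, addr);              /* data access, separate from the fetch */
        cpu->eip += 6;                              /* advance to the next instruction */
        return 0;
    }
    return -1;
}

/* Build the memory layout from the listing above and run one instruction. */
uint32_t run_example(uint32_t initial_eax) {
    const uint8_t add_eax_mem4[] = { 0x03, 0x05, 0x04, 0x00, 0x00, 0x00 };
    ToyCpu cpu;
    memset(&cpu, 0, sizeof cpu);
    write32(&cpu, 0x00000004, 0x12345678);
    memcpy(&cpu.mem[0x1000], add_eax_mem4, sizeof add_eax_mem4);
    cpu.eip = 0x1000;
    cpu.eax = initial_eax;
    int rc = step(&cpu);
    assert(rc == 0);
    (void)rc;
    return cpu.eax;
}
```

Calling `run_example(1)` returns 0x12345679. Note that the model has no notion of sections at all: the instruction and the data live in the same byte array, and only the addresses differ.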

Re: compilers

Posted: Fri Feb 28, 2014 2:20 pm
by icealys
What would be in the data section then? Can you give an example of assembly instructions accessing both the data section and the instruction section?

Re: compilers

Posted: Fri Feb 28, 2014 2:32 pm
by Gigasoft
Do you need help with reading the post you just replied to?

Re: compilers

Posted: Fri Feb 28, 2014 2:34 pm
by Antti
icealys wrote:what would be in the data section then ?
Did you really mean what would be in it? In this case, there was the value 0x12345678. If you meant where the data section was, then the memory range from 0 to 0x0FFF could have been the data section and the memory range 0x1000 to 0x2000 could have been the code section.
icealys wrote:can you give an example of assembly instructions accessing both the data section and the instruction section?
Data and instructions do not differ from each other; it is all about how you interpret them. An extremely bad example would have been "add eax, [0x0FFE]". Its 4-byte read would have accessed both sections (two bytes from each).
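To check the "two bytes from each" arithmetic, here is a small C sketch. `bytes_in_range` is a helper invented for this example; it treats range ends as exclusive and assumes the access does not wrap around the end of the address space.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes of the access [addr, addr + len) that fall inside
   the range [start, end). */
uint32_t bytes_in_range(uint32_t addr, uint32_t len,
                        uint32_t start, uint32_t end) {
    assert(start <= end);
    uint32_t access_end = addr + len;  /* assumed not to overflow */
    uint32_t lo = addr > start ? addr : start;
    uint32_t hi = access_end < end ? access_end : end;
    return hi > lo ? hi - lo : 0;
}
```

For the 4-byte read at 0x0FFE, `bytes_in_range(0x0FFE, 4, 0x0000, 0x1000)` and `bytes_in_range(0x0FFE, 4, 0x1000, 0x2000)` both give 2: bytes 0x0FFE and 0x0FFF come from the data range, bytes 0x1000 and 0x1001 from the code range.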

Re: compilers

Posted: Fri Feb 28, 2014 2:49 pm
by icealys
I'm trying to understand the difference between the data section, stack section, and heap section. They are all used to store data. The stack is used to store parameters and local variables and is more temporary. The heap is used for global variables and more permanent storage and dynamic allocation. Where would the data section fit in with these?

Re: compilers

Posted: Fri Feb 28, 2014 3:02 pm
by Pancakes
I sometimes get .bss and .data confused, but this page basically explains it.

http://en.wikipedia.org/wiki/Data_segment

The .data segment contains initialized global data, not stack-allocated data (declared inside a function). For example:

Code:

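#include <stdint.h>   /* uint32_t */
#include <stdlib.h>   /* malloc, free */

uint32_t globaldatavar = 44;    /* initialized global: should be .data */
uint32_t globalbssvar;          /* uninitialized global: should be .bss */

/* the function's CPU instructions are located in the .text section */
uint32_t function(uint32_t argvar /* passed on the stack or in a register */) {
    uint32_t  stackvar = argvar;    /* local variable: should be on the stack */
    uint32_t *heapvar;              /* the pointer itself is a local; it points into the heap */
    heapvar = malloc(sizeof(uint32_t) * 10);    /* the allocated block should be on the heap */
    if (heapvar != NULL)
        heapvar[0] = stackvar;      /* write through the pointer into heap memory */
    free(heapvar);                  /* release the block so it does not leak */
    return 0x55; /* normally returned in a single register if there is a return value */
}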
I don't know what I did wrong, but I once found a symbol that should have been in the .data section in my .text section! Don't ask me how, because the section was declared READONLY yet I was writing a value to it. But I think I got the above about right.

The reason .bss is used is that it only has to be zero-initialized, so IIRC it does not have to actually exist in the object/executable file; only its size needs to be recorded. Imagine 4096 bytes of zeros.
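The zero-initialization is something C itself guarantees for objects with static storage duration, which is what makes the .bss trick possible. A minimal sketch follows; the variable and function names are invented for the example, and which section each object actually lands in depends on the toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint8_t  zeroed_buffer[4096];     /* no initializer: typically placed in .bss */
uint32_t initialized_value = 44;  /* non-zero initializer: typically placed in .data */

/* Count the non-zero bytes in a buffer. */
size_t count_nonzero(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            count++;
    return count;
}
```

`count_nonzero(zeroed_buffer, sizeof zeroed_buffer)` is 0 at program start even though no 4096 zero bytes need to be stored in the file, whereas the value 44 does have to be stored somewhere in it.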

All the CPU instructions for that function would be located in the .text section.

If you have access to a utility like objdump, use it to dump the object/executable file. Visual Studio ships a similar tool, dumpbin, that will dump a PE32/PE.. whatever file. Then you can see which section each symbol is in.

Re: compilers

Posted: Fri Feb 28, 2014 3:22 pm
by icealys
alright thx for your replies.