Compiler vs. Assembler -- Which is harder?
I don't know because I don't write compilers and assemblers, but:
Which do you think is harder to write? An assembler or compiler?
You are a computer.
~ MCS ~
Re: Compiler vs. Assembler -- Which is harder?
casnix wrote: I don't know because I don't write compilers and assemblers, but:
Which do you think is harder to write? An assembler or compiler?
Basically, to write an assembler, you need to know which machine instruction corresponds to which assembly instruction. This, too, depends on the machine architecture. For instance, to implement something specific to x86, you have to dig through Intel's manuals and figure out how instructions are encoded. Moreover, you have to lay everything out in exactly the order the format expects. This seems tedious.
On the other hand, if you've ever used gcc's -S option, you'll have noticed how the compiler arranges things to make the programmer's life easy.
Conclusion: I'd give 50 points to both. But the fact that compilers don't exist without assemblers gives me a strong feeling that writing a compiler is the more troublesome of the two (because it involves writing an assembler as well).
Cheers.
Edit: I recalled that one of my assembly books said this:
" It took several decades for computer scientists to figure out how to even write a compiler!".
Last edited by Chandra on Sun Mar 27, 2011 10:21 am, edited 1 time in total.
Programming is not about using a language to solve a problem, it's about using logic to find a solution !
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Compiler vs. Assembler -- Which is harder?
A compiler is definitely harder to write than an assembler. An assembler, in essence, is just a table that maps short strings (instruction mnemonics) to numbers (opcodes); the hardest part of writing an assembler is just creating that table efficiently. A compiler, on the other hand, needs a much more complex parser (depending on the language), and needs to manage stack positions and register usage in arbitrary circumstances. The fact that assembly preceded (widespread) high level languages by many years is due to this.
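To illustrate the "assembler as a table" view above, here is a minimal sketch in Python for a made-up 8-bit machine. The mnemonics, opcode values and one-byte operand format are all hypothetical and not taken from any real instruction set; a real x86 assembler also has to handle variable-length encodings, labels and relocations.

```python
# Hypothetical toy ISA: every instruction is one opcode byte,
# optionally followed by one 8-bit immediate operand.
OPCODES = {
    "NOP":  (0x00, 0),   # mnemonic -> (opcode, number of operand bytes)
    "LOAD": (0x01, 1),
    "ADD":  (0x02, 1),
    "HALT": (0xFF, 0),
}


def assemble(source):
    """Translate toy assembly text into machine code, one line at a time."""
    out = bytearray()
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        if mnemonic.upper() not in OPCODES:
            raise ValueError(f"line {lineno}: unknown mnemonic {mnemonic!r}")
        opcode, n_operands = OPCODES[mnemonic.upper()]
        if len(operands) != n_operands:
            raise ValueError(f"line {lineno}: {mnemonic} takes {n_operands} operand(s)")
        out.append(opcode)
        for operand in operands:
            value = int(operand, 0)            # accepts 10, 0x0A, 0b1010
            if not 0 <= value <= 0xFF:
                raise ValueError(f"line {lineno}: operand {operand} out of range")
            out.append(value)
    return bytes(out)
```

Calling `assemble("LOAD 5\nADD 0x10\nHALT")` yields the five bytes 01 05 02 10 FF. Almost all of the logic is the table lookup; the rest is input validation.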
Re: Compiler vs. Assembler -- Which is harder?
I agree with Nick's explanation of the two, but I have something to add to the compiler part. A good compiler uses advanced techniques to optimize code and eliminate redundancies. See http://en.wikipedia.org/wiki/Compiler#C ... nstruction and read the "Front End" and "Back End" sections.
Re: Compiler vs. Assembler -- Which is harder?
Hi,
However, the good part of being born recently is that most of compiler writing has been formalized to such a degree that it really is not as hard as it used to be. If you can follow the algorithms, you can more or less make it work. I agree with Nick: writing an assembler is way easier than writing a compiler. Remember that in some cases an assembler is little more than a simple lookup table.
--Thomas
Re: Compiler vs. Assembler -- Which is harder?
I disagree. I think it depends on the language. Writing, for example, a LOGO compiler is much, much easier than writing an x86 assembler, which in turn is much easier than writing a fully featured C++ compiler. I don't think we can say one is harder than the other without defining the language and the architecture.
I've my own C-like compiler for my OS, and it took about 2-3 days to implement (by that I mean it was good enough to compile ANSI C source, but as the language evolves I keep writing). It was easier than writing an assembler, since it's basically nothing more than a source->assembly converter, I left all the hard parts to a multi-pass macroassembler. Writing a C compiler is definitely easy (it was created to be so), it uses less than 20 keywords (I'm not talking about the implementation of standard library which is quite a big effort, but the language itself).
I've never written an x86 assembler, although I wrote a disassembler, and I wrote compilers that were very similar to assemblers, and interpreters that can be considered VMs for special bytecodes. Difficulty always depended on the language.
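The "source->assembly converter" idea can be sketched for plain arithmetic expressions. This is a minimal sketch in Python and not the compiler described above: it leans on Python's own `ast` module for parsing, and it emits instructions for a hypothetical stack machine (PUSH, ADD, SUB and MUL are made-up mnemonics).

```python
import ast

# Made-up mnemonics for a hypothetical stack machine.
BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Turn an integer arithmetic expression into stack-machine assembly."""
    tree = ast.parse(source, mode="eval")
    code = []

    def emit(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            code.append(f"PUSH {node.value}")
        elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            emit(node.left)                      # operands first (post-order walk) ...
            emit(node.right)
            code.append(BINOPS[type(node.op)])   # ... then the operator
        else:
            raise ValueError(f"unsupported construct: {ast.dump(node)}")

    emit(tree.body)
    return code
```

For `compile_expr("1 + 2 * 3")` this produces PUSH 1, PUSH 2, PUSH 3, MUL, ADD. Note what the sketch sidesteps: a stack machine has no registers to allocate, and there are no types, so the hard parts of a real compiler never come up.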
Re: Compiler vs. Assembler -- Which is harder?
turdus wrote: Writing a C compiler is definitely easy (it was created to be so), it uses less than 20 keywords (I'm not talking about the implementation of standard library which is quite a big effort, but the language itself).
Correction - writing a toy compiler that compiles a subset of C, doesn't pay attention to the (rather exquisite) language semantics and aliasing rules takes a few days (using a parser generator and a giant switch statement).
Writing an optimising C compiler that follows the language standard takes many man-years of work.
Re: Compiler vs. Assembler -- Which is harder?
JamesM wrote: Correction - writing a toy compiler that compiles a subset of C, doesn't pay attention to the (rather exquisite) language semantics and aliasing rules takes a few days (using a parser generator and a giant switch statement).
Writing an optimising C compiler that follows the language standard takes many man-years of work.
With respect, you are wrong. No such thing "subset of C". The full definition is so small there's no point in defining subsets, also any subset would be insufficient.
What is C made of?
1. precompiler directives: #include, #define, #undef, #if, #else, #endif, #pragma, #line, #error
2. keywords: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.
That's all, and this list also includes variable types/modifiers as well as all control flow instructions.
3. control characters: { } ;
4. operators
5. constants
6. variables: labels for memory addresses just like in assembly
That's it. Nothing more. And don't forget that you "translate" this into assembly, and if your assembler is powerful enough you don't even have to care about calculating offsets within structs, unions etc. As for giant switches, the AMD manual already provides assembly templates for that.
Do not forget, that Dennis Ritchie designed it to be portable, and to achieve this it had to be minimalistic. A more complex language would be harder to port, agree? It's not an accident that C is the longest living, and most ported language.
I know that your programming skill goes further than many programmers' (including many OS developers in this forum, me too), but how many C compilers have you implemented so far? (I know you're working on a HLA, but it's a different kind of beast)
Re: Compiler vs. Assembler -- Which is harder?
turdus wrote: but how many C compilers have you implemented so far?
This one. Obviously not from scratch - there are just under 20 years of wall-clock effort (20*n man-years) in that compiler, but that's what I get paid to do as a day job. I don't like to talk about it because I don't want to try and push my opinions with my career. But your question was kind of directed.
turdus wrote: What is C made of?
1. precompiler directives: #include, #define, #undef, #if, #else, #endif, #pragma, #line, #error
2. keywords: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.
That's all, and this list also includes variable types/modifiers as well as all control flow instructions.
3. control characters: { } ;
4. operators
5. constants
6. variables: labels for memory addresses just like in assembly
These are merely tokens - parsing C is not difficult (it's not context free, but that's not so much of an issue).
Once you have an AST, that is where your problems start.
* Register allocation, stack spilling/filling, calling conventions.
* Passing structures by value.
* Unions.
* Proper handling of the volatile, const and restrict qualifiers ("volatile * const *x = 0")
* Bitfields.
* Padding and alignment - there are a multitude of rules about this.
* Ensuring stack alignment on platforms that require it.
* Datatypes that are larger than the native register width - long long int for example on 32-bit machines.
* Non-word aligned loads and stores on platforms that only allow word-aligned.
* Function-static variables; one-time initialization.
* Variadic functions.
* Type coercion, promotion (signed to unsigned and vice versa, automatic size extension)
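To make one item from the list above concrete, take padding and alignment. The following is a minimal sketch in Python, assuming the common rule that each member is aligned to its own size and that the struct is padded to a multiple of its strictest member alignment. Real ABIs differ in the details, and the sizes in the table are typical assumptions, not values guaranteed by the C standard.

```python
# Assumed sizes (= alignments) in bytes for a typical 64-bit ABI;
# the C standard does not fix these.
SIZES = {"char": 1, "short": 2, "int": 4, "long long": 8, "double": 8}


def layout(members):
    """Return ({name: offset}, total_size) for a list of (name, type) pairs."""
    offsets = {}
    offset = 0
    max_align = 1
    for name, ctype in members:
        size = align = SIZES[ctype]
        offset = (offset + align - 1) // align * align   # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align   # tail padding
    return offsets, total
```

Under these assumptions, a struct of char, double, short occupies 24 bytes, while reordering it to double, short, char brings it down to 16. A compiler has to get this right for every target, and then bitfields, unions and packed attributes add further rules on top.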
I've not even scratched the surface. And C++ is a whole different beast altogether, although you weren't talking about C++ so I'll stay quiet about that one.
This is before you even get into the realm of optimisation or concurrency. I have a copy of the C99 standard on my desk. It is as large as volumes 1 and 2a of the Intel manuals combined.
With respect - you are wrong. Look at how much work has had to go into LLVM to make it C ready.
- JackScott
- Member
- Posts: 1033
- Joined: Thu Dec 21, 2006 3:03 am
- Location: Hobart, Australia
- Mastodon: https://aus.social/@jackscottau
- GitHub: https://github.com/JackScottAU
Re: Compiler vs. Assembler -- Which is harder?
Don't forget that if you're writing a C compiler, you also need to write or port a standard library for it as well. As Solar has demonstrated for us, that's a lot of work in itself, if you're going to do it properly (which you should be).
Re: Compiler vs. Assembler -- Which is harder?
Tell Matthew Dillon it's impossible that he did.
Re: Compiler vs. Assembler -- Which is harder?
JackScott wrote: Don't forget that if you're writing a C compiler, you also need to write or port a standard library for it as well. As Solar has demonstrated for us, that's a lot of work in itself, if you're going to do it properly (which you should be).
I didn't, but you should learn how to read. I quote myself:
"I'm not talking about the implementation of standard library which is quite a big effort"
- JackScott
- Member
- Posts: 1033
- Joined: Thu Dec 21, 2006 3:03 am
- Location: Hobart, Australia
- Mastodon: https://aus.social/@jackscottau
- GitHub: https://github.com/JackScottAU
Re: Compiler vs. Assembler -- Which is harder?
My mistake, I missed that part of your post.
Re: Compiler vs. Assembler -- Which is harder?
turdus wrote: No such thing "subset of C". The full definition is so small there's no point in defining subsets, also any subset would be insufficient.
The full definition - in the form of document ISO/IEC 9899:1999 (E) - has 162 pages of normative text in chapters 1-6...
That is excluding the library (chapter 7) and the normative Annexes.
turdus wrote: 1. precompiler directives: #include, #define, #undef, #if, #else, #endif, #pragma, #line, #error
2. keywords: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.
Since you missed #ifdef, #ifndef, inline, restrict, _Bool, _Complex and _Imaginary, I assume that your statement is based on C89.
But the current standard is C99, and by the time you are nearing completion of your C compiler, C1X will be a reality.
Let's have a look at the current draft, shall we?
- Specifiers: _Alignas, _Noreturn, _Generic, _Thread_local, _Atomic
- Operators: alignof
- char16_t, char32_t, u, U and u8 string literal prefixes (i.e., full Unicode support)
- Bounds-checking interfaces (Annex K)
- Analyzability features (Annex L)
- Anonymous structures and unions
- Static assertions
Microsoft, at least, openly admitted that they don't plan on supporting C99 ten years after the fact.
turdus wrote: Do not forget, that Dennis Ritchie designed it to be portable, and to achieve this it had to be minimalistic.
What Ritchie designed and what it is today is several decades, two major standardizations and several minor updates apart. You remember K&R declarations? They aren't even legal anymore.
turdus wrote: It's not an accident that C is the longest living, and most ported language.
Yeah, and most ports aren't complete, because starting a C compiler (or any project, actually) is easy. Finishing one to the satisfaction of professionals so that it will be called "compliant" is the tricky part. I've seen many a C lib that was begun, and swamped as implementors realized it wasn't that easy. And getting that compiler to comply is even harder.
And I haven't talked about optimizations. GCC's -O1 includes 30 different optimizations, -O2 has 33 more, -O3 another 6, for a grand total of 69 options. Some of these are pretty involved, as you can read in the relevant section of the manual. And with all that work, GCC generated code is not even near the output of ICC or VACPP.
You may shrug it off, but that kind of optimization matters in the business.
Again, patching together a somewhat-C-ish compiler with Flex and Bison is easy. Not as easy as writing a 1:1 assembler-to-machine-code translator, but easy. But it's only the first step on a long road.
Every good solution is obvious once you've found it.