Compiler vs. Assembler -- Which is harder?

Programming, for all ages and all languages.
casnix
Member
Posts: 67
Joined: Fri Jan 14, 2011 7:24 pm

Compiler vs. Assembler -- Which is harder?

Post by casnix »

I don't know because I don't write compilers and assemblers, but:

Which do you think is harder to write? An assembler or compiler?
You are a computer.
~ MCS ~
Chandra
Member
Posts: 487
Joined: Sat Jul 17, 2010 12:45 am

Re: Compiler vs. Assembler -- Which is harder?

Post by Chandra »

casnix wrote:I don't know because I don't write compilers and assemblers, but:

Which do you think is harder to write? An assembler or compiler?
Basically, to write an assembler, you need to know which machine instruction corresponds to which assembly instruction. This, too, depends on the machine architecture. For instance, to implement something specific to x86, you have to go through Intel's manuals and figure out how instructions are encoded. Moreover, you have to arrange everything in the order the encoding requires. This seems tedious.

On the other hand, if you've ever used gcc's -S option, you'll have noticed how the compiler arranges things to make the programmer's life easy.

Conclusion: I'd give 50 points to both. But the fact that compilers don't exist without assemblers gives me a strong feeling that writing a compiler is somewhat more troublesome (because it involves writing an assembler as well).

Cheers.

Edit: I recalled that one of my assembly books said this:
"It took several decades for computer scientists to figure out how to even write a compiler!"
Last edited by Chandra on Sun Mar 27, 2011 10:21 am, edited 1 time in total.
Programming is not about using a language to solve a problem, it's about using logic to find a solution !
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: Compiler vs. Assembler -- Which is harder?

Post by NickJohnson »

A compiler is definitely harder to write than an assembler. An assembler, in essence, is just a table that maps short strings (instruction mnemonics) to numbers (opcodes); the hardest part of writing an assembler is just creating that table efficiently. A compiler, on the other hand, needs a much more complex parser (depending on the language), and needs to manage stack positions and register usage in arbitrary circumstances. The fact that assembly preceded (widespread) high level languages by many years is due to this.
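
As a rough illustration of the "table" view of an assembler described above, here is a minimal Python sketch. It only handles a handful of operand-less, single-byte x86 instructions; the function name `assemble` and the comment syntax are assumptions made for the example, and a real assembler would also need operand encoding, labels and more than one pass.

```python
# Hypothetical minimal assembler: a lookup table from mnemonic to opcode.
# Only operand-less, single-byte x86 instructions are covered here.
OPCODES = {
    "nop": 0x90,
    "ret": 0xC3,
    "hlt": 0xF4,
    "cli": 0xFA,
    "sti": 0xFB,
    "int3": 0xCC,
}


def assemble(source):
    """Translate one mnemonic per line into machine-code bytes."""
    output = bytearray()
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Everything after ';' is treated as a comment (an assumption).
        text = line.split(";", 1)[0].strip().lower()
        if not text:
            continue
        if text not in OPCODES:
            raise ValueError(f"line {line_number}: unknown mnemonic {text!r}")
        output.append(OPCODES[text])
    return bytes(output)


if __name__ == "__main__":
    print(assemble("cli\nnop ; do nothing\nhlt").hex())
```

The table is the easy part; most of the tedium mentioned earlier in the thread comes from instructions whose encoding depends on their operands, which this sketch deliberately leaves out.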
TylerH
Member
Posts: 285
Joined: Tue Apr 13, 2010 8:00 pm

Re: Compiler vs. Assembler -- Which is harder?

Post by TylerH »

I agree with Nick's explanation of the two, but I have something to add to the compiler part. A good compiler uses advanced techniques to optimize code and eliminate redundancies. See http://en.wikipedia.org/wiki/Compiler#C ... nstruction and read the "Front End" and "Back End" sections.
Thomas
Member
Posts: 284
Joined: Thu Jun 04, 2009 11:12 pm

Re: Compiler vs. Assembler -- Which is harder?

Post by Thomas »

Hi,
However, the good part of being born recently is that most of compiler writing has been formalized to such a degree that it really is not as hard as it used to be :wink: . If you can follow the algorithms, you can more or less make it. I agree with Nick: writing an assembler is way easier than writing a compiler. Remember that in some cases an assembler is little more than a simple lookup table.

--Thomas
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Compiler vs. Assembler -- Which is harder?

Post by turdus »

I disagree. I think it depends on the language. Writing, for example, a LOGO compiler is much, much easier than writing an x86 assembler, and it's much easier than writing a fully featured C++ compiler. I don't think we can say one is harder than the other without defining the language and the architecture.

I have my own C-like compiler for my OS, and it took about 2-3 days to implement (by that I mean it was good enough to compile ANSI C source, but as the language evolves I keep writing). It was easier than writing an assembler, since it's basically nothing more than a source->assembly converter; I left all the hard parts to a multi-pass macro assembler. Writing a C compiler is definitely easy (it was created to be so), it uses less than 20 keywords (I'm not talking about the implementation of standard library which is quite a big effort, but the language itself).

I've never written an x86 assembler, although I wrote a disassembler, and I wrote compilers that were very similar to assemblers, as well as interpreters that can be considered VMs for special bytecodes. Difficulty always depended on the language.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: Compiler vs. Assembler -- Which is harder?

Post by JamesM »

turdus wrote:Writing a C compiler is definitely easy (it was created to be so), it uses less than 20 keywords (I'm not talking about the implementation of standard library which is quite a big effort, but the language itself).
Correction - writing a toy compiler that compiles a subset of C, doesn't pay attention to the (rather exquisite) language semantics and aliasing rules takes a few days (using a parser generator and a giant switch statement).

Writing an optimising C compiler that follows the language standard takes many man-years of work.
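
To make the "parser plus giant switch" style of toy compiler concrete, here is a small Python sketch. It is not C and not anyone's actual compiler: Python's own `ast` module stands in for a generated parser, and the stack-machine mnemonics (`push`, `load`, `add`, `sub`, `mul`) are invented for the example and target no real CPU.

```python
import ast

# Invented pseudo-assembly mnemonics for a hypothetical stack machine.
BINARY_OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def compile_expr(node, out):
    """Emit post-order stack code; this is the 'giant switch' on node type."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        out.append(f"push {node.value}")
    elif isinstance(node, ast.Name):
        out.append(f"load {node.id}")
    elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        compile_expr(node.left, out)
        compile_expr(node.right, out)
        out.append(BINARY_OPS[type(node.op)])
    else:
        # Anything outside this tiny subset is rejected.
        raise NotImplementedError(type(node).__name__)


def compile_source(source):
    """Parse one arithmetic expression and return its pseudo-assembly."""
    tree = ast.parse(source, mode="eval")  # stands in for a generated parser
    out = []
    compile_expr(tree.body, out)
    return out


if __name__ == "__main__":
    print("\n".join(compile_source("a + 2 * b")))
```

A sketch like this covers only expression translation. Types, calling conventions, register allocation and the semantic rules of a real language are all absent, which is where the gap between a few days and many man-years comes from.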
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Compiler vs. Assembler -- Which is harder?

Post by turdus »

JamesM wrote:Correction - writing a toy compiler that compiles a subset of C, doesn't pay attention to the (rather exquisite) language semantics and aliasing rules takes a few days (using a parser generator and a giant switch statement).

Writing an optimising C compiler that follows the language standard takes many man-years of work.
With respect, you are wrong. No such thing "subset of C". The full definition is so small there's no point in defining subsets, also any subset would be insufficient.

What is C made of?
1. precompiler directives: #include, #define, #undef, #if, #else, #endif, #pragma, #line, #error
2. keywords: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.
That's all, and this list also includes variable types/modifiers as well as all control flow instructions.
3. control characters: { } ;
4. operators
5. constants
6. variables: labels for memory addresses just like in assembly

That's it. Nothing more. And don't forget that you "translate" this into assembly, and if your assembler is powerful enough you don't even have to care about calculating offsets within structs, unions etc. As for giant switches, the AMD manual already provides assembly templates for that.
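
For readers wondering what "calculating offsets within structs" involves when something does have to do it, here is a hedged Python sketch of the usual layout rule: each member is placed at the next multiple of its alignment, and the total size is rounded up to the largest member alignment. The sizes and alignments in `TYPES` are assumptions typical of x86-64 System V; other ABIs differ, and bitfields, unions and packing attributes are not modelled.

```python
# Assumed (size, alignment) pairs, typical of x86-64 System V.
TYPES = {"char": (1, 1), "short": (2, 2), "int": (4, 4), "double": (8, 8)}


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def layout_struct(members):
    """members: list of (name, type_name) pairs in declaration order.

    Returns (offsets, total_size, struct_alignment).
    """
    offsets = {}
    offset = 0
    struct_align = 1
    for name, type_name in members:
        size, alignment = TYPES[type_name]
        offset = align_up(offset, alignment)  # insert padding before member
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, alignment)
    # Tail padding so that arrays of the struct keep every element aligned.
    return offsets, align_up(offset, struct_align), struct_align


if __name__ == "__main__":
    print(layout_struct([("c", "char"), ("i", "int"), ("s", "short")]))
```

Under these assumptions, a struct declared as char, int, short occupies 12 bytes rather than 7, which is the kind of detail either the compiler or the assembler's struct support has to get right.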

Do not forget, that Dennis Ritchie designed it to be portable, and to achieve this it had to be minimalistic. A more complex language would be harder to port, agree? It's not an accident that C is the longest living, and most ported language.

I know that your programming skill goes further than many programmers' (including many OS developers in this forum, me too), but how many C compilers have you implemented so far? (I know you're working on a HLA, but it's a different kind of beast)
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: Compiler vs. Assembler -- Which is harder?

Post by JamesM »

turdus wrote:but how many C compilers have you implemented so far?
This one. Obviously not from scratch - there are just under 20 years of wall-clock effort (20*n man-years) in that compiler, but that's what I get paid to do as a day job. I don't like to talk about it because I don't want to try to push my opinions with my career. But your question was kind of directed.
turdus wrote:What is C made of?
1. precompiler directives: #include, #define, #undef, #if, #else, #endif, #pragma, #line, #error
2. keywords: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.
That's all, and this list also includes variable types/modifiers as well as all control flow instructions.
3. control characters: { } ;
4. operators
5. constants
6. variables: labels for memory addresses just like in assembly
These are merely tokens - parsing C is not difficult (it's not context free, but that's not so much of an issue).

Once you have an AST, that is where your problems start.

* Register allocation, stack spilling/filling, calling conventions.
* Passing structures by value.
* Unions.
* Proper handling of the volatile, const and restrict qualifiers ("volatile * const *x = 0")
* Bitfields.
* Padding and alignment - there are a multitude of rules about this.
* Ensuring stack alignment on platforms that require it.
* Datatypes that are larger than the native register width - long long int for example on 32-bit machines.
* Non-word aligned loads and stores on platforms that only allow word-aligned.
* Function-static variables; one-time initialization.
* Variadic functions.
* Type coercion, promotion (signed to unsigned and vice versa, automatic size extension)
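
As one small illustration of the last item, here is a Python model of two conversions a C compiler has to emit code for. It assumes a 32-bit `int` (the constant `UINT_BITS` is that assumption), and the helper names are made up for the example. When an `int` and an `unsigned int` meet in a comparison, C converts the `int` to `unsigned int`, so `-1 < 1u` is false.

```python
UINT_BITS = 32  # assumption: int and unsigned int are 32 bits wide


def to_unsigned(value, bits=UINT_BITS):
    """Model C's conversion of a signed value to an unsigned type (mod 2**bits)."""
    return value % (1 << bits)


def sign_extend(value, from_bits):
    """Interpret the low from_bits bits of value as a two's-complement number."""
    sign_bit = 1 << (from_bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def less_than_mixed(signed_value, unsigned_value):
    """Model `signed_value < unsigned_value` for int versus unsigned int.

    The signed operand is converted to unsigned before the comparison.
    """
    return to_unsigned(signed_value) < unsigned_value


if __name__ == "__main__":
    print(less_than_mixed(-1, 1))  # models the C expression -1 < 1u
    print(sign_extend(0xFF, 8))    # a signed 8-bit 0xFF widened to int
```

A compiler has to pick the right conversion for every operator and operand pair, and then choose between sign-extending and zero-extending instructions accordingly.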

I've not even scratched the surface. And C++ is a whole different beast altogether, although you weren't talking about C++ so I'll stay quiet about that one.

This is before you even get into the realm of optimisation or concurrency. I have a copy of the C99 standard on my desk. It is as large as volumes 1 and 2a of the Intel manuals combined.

With respect - you are wrong. Look at how much work has had to go into LLVM to make it C ready.
JackScott
Member
Posts: 1033
Joined: Thu Dec 21, 2006 3:03 am
Location: Hobart, Australia
Mastodon: https://aus.social/@jackscottau
GitHub: https://github.com/JackScottAU

Re: Compiler vs. Assembler -- Which is harder?

Post by JackScott »

Don't forget that if you're writing a C compiler, you also need to write or port a standard library for it as well. As Solar has demonstrated for us, that's a lot of work in itself, if you're going to do it properly (which you should be).
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Compiler vs. Assembler -- Which is harder?

Post by turdus »

Tell Matthew Dillon it's impossible that he did.
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: Compiler vs. Assembler -- Which is harder?

Post by turdus »

JackScott wrote:Don't forget that if you're writing a C compiler, you also need to write or port a standard library for it as well. As Solar has demonstrated for us, that's a lot of work in itself, if you're going to do it properly (which you should be).
I didn't, but you should learn how to read. I quote myself:
"I'm not talking about the implementation of standard library which is quite a big effort"
JackScott
Member
Posts: 1033
Joined: Thu Dec 21, 2006 3:03 am
Location: Hobart, Australia
Mastodon: https://aus.social/@jackscottau
GitHub: https://github.com/JackScottAU

Re: Compiler vs. Assembler -- Which is harder?

Post by JackScott »

My mistake, I missed that part of your post.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: Compiler vs. Assembler -- Which is harder?

Post by Solar »

turdus wrote:No such thing "subset of C". The full definition is so small there's no point in defining subsets, also any subset would be insufficient.
The full definition - in the form of document ISO/IEC 9899:1999 (E) - has 162 pages of normative text in chapters 1-6...

That is excluding the library (chapter 7) and the normative Annexes.
turdus wrote:1. precompiler directives: #include, #define, #undef, #if, #else, #endif, #pragma, #line, #error

2. keywords: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.
Since you missed #ifdef, #ifndef, inline, restrict, _Bool, _Complex and _Imaginary, I assume that your statement is based on C89.

But the current standard is C99, and by the time you are nearing completion of your C compiler, C1X will be a reality.

Let's have a look at the current draft, shall we?
  • Specifiers: _Alignas, _Noreturn, _Generic, _Thread_local, _Atomic
  • Operators: alignof
  • char16_t, char32_t, u, U and u8 string literal prefixes (i.e., full Unicode support)
  • Bounds-checking interfaces (Annex K)
  • Analyzability features (Annex L)
  • Anonymous structures and unions
  • Static assertions
You know how many fully compliant C99 compilers are out there? MSVC isn't, and GCC isn't either. Intel CC isn't, Watcom isn't, IBM VACPP isn't. All of these would love to just slap that "C99 compliant" sticker on their advertising, but they also realized that getting there requires serious amounts of work.

Microsoft, at least, openly admitted that they don't plan on supporting C99 ten years after the fact.
turdus wrote:Do not forget, that Dennis Ritchie designed it to be portable, and to achieve this it had to be minimalistic.
What Ritchie designed and what it is today are several decades, two major standardizations and several minor updates apart. You remember K&R declarations? They aren't even legal anymore.
turdus wrote:It's not an accident that C is the longest living, and most ported language.
Yeah, and most ports aren't complete, because starting a C compiler (or any project, actually) is easy. Finishing one to the satisfaction of professionals so that it will be called "compliant" is the tricky part. I've seen many a C lib that was begun and then abandoned as implementors realized it wasn't that easy. And getting that compiler to comply is even harder.

And I haven't talked about optimizations. GCC's -O1 includes 30 different optimizations, -O2 has 33 more, -O3 another 6, for a grand total of 69 options. Some of these are pretty involved, as you can read in the relevant section of the manual. And with all that work, GCC-generated code is not even near the output of ICC or VACPP.
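
To give a feel for what a single optimization pass does, here is a hedged Python sketch of constant folding, one of the simplest classic optimizations. It is not how GCC implements anything: it works on Python's own `ast` as a stand-in for a compiler's intermediate representation, the class and function names are made up for the example, and it needs Python 3.9 or later for `ast.unparse`.

```python
import ast
import operator

# Integer operations this toy pass is willing to evaluate at compile time.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def is_int_constant(node):
    return isinstance(node, ast.Constant) and type(node.value) is int


class ConstantFolder(ast.NodeTransformer):
    """Replace `constant op constant` subtrees with their computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        if (type(node.op) in FOLDABLE
                and is_int_constant(node.left)
                and is_int_constant(node.right)):
            value = FOLDABLE[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(source):
    """Parse one expression, fold constants, and return the new source text."""
    tree = ast.parse(source, mode="eval")
    folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(folded)


if __name__ == "__main__":
    print(fold("x + 2 * 3"))
```

Even this trivial pass would need more care in a real C compiler, for example around overflow and the type of each constant, and it is only one of the dozens of passes counted above.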

You may shrug it off, but that kind of optimization matters in the business.

Again, patching together a somewhat-C-ish compiler with Flex and Bison is easy. Not as easy as writing a 1:1 assembler-to-machine-code translator, but easy. But it's only the first step on a long road.
Every good solution is obvious once you've found it.