Machine language questions
Posted: Tue Jul 23, 2013 1:49 pm
by OldGuy63
As I understand it, there is something called machine language, which exists at a level below assembly and is wholly binary. I have two questions about it.
1. Does machine language exist?
2. Are there any examples of OSs that were/are written entirely in machine language?
Re: Machine language questions
Posted: Tue Jul 23, 2013 2:15 pm
by Mikemk
Re: Machine language questions
Posted: Tue Jul 23, 2013 2:33 pm
by iansjack
1. Yes, machine code does exist. It is a sequence of hexadecimal numbers.
2. No one actually programs in machine code. Assembly language is essentially just a way of representing machine code as mnemonics that are easier to read and remember.
In the early days of computers, programmers did work directly with machine code, but there would be no reason to do so nowadays. You only really need to know it if you are writing an assembler to convert assembly language to machine code.
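To make the "assembler converts mnemonics to machine code" point concrete, here is a minimal Python sketch of a toy assembler. It is not a real assembler: the function name `assemble` and the four-instruction subset are assumptions chosen for illustration, and only the single-byte x86 opcodes 90 (nop), C3 (ret), 04 (add al, imm8) and B8 (mov eax, imm32) are handled.

```python
import struct

def assemble(line: str) -> bytes:
    """Translate one line of a tiny, hypothetical assembly subset into x86 bytes."""
    mnemonic, _, operands = line.strip().partition(" ")
    args = [a.strip() for a in operands.split(",")] if operands else []
    if mnemonic == "nop" and not args:
        return b"\x90"                                        # 90
    if mnemonic == "ret" and not args:
        return b"\xc3"                                        # C3
    if mnemonic == "add" and len(args) == 2 and args[0] == "al":
        return b"\x04" + struct.pack("<B", int(args[1], 0))   # 04 ib
    if mnemonic == "mov" and len(args) == 2 and args[0] == "eax":
        return b"\xb8" + struct.pack("<I", int(args[1], 0))   # B8 id (little-endian)
    raise ValueError(f"unsupported instruction: {line!r}")

program = ["mov eax, 0x12345678", "add al, 5", "ret"]
machine_code = b"".join(assemble(line) for line in program)
print(machine_code.hex(" "))  # b8 78 56 34 12 04 05 c3
```

Each mnemonic maps to a fixed opcode byte followed by its operand bytes, which is all "human-readable machine code" really means for these simple cases.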
Re: Machine language questions
Posted: Tue Jul 23, 2013 3:57 pm
by Casm
Machine code is that which is directly executed by the processor. I only know of one person who writes programs in machine code. Any sane person who needs to program at that low level uses assembly language, which is basically a human-readable version of machine code.
All computer programs, including operating systems, are eventually compiled into machine code so that the processor can execute them, but no operating system has ever been written in machine code. The only exception to that last remark might have been the very first computers, but the programs which ran on them could hardly be graced with the name "operating system", because they only had a few hundred bytes of memory.
Re: Machine language questions
Posted: Tue Jul 23, 2013 4:03 pm
by dozniak
All programmers of the past were able to enter machine language using 8 switches and 1 button. The art is nearly lost, but there's at least one person on this forum still able to do so for x86.
Re: Machine language questions
Posted: Tue Jul 23, 2013 6:21 pm
by DavidCooper
(1) Machine language on most computers takes the form of instructions which are held in one or more bytes and which may be followed by further bytes which will be involved in the operations triggered by those instructions (e.g. on the PC, 04 05 is an instruction to add the value 5 to whatever value is in the CPU register AL, so it's only the 04 there that's an instruction byte). These numbers held in bytes are not specifically hexadecimal, but can be typed in as binary, hexadecimal, decimal, base 256, base one, or any other base you care to use to type them in. It's sometimes best to think of it as binary since that's how the bytes are physically stored, but from the instruction point of view it may be more accurate to think of them as base 256 (on the PC at least) due to the arbitrary manner in which the numbers are assigned to the functionality. It's a lot more complex than that, though, as can be seen most clearly if you look at the way instructions are constructed on ARM and Itanium processors.
In all these cases, there will be a part of the byte or group of bytes which is checked first and which determines how the rest of the bits are to be interpreted by the processor, and there will often be a series of further checks in which other bits are tested to narrow down further how the remaining bits are to be interpreted. None of these systems are perfectly neat and the assignments of bits are highly arbitrary by design, although wherever possible, groups of bits will have functionality which neatly ties in with the values held by them. This can be seen in the oorrrmmm bytes on the PC which often occur as the second byte of an instruction, the rrr and mmm bits holding values from 0 to 7 which map them neatly to different CPU registers.
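As a rough illustration of the two points above, the Python sketch below prints the bytes 04 05 in several bases and splits an oorrrmmm byte into its three bit fields. The helper name `split_modrm` and the example instruction 89 D8 (`mov eax, ebx` in 32-bit mode) are assumptions added for illustration. The register table applies only when the top two bits are 11, meaning the mmm field names a register rather than a memory operand.

```python
# 32-bit register names indexed by the 3-bit rrr / mmm field values
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte: int):
    """Split an oorrrmmm byte into its oo, rrr and mmm bit fields."""
    oo = (byte >> 6) & 0b11
    rrr = (byte >> 3) & 0b111
    mmm = byte & 0b111
    return oo, rrr, mmm

# The same two bytes, written in three different bases
code = bytes([0x04, 0x05])           # add al, 5
print([f"{b:08b}" for b in code])    # binary
print([f"{b:02x}" for b in code])    # hexadecimal
print(list(code))                    # decimal

# 89 D8: opcode 89 (mov r/m32, r32) followed by the oorrrmmm byte D8
oo, rrr, mmm = split_modrm(0xD8)     # 0xD8 = 11 011 000
if oo == 0b11:                       # register-to-register form
    print(f"mov {REG32[mmm]}, {REG32[rrr]}")  # mov eax, ebx
```

The value stored is the same whichever base is used to write it down. Only the grouping into bit fields carries meaning for the processor.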
(2) A PC OS written in machine code can be found at the link in my sig., though it's fairly primitive. You can run it in Bochs (or on a real machine if you're brave). [A much more advanced version exists (booting from flash drive and with a proper GUI) but it isn't ready for release just yet because it may fry a monitor if someone were to press the wrong key at a bad time. It also needs some work to make it compatible with an emulator again: it half works on QEMU and half works on Bochs, so it is currently useful on neither, but that will be fixed when time allows.]
Re: Machine language questions
Posted: Tue Jul 23, 2013 10:24 pm
by NickJohnson
Understanding machine code can be useful in some circumstances. In many cases, code injected through a buffer overflow must not contain zero bytes; if you know some tricks involving machine code representations, you can get rid of them. Similarly, many pieces of malware use code obfuscation techniques that involve injecting garbage into machine code (or, in the case of the x86, sometimes jumping into the middle of instructions).
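One well-known instance of the zero-byte trick, sketched in Python: `mov eax, 0` encodes as B8 00 00 00 00, while `xor eax, eax` encodes as 31 C0 and leaves the same value in EAX without any zero bytes (it also changes the flags, which `mov` does not). The choice of this particular pair and the helper name `has_zero_byte` are assumptions for illustration; the post does not say which tricks it has in mind.

```python
def has_zero_byte(code: bytes) -> bool:
    """Return True if the encoded instruction contains a 0x00 byte."""
    return 0 in code

mov_eax_0 = bytes.fromhex("b800000000")   # mov eax, 0   (B8 + 32-bit immediate)
xor_eax_eax = bytes.fromhex("31c0")       # xor eax, eax (31 /r with oorrrmmm = C0)

for name, code in [("mov eax, 0", mov_eax_0), ("xor eax, eax", xor_eax_eax)]:
    print(f"{name:14} {code.hex(' '):16} zero byte: {has_zero_byte(code)}")
```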
To prove my point about obfuscation, take a look at this perfectly valid cdecl function, and see if you can figure out what parameter will make it return 0:
Code:
mov edx, [esp+0x4]
call dword 0x9
pop eax
add eax, 0xc180c031
add eax, 0xe18089b0
add eax, 0xf180b5b4
add eax, 0xb110e0c1
add eax, 0xe98069b0
add eax, 0xe1c0bfb4
add eax, 0xb1c2af0f
add eax, 0xc329e883
add eax, 0x7a3f5eb6
jmp eax
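For anyone who wants a starting point without the full answer, the Python sketch below works out where `jmp eax` lands. It assumes the function is assembled at offset 0 in 32-bit mode, so that `mov edx, [esp+0x4]` occupies 4 bytes, the `call` occupies 5 bytes and targets the `pop eax` at offset 9, and each `add eax, imm32` is encoded as the byte 05 followed by the little-endian immediate. Those sizes are my reading of the listing and are not stated in the post.

```python
import struct

# Immediates of the nine `add eax, imm32` instructions, in order
IMMEDIATES = [
    0xC180C031, 0xE18089B0, 0xF180B5B4,
    0xB110E0C1, 0xE98069B0, 0xE1C0BFB4,
    0xB1C2AF0F, 0xC329E883, 0x7A3F5EB6,
]

RETURN_OFFSET = 9                  # assumed offset of `pop eax` (4 + 5 bytes before it)
ADD_CHAIN_OFFSET = RETURN_OFFSET + 1   # `pop eax` is one byte (58)

# EAX arithmetic wraps at 32 bits
delta = sum(IMMEDIATES) & 0xFFFFFFFF
target = (RETURN_OFFSET + delta) & 0xFFFFFFFF
print(f"sum of immediates mod 2**32 = {delta}")  # 2
print(f"jmp eax lands at offset {target}")       # 11

# Rebuild the bytes of the add chain and look at what sits at the jump target
add_chain = b"".join(b"\x05" + struct.pack("<I", imm) for imm in IMMEDIATES)
hidden = add_chain[target - ADD_CHAIN_OFFSET:]
print(hidden[:2].hex(" "))                       # 31 c0
```

Under those assumptions the jump lands one byte past the first 05 opcode, in the middle of the first `add` instruction, so the immediates are executed as code and the real instruction stream has to be decoded from there.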
Re: Machine language questions
Posted: Wed Jul 24, 2013 12:39 pm
by Mikemk
Often, to prevent reverse engineering, companies will add one or two random bytes to the beginning of functions, and adjust the references to call past those. This causes disassemblies to be wrong.
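A minimal Python sketch of why a junk byte misleads a disassembler that decodes straight through from the start of the function. The five-entry opcode table, the function name `linear_sweep` and the sample bytes are all hypothetical, chosen to keep the example small; real tools handle far more opcodes and many also follow the control flow instead of sweeping linearly.

```python
# Toy table: opcode byte -> (mnemonic, total instruction length in bytes)
TOY_OPCODES = {
    0x04: ("add al, imm8", 2),
    0x05: ("add eax, imm32", 5),
    0x90: ("nop", 1),
    0xB8: ("mov eax, imm32", 5),
    0xC3: ("ret", 1),
}

def linear_sweep(code: bytes, start: int = 0):
    """Decode instructions one after another, beginning at `start`."""
    listing = []
    pos = start
    while pos < len(code):
        name, length = TOY_OPCODES.get(code[pos], ("(unknown)", 1))
        listing.append((pos, name))
        pos += length
    return listing

real_function = bytes.fromhex("04 05 90 90 c3")  # add al, 5; nop; nop; ret
padded = b"\xb8" + real_function                 # junk byte that looks like a mov opcode

print(linear_sweep(padded))           # decoding from the junk byte swallows the real code
print(linear_sweep(padded, start=1))  # decoding from the real entry point
```

Starting at the junk byte, the sweep reports `mov eax, imm32` followed by `ret`, because the four bytes after B8 are consumed as an immediate. Starting one byte later recovers the original four instructions.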