ASM to ASM "compiler"

IchMagBier
Posts: 3
Joined: Sun Dec 16, 2018 6:18 am


Post by IchMagBier

Hello :)

I have created this little "compiler", which "compiles" a sort of high-level assembly code to AT&T syntax. I thought it might be interesting for you OS developers. I wrote it in a single day, so expect bugs and not too many features. There is no error detection, so when there are problems with the code, the compiler either crashes or passes the wrong code to GNU as.
The syntax supports functions, loops, structs, variables, something like if-statements and expressions like "eax = 2 + ebx - (variable)".
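To make the idea concrete, here is a minimal Python sketch of how an assignment such as "eax = 2 + ebx - (variable)" and a `do ( ... ) jmp al != 0` loop could be lowered to AT&T syntax. It is not the actual basm implementation; the function names, the `.Lloop` label scheme, and the rule that "(name)" means a memory operand are assumptions made for illustration. Like the original, it does almost no error checking.

```python
import itertools
import re

# Registers this sketch understands, grouped by width (AT&T size suffix b/w/l).
REG8 = {"al", "ah", "bl", "bh", "cl", "ch", "dl", "dh"}
REG16 = {"ax", "bx", "cx", "dx", "si", "di", "sp", "bp"}
REG32 = {"e" + r for r in REG16}

_label_ids = itertools.count()  # source of unique (hypothetical) loop labels


def suffix(reg: str) -> str:
    """AT&T size suffix for a register operand."""
    if reg in REG8:
        return "b"
    if reg in REG16:
        return "w"
    if reg in REG32:
        return "l"
    raise ValueError(f"unknown register: {reg}")


def operand(tok: str) -> str:
    """Translate one source token into an AT&T operand."""
    if tok in REG8 | REG16 | REG32:
        return "%" + tok          # register
    if tok.startswith("(") and tok.endswith(")"):
        return tok[1:-1]          # (name) -> memory at symbol 'name'
    if re.fullmatch(r"0x[0-9a-fA-F]+|\d+", tok):
        return "$" + tok          # immediate
    raise ValueError(f"unsupported operand: {tok}")


def lower_assign(dest: str, expr: str) -> list:
    """dest = a + b - c ...  ->  mov, then add/sub left to right into dest.
    Caveat: wrong if dest appears after the first term (e.g. eax = ebx - eax)."""
    tokens = re.findall(r"\(\w+\)|\w+|[+-]", expr)
    s = suffix(dest)
    out = [f"mov{s} {operand(tokens[0])}, %{dest}"]
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        out.append(f"{'add' if op == '+' else 'sub'}{s} {operand(tok)}, %{dest}")
    return out


def lower_do_loop(body: list, cond: str) -> list:
    """do ( body ) jmp reg != value  ->  label, body, cmp, conditional jump back."""
    m = re.fullmatch(r"(\w+)\s*(==|!=)\s*(\w+)", cond.strip())
    if m is None:
        raise ValueError(f"unsupported condition: {cond}")
    reg, rel, value = m.groups()
    label = f".Lloop{next(_label_ids)}"
    jump = "jne" if rel == "!=" else "je"
    return [f"{label}:", *body,
            f"cmp{suffix(reg)} {operand(value)}, {operand(reg)}",
            f"{jump} {label}"]
```

For the `print` loop in the example below, the body could be built as `["lodsb", *lower_assign("ah", "0x0E"), *lower_assign("bh", "0"), "int $0x10"]` and passed to `lower_do_loop(body, "al != 0")`. Accumulating into the destination register keeps the translation a single left-to-right pass, at the cost of the aliasing caveat noted in the docstring.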

I wrote a small Hello World "kernel" to show off the syntax:


#16bit                             ; generate 16-bit code

func print(si) (                   ; function "print" with one parameter in register "si"; functions are automatically appended after the code
    do (                           ; loop
        lodsb                      ; al = character at ds:si, then si advances (df = 0)
        ah = 0x0E                  ; BIOS teletype output function
        bh = 0                     ; display page 0
        int 0x10                   ; BIOS interrupt to print the character
    ) jmp al != 0                  ; loop until al = 0 (= end of string)
)

global byte empty(470) = 0         ; fill up the 512 bytes; globals are automatically appended after the code
global word signatur = 0xAA55      ; boot signature

ax = 0x07C0 -> ds                  ; set ds
if = 0                             ; disable interrupts
df = 0                             ; direction flag = 0 (for printing strings)
call print(@"Hello World\n")       ; call function to print a string; the string is appended together with globals

do (                               ; infinite loop
)
This code is included in the download folder in "examples/hello_world_boot.asm", together with a Hello World for Linux and some other examples in the folder "explanations".
The "compiler" can generate either flat, bootable binaries or Linux ELF executables (with the "-h" option).
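For the flat, bootable output, a BIOS boot sector is 512 bytes long and conventionally ends with the bytes 0x55 0xAA (the word 0xAA55 stored little-endian at offsets 510-511), which is what the `signatur` global provides. The Python sketch below pads a raw image and checks it; the function names are hypothetical and not part of basm. Computing the padding, instead of hard-coding a count like `empty(470)`, avoids having to recount whenever the code and data placed before the signature change size.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"  # word 0xAA55, little-endian, at offsets 510-511


def pad_boot_sector(code: bytes) -> bytes:
    """Zero-fill the code up to offset 510 and append the boot signature."""
    room = SECTOR_SIZE - len(SIGNATURE)
    if len(code) > room:
        raise ValueError(f"boot code is {len(code)} bytes; at most {room} fit")
    return code + bytes(room - len(code)) + SIGNATURE


def looks_bootable(image: bytes) -> bool:
    """True if the first sector carries the 0x55 0xAA signature."""
    return len(image) >= SECTOR_SIZE and image[510:512] == SIGNATURE


if __name__ == "__main__":
    # EB FE is "jmp $" (a two-byte infinite loop), a common placeholder.
    img = pad_boot_sector(b"\xeb\xfe")
    print(len(img), looks_bootable(img))  # prints: 512 True
```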

Compiler options:


Usage: asm [OPTION...] FILE

 --help    display this help and exit
 -a        do not assemble or link (preserves 'tmp_.s' file)
 -l        do not link (preserves 'tmp_.o' file)
 -h        adds linux related code to create a working ELF executable
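The `-a` and `-l` options imply a three-stage pipeline: translate to `tmp_.s`, assemble it with GNU as into `tmp_.o`, then link. Below is a Python sketch of how a driver could map those options to commands. The output names (`a.out`, `a.bin`) and the linker arguments for the flat binary (`--oformat binary -Ttext 0`, picked here because the example sets ds to 0x07C0 and so expects offsets starting at 0) are assumptions, not what basm actually runs. Note that in basm `-h` means "Linux", not "help".

```python
import subprocess
from typing import List


def plan_build(args: List[str]) -> List[List[str]]:
    """Map basm-style options to the as/ld commands that would follow translation."""
    flags = {a for a in args if a.startswith("-")}
    cmds: List[List[str]] = []
    if "-a" in flags:
        return cmds                      # stop after writing tmp_.s
    cmds.append(["as", "-o", "tmp_.o", "tmp_.s"])
    if "-l" in flags:
        return cmds                      # stop after assembling; keep tmp_.o
    if "-h" in flags:
        cmds.append(["ld", "-o", "a.out", "tmp_.o"])             # Linux ELF executable
    else:
        cmds.append(["ld", "--oformat", "binary", "-Ttext", "0",
                     "-o", "a.bin", "tmp_.o"])                    # flat binary
    return cmds


def run_build(args: List[str]) -> None:
    """Run the planned commands; raises CalledProcessError if a step fails."""
    for cmd in plan_build(args):
        subprocess.run(cmd, check=True)
```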
I didn't want this to be a big project and wanted something rather quick, so the compiler's code is REALLY ugly. It is included in the download, in the folder "src".
Attachment: basm.zip (54.18 KiB)
Last edited by IchMagBier on Mon Dec 17, 2018 2:28 am, edited 2 times in total.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: ASM to ASM "compiler"

Post by Schol-R-LEA

You don't really expect us to run a program binary from some random stranger, do you?

Now, assuming you are on the up and up, which is usually a poor assumption to make (though I have no intention of even touching that executable file, I would hate to assume bad faith from a fellow OS-devver)...

The idea of a 'structured assembler' is a pretty old one (going back to the 1950s - most of the early attempts at high level languages, such as Atlas Autocode, were basically in this vein), and some of them are really interesting. Unfortunately, most of them don't really catch on, for a variety of reasons.

Which is not a reason to stop, as a new option for better low-level programming is always welcome. This one looks pretty good.

(For my own part, my intention is to go with a similar idea, that of a High Level Macro Assembler; the idea is that the macros would be the basis for high-level code which would be transformed inline to assembly, similar to the classic SNOBOL4 interpreter of the 1970s. My current design uses Scheme as both the implementation language and the macro system. The source is Scheme code, and it is 'assembled' by running it: the 'assembly language' is actually a set of higher-order functions (HOFs) which transform the code into a new program, which in turn emits the machine code. My current experiment targets ARM, but the approach should be suitable for a retargetable assembler similar to GAS.)
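As a rough illustration of "assembly language as a set of higher-order functions", here is a small Python stand-in for the Scheme approach described above: each 'instruction' is a function that returns an emitter, a 'macro' is just an ordinary function that combines emitters, and assembling means running the resulting program. Only three A32 (32-bit ARM) encodings are covered, with immediates limited to 0-255 (no rotation). All names are hypothetical; this is not Schol-R-LEA's actual design.

```python
import struct
from typing import Callable, List

# An emitter appends 32-bit instruction words to a buffer when called.
Emitter = Callable[[List[int]], None]


def mov_imm(rd: int, imm8: int) -> Emitter:
    """MOV rd, #imm8 (A32 encoding, no rotation, so 0 <= imm8 <= 255)."""
    if not 0 <= imm8 <= 255:
        raise ValueError("immediate out of range for this sketch")

    def emit(out: List[int]) -> None:
        out.append(0xE3A00000 | (rd << 12) | imm8)
    return emit


def add_reg(rd: int, rn: int, rm: int) -> Emitter:
    """ADD rd, rn, rm (A32 register form, no shift)."""
    def emit(out: List[int]) -> None:
        out.append(0xE0800000 | (rn << 16) | (rd << 12) | rm)
    return emit


def bx_lr() -> Emitter:
    """BX LR: return from a function."""
    return lambda out: out.append(0xE12FFF1E)


def seq(*parts: Emitter) -> Emitter:
    """Higher-order combinator: one emitter that runs several in order."""
    def emit(out: List[int]) -> None:
        for part in parts:
            part(out)
    return emit


def add_const(rd: int, rn: int, value: int, scratch: int = 12) -> Emitter:
    """A 'macro': rd = rn + value, built from primitives via a scratch register (r12)."""
    return seq(mov_imm(scratch, value), add_reg(rd, rn, scratch))


def assemble(program: Emitter) -> bytes:
    """Run the program to collect words, then pack them little-endian."""
    words: List[int] = []
    program(words)
    return b"".join(struct.pack("<I", w) for w in words)


if __name__ == "__main__":
    # r0 = 1; r0 = r0 + 41; return
    program = seq(mov_imm(0, 1), add_const(0, 0, 41), bx_lr())
    print(assemble(program).hex())  # prints 0100a0e329c0a0e30c0080e01eff2fe1
```

Because macros are plain functions, the host language's own abstraction tools (loops, conditionals, recursion) become the macro system, which is the main attraction of this design.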


The following is a 'boilerplate' series of links given to most new OS-Dev members.

The first thing I want to say is this: if you aren't already using version control for all software projects you are working on, drop everything and start doing that now. Set up a VCS such as Git, Subversion, Mercurial, Bazaar, or what have you; which one you use is less important than the fact that you need to use one. Similarly, setting up your repos on an offsite host such as GitLab, GitHub, SourceForge, CloudForge, or Bitbucket should be the very first thing you do whenever you start a new project, no matter how large or small it is.

If nothing else, it makes it easy to share your code with us on the forum, as you can just post a link, rather than pasting oodles and oodles of code into a post.

Once you have that out of the way (if you didn't already), you can start to consider the OS specific issues.
If you haven't already, I would strongly advise you to read the introductory material in the wiki:

After this, go through the material on the practical aspects of running an OS-dev project:

I strongly suggest that you read through these pages in detail, along with the appropriate ones to follow, before doing any actual development. These pages should ensure that you have at least the basic groundwork for learning about OS dev covered.

This brings you to your first big decision: which platform, or platforms, to target. Common options include:
  • x86 - the CPU architecture of the stock PC desktops and laptops, and the system which remains the 'default' for OS dev on this group. However, it is notoriously quirky, especially regarding Memory Segmentation, and the sharp divisions between 16-bit Real Mode, 16-bit and 32-bit Protected Modes, and 64-bit Long Mode.
  • ARM - a RISC architecture widely used on mobile devices and for 'Internet of Things' and 'Maker' equipment, including the popular Raspberry Pi and BeagleBoard single-board computers. While it is generally seen as easier to work with than x86, most notably in the much less severe differences between the 32-bit and 64-bit modes and the lack of memory segmentation, the wiki and other resources don't cover it nearly as well (though this is changing over time as it becomes more commonly targeted).
  • MIPS - another RISC design, slightly older than ARM. It was one of the first RISC designs to come out, and part of the reason the idea caught on. It is even simpler than ARM in terms of programming, though a bit tedious when it comes to assembly programming. While it was widely used in workstations and game consoles in the 1990s, it has declined significantly due to mismanagement by the owners of the design, and is mostly seen in devices such as routers. There are a handful of System on Chip single-board computers that use it, such as the Creator Board and the Onion Omega2, and manufacturers in both China and Russia have licensed the ISA with the idea of breaking their dependence on Intel. Finding good information on the instruction set is easy, as it is widely used in courses on assembly language and computer architecture and there are several emulators that run MIPS code, but finding usable information on the actual hardware systems using it is often difficult at best.
  • RISC-V - an up-and-coming open-source hardware ISA, but so far it is Not Ready For Prime Time. This may change in the next few years, though.
You then need to decide which language to use for the kernel. I gather you are using C, which is the usual recommendation. However, you then need to choose the compiler, assembler, linker, build tool, and support utilities to use - what is called the 'toolchain' for your OS.

For most platforms, there aren't many to choose from, and the obvious choice would be GCC and the Binutils toolchain due to their ubiquity. However, on the Intel x86 platform, it isn't as simple, as there are several other toolchains in widespread use for it, the most notable being the Microsoft one: a very familiar one to Windows programmers, but one which presents problems in OSDev. The biggest issue with Visual Studio, and with proprietary toolchains in general, is that using one rules out the possibility of your OS being "self-hosting", that is to say, being able to develop your OS in the OS itself, something most OSdevs do want to eventually be able to do. The fact that porting GCC to your OS is feasible, whereas porting proprietary x86 toolchains isn't, is a big factor in the use of Binutils and GCC, as is their deep connection to Linux and other Unix derivatives.

Regardless of the high-level language you use for OS dev (if any), you will still need to use assembly language, which means choosing an assembler. If you are using Binutils and GCC, the obvious choice would be GAS, but for x86 especially, there are other assemblers which many OSdevs prefer, such as Netwide Assembler (NASM) and Flat Assembler (FASM).

The important thing here is that assembly language syntax varies more among the x86 assemblers than it does for most other platforms, with the biggest difference being that between the Intel syntax used in the majority of x86 assemblers, and the AT&T syntax used in GAS. You can see an overview of the differences on the somewhat misnamed wiki page Opcode syntax.
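Since basm emits AT&T syntax for GAS, the main mechanical differences are worth seeing side by side. The Python sketch below converts a few simple two-operand Intel-syntax instructions: operands are reversed, registers get a `%` prefix, immediates get `$`, `[base+disp]` becomes `disp(%base)`, and a size suffix (`b`/`w`/`l`) is taken from a register operand. It only covers 8-, 16- and 32-bit general registers; real code would also need scaled indexes, segment overrides, explicit size keywords, and so on. (GAS can also accept Intel syntax directly via its `.intel_syntax noprefix` directive.)

```python
import re

REG8 = {"al", "ah", "bl", "bh", "cl", "ch", "dl", "dh"}
REG16 = {"ax", "bx", "cx", "dx", "si", "di", "sp", "bp"}
REG32 = {"e" + r for r in REG16}
REGS = REG8 | REG16 | REG32

MEM = re.compile(r"\[\s*(\w+)\s*(?:([+-])\s*(\w+))?\s*\]")  # [base] or [base+/-disp]
IMM = re.compile(r"-?(?:0x[0-9a-fA-F]+|\d+)")


def att_operand(op: str) -> str:
    op = op.strip()
    if op in REGS:
        return "%" + op                                        # eax    -> %eax
    mem = MEM.fullmatch(op)
    if mem:
        base, sign, disp = mem.groups()
        if base not in REGS:
            raise ValueError(f"unsupported base: {base}")
        if disp is None:
            return f"(%{base})"                                 # [bx]   -> (%bx)
        return f"{'-' if sign == '-' else ''}{disp}(%{base})"  # [bp-2] -> -2(%bp)
    if IMM.fullmatch(op):
        return "$" + op                                        # 2      -> $2
    raise ValueError(f"unsupported operand: {op!r}")


def size_suffix(*ops: str) -> str:
    """Take the operand size from the first register operand found."""
    for op in ops:
        op = op.strip()
        if op in REG8:
            return "b"
        if op in REG16:
            return "w"
        if op in REG32:
            return "l"
    raise ValueError("no register operand; Intel syntax would need an explicit size here")


def intel_to_att(line: str) -> str:
    """'mnemonic dst, src' (Intel) -> 'mnemonicS src, dst' (AT&T)."""
    mnemonic, rest = line.split(None, 1)
    dst, src = (part.strip() for part in rest.split(",", 1))
    return f"{mnemonic}{size_suffix(dst, src)} {att_operand(src)}, {att_operand(dst)}"
```

For example, `intel_to_att("mov ax, [bx+4]")` gives `movw 4(%bx), %ax`. GAS usually infers the size from a register operand anyway, but emitting the suffix explicitly makes the output unambiguous.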

It is still important to understand that the various Intel syntax assemblers - NASM, FASM, and YASM among others - have differences in how they handle indexing, in the directives they use, and in their support for features such as macros and defining data structures. While most of these follow the general syntax of Microsoft Assembler (MASM), they all diverge from it in various ways.

Once you know which platform you are targeting, and the toolchain you want to use, you need to understand them. You should read up on the core technologies for the platform. Assuming that you are targeting the PC architecture, this would include:

This leads to the next big decision: which Bootloader to use. There are a number of different standard bootloaders for x86, with the most prominent being GRUB. We strongly recommend against Rolling Your Own Bootloader, but it is an option as well.

You need to consider what kind of File System to use. Common ones used when starting out in OS dev include:

We generally don't recommend designing your own, but as with boot loaders, it is a possibility as well.

While this is a lot of reading, it simply reflects the due diligence that any OS-devver needs to go through in order to get anywhere. OS development, even as a simple project, is not amenable to the Stack Overflow cut-and-paste model of software development; you really need to understand a fair amount of the concepts and principles before writing any code, and the examples given in tutorials and forum posts are generally just that: examples. Copying an existing code snippet without at least a basic idea of what it is doing simply won't do. While learning itself is an iterative process (you learn one thing, try it out, see what worked and what didn't, read some more, etc.), in this case a basic foundation is needed at the start. Without a solid understanding of at least some of the core ideas before starting, you simply can't get very far in OS dev.

Hopefully, this won't scare you off; it isn't nearly as bad as it sounds. It just takes a lot of patience and a bit of effort, a little at a time.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
IchMagBier

Re: ASM to ASM "compiler"

Post by IchMagBier

Schol-R-LEA wrote: "You don't really expect us to run a program binary from some random stranger, do you?"
I guess you are right. I have included the source in the new upload. 1377 lines of pure horror. :D
It is FreeBASIC code.

I have added some more features in the new upload ("basm.zip" in the first post):
- 16-bit segmented addressing (see the file "explanations/addressing_modes.asm")
- Creating variables at a specific memory address (it's more like an alias):
-> global byte vgamemory @0xB8000
- String constants:
-> rax = "Hello\n"
- Structs (see "explanations/structs.asm"):


type teststruct (
    byte byte1          ; Will be compiled to a ".set" directive. That means this name can't be used twice!
    byte array1(10)
    qword pointer
)

global teststruct test  ; Structs are always saved in the bss section!

rax = (test+pointer)    ; Accessing an element of the struct
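The comment about `.set` suggests that each field name becomes a file-wide assembler symbol holding the field's offset, which would explain why a field name cannot be reused. The Python sketch below shows one way such offsets could be computed and emitted, along with a `.lcomm` reservation in the bss section for `global teststruct test`. It assumes fields are packed back to back with no alignment padding and that `(test+pointer)` simply adds the offset to the base address; basm's real output may differ.

```python
SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


def lay_out(fields):
    """fields: list of (type, name, count). Returns (.set directives, total size).
    Fields are packed back to back; no alignment padding is inserted."""
    directives, offset = [], 0
    for ftype, name, count in fields:
        directives.append(f".set {name}, {offset}")
        offset += SIZES[ftype] * count
    return directives, offset


def reserve(var: str, size: int) -> str:
    """Reserve 'size' bytes for a struct instance in the bss section."""
    return f".lcomm {var}, {size}"


teststruct = [("byte", "byte1", 1), ("byte", "array1", 10), ("qword", "pointer", 1)]
sets, size = lay_out(teststruct)
print("\n".join(sets + [reserve("test", size)]))
```

Running it prints `.set byte1, 0`, `.set array1, 1`, `.set pointer, 11` and `.lcomm test, 19`. With this packed layout the qword `pointer` sits at offset 11, which x86 tolerates as an unaligned access; if alignment matters, the offset could be rounded up to 16 instead.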
Schol-R-LEA wrote: "The idea of a 'structured assembler' is a pretty old one"
I only know "HLA", which looks neat, but I wanted to create something of my own. I didn't intend it to be especially useful, and I know it never will be. I just like assembler code.
Schol-R-LEA wrote: "the idea is that the macros would be the basis for high-level code which would be transformed inline to assembly"
Isn't that the idea of FASM? I have never used it, but I know that it has a powerful macro syntax.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: ASM to ASM "compiler"

Post by Solar

FWIW, the term for such a compiler -- taking in some source and producing some different source instead of binary -- is "transcompiler", or "transpiler".
Every good solution is obvious once you've found it.