16 Bit Programming Language


Post by Cjreek »

Hi,

I'm looking for a programming language (other than assembly) that can produce 16-bit flat binary code.
Is there any language like this?

I asked Google, but unfortunately I didn't find anything satisfying.

Post by KotuxGuy »

AFAIK, there have been no 16-bit C/C++ compilers (or any other compilers, for that matter) for a long while.
Your only choice, AFAIK, is to use assembly.

Post by gravaera »

Hello;

Programming languages themselves aren't usually tied to a particular word size...

Search Google for '.code16gcc' (a GNU assembler directive for running GCC-generated code in 16-bit mode).

--gravaera

Post by RedEagle »

I use bcc and ld86 for my bootloader

Code:

bcc -ansi -0 -c -o test.o test.c
ld86 -d -T 0x3000 -o test.bin test.o

The 0x3000 is the segment address.
You get these tools by installing the dev86 package.

If you just work in one segment, everything's fine. If you want to start your 16-bit program from a different segment, you need a few lines of assembly:

Code:

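[BITS 16]
[extern _cmain]         ; C entry point (cmain in the C source)

_main:
    call _cmain         ; near call into the C code
    retf                ; far return to the caller in the other segment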

Post by Cjreek »

Hi RedEagle,

I should have mentioned that I'm using Windows...
Do you know whether there's an ld86 (dev86) version for Windows?

Post by RedEagle »

Cjreek wrote:[..]
Do you know whether there's a ld86 (dev86) version for windows?

No :(

But I know another 16-bit C compiler for Windows, well, DOS :)
Try Borland's Turbo C: article and download link at edn.embarcadero.com
But I can't say whether it is able to produce raw binaries.

Post by Cjreek »

Hi,

I downloaded Turbo C and tried this:

test.c:

Code:

int main(void)
{
    return 0;
}
Compile (do not link):

tcc -c test.c

Now I want to link test.obj with tlink.exe, which is able to create .com files. I think (and Wikipedia seems to confirm this) that .com files are raw binaries that I could use. Is this right?

My attempt:

/n = no default libraries
/t = create .com file

tlink TEST.OBJ /n /t

But tlink.exe tells me:

Cannot generate COM file : invalid initial entry point address

How can I create .com files?

But I think .com files could also be the wrong thing. They have a fixed entry point at 0x100, don't they?
That isn't supposed to work in real mode, right?

Post by ~ »

Cjreek wrote:But I think .com files could also be the wrong thing. They have a fixed entry point at 0x100, haven't they?
This isn't supposed to work in real mode right?

It is so, but think about it: assume a COM program that has 256 bytes of code and data that you wrote.

Assume it was loaded with CS=3000.

.COM files are linked at 100h to leave room for the program's PSP (Program Segment Prefix), a 100h (256)-byte structure that holds the command line passed to the program and a reference to its environment variables, among other things. DOS creates it automatically just before the program starts running, so if you work outside DOS you always need to reserve space for an unused, dummy PSP.

So, your program would start at 3000:0100 for CS:IP.
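
To make the CS:IP arithmetic concrete, here is a minimal C sketch of how a real-mode segment:offset pair maps to a physical address. It assumes plain 8086-style addressing (segment times 16 plus offset); the helper name real_mode_linear is made up for illustration.

```c
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
   The helper name is hypothetical; only the formula is standard. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With CS=3000h and IP=0100h, real_mode_linear(0x3000, 0x0100) evaluates to 0x30100, which is where the first byte of the program's own code would sit in memory.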

You could thus manually copy your program into a segment just as described above: leave the first 256 bytes of the segment unused, then copy the 256 bytes that actually belong to your program right after them, starting at offset 100h.
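
The copy step can be sketched in C roughly like this. It is only an illustration under assumptions: the segment is modelled as an ordinary byte buffer, and the names PSP_SIZE and place_com_image are invented for this example, not part of any DOS or Turbo C API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSP_SIZE 0x100u  /* bytes DOS would normally reserve for the PSP */

/* Lay out a .COM program inside a segment-sized buffer the way DOS would:
   the first 0x100 bytes stay unused (dummy PSP), the code follows.
   Returns the number of bytes used in `segment`, or 0 if it does not fit. */
size_t place_com_image(uint8_t *segment, size_t segment_size,
                       const uint8_t *program, size_t program_size)
{
    if (segment_size < PSP_SIZE || program_size > segment_size - PSP_SIZE)
        return 0;
    memset(segment, 0, PSP_SIZE);                       /* dummy, unused PSP */
    memcpy(segment + PSP_SIZE, program, program_size);  /* code at offset 100h */
    return PSP_SIZE + program_size;
}
```

A bootloader would do the same thing with the real segment (for example 3000h) as the destination and then far-jump to offset 0100h in it.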

Now the only thing you need to be careful about is NOT to use DOS INT 21h calls or the like, and you should be able to use the program from a bootloader. At least it would work for a .COM program made from an .ASM file. It might not work well or easily for a program generated with the Turbo C compiler; you would need to disassemble and actually run the program to see if it's feasible (start with a trivial dummy program to test whether it is stable when started from a bootloader).

Remember that you cannot fully escape learning assembly; otherwise you won't be able to understand the hardware or debug critical low-level components, and that will prevent you from truly reaching your goal of writing a complete OS.

Note that .COM files can be linked anywhere and can contain 16, 32 or 64-bit code; they are just raw binary code. It is DOS compilers that might force linking at 0x100 (maybe the creators of the compiler never thought about using it to write independent OS code, only to compile programs that run under DOS, in which case all of the above is a workaround).
Last edited by ~ on Sat Feb 27, 2010 8:28 am, edited 1 time in total.

Post by Cjreek »

~ wrote:...
Thanks, I'll try this once I get tlink.exe to create a .com file. Maybe there's a compiler switch to change the entry point in the C code?
~ wrote:Remember that you cannot fully escape learning assembly; otherwise you won't be able to understand the hardware and that will prevent you from truly reaching your goal of writing a true full OS programming.
It's not that I can't write assembly. I'm much better at it than I am at programming in C :mrgreen:

Until now I have done nearly everything in NASM. At the moment I'm writing a FAT12 driver in NASM. It works fine, but despite the comments the code is pretty unreadable. So I thought about doing it in a language that offers a more readable syntax.

Post by ~ »

Cjreek wrote:Hi,

I'm looking for a programming language (except assembler) which is able to create 16 bit flat binary code.
Is there any language like this?

I asked Google, but unfortunately I didn't find anything satisfying.

Since you asked about a programming language (not its compiler): I have been designing a low-level-oriented language called RealC, which is meant for just that. It is a language optimized for programming x86 processors in 16, 32 or 64-bit mode. It can use both 16-bit and 32-bit offsets for addressing in 16-bit code. It was originally intended for Unreal Mode 16-bit code, but it can easily and directly produce standard Real Mode code.

It isn't specifically designed to tweak or use segment registers like DS, ES, FS, etc., but you can assign values to them the same way you would in assembly, like $DS=$AX;

It's a mixture of the easiest features of assembly language and C, and it can use NASM-syntax assembly language. It mostly uses registers as variables, plus db, dw, dd and dq variables, and using it quickly makes assembly language much easier to debug, optimize and understand. The downside is that for now you have to translate the code to assembly manually, but even so you will notice that it makes your low-level programming more effective.

It's like a C-like standardized pseudocode with the potential to be compiled into .ASM files for you to use. It can "compile" single source files with undeclared symbols, so that you can update individual pieces of code instead of a whole source project (but be careful with functions that define parameters, which aren't really implemented yet).

The good thing about it is that it's also designed not to add any instruction you didn't ask for (only raw strings of bytes exist, like in assembly). This means that for now you can only use basic, asm-style expressions, but you will see that most of the time this actually makes the code more readable than having the compiler solve a highly nested statement.

It's also very intuitive if you know basic assembly (general-purpose registers, etc.).

The following is a sort of abstract of its specification; these are the elements you can currently use (standardized ones, guaranteed to stay available for future code):

General-purpose Registers:

Code:

$AL, $AH, $AX, $EAX, $BL, $BH, $BX, $EBX, $CL, $CH, $CX, $ECX, $DL, $DH, $DX, $EDX, $ESI, $SI, $EDI, $DI, $EBP, $BP, $ESP, $SP
_________________________
Segment registers (values can only be assigned through general-purpose registers):

$DS, $ES, $FS, $GS, $SS

_________________________
Inline assembly:

Code:

asm{
  .... code block ....
}

 asm ;single asm statement instruction
        ;newline-terminated


_________________________
ASM-Like directives:

Variables:


Basic types:

db -- byte
dw -- word
dd -- doubleword
dq -- quadword


Usage of variables:

byte[labelname];
word[labelname];
dword[labelname];
qword[labelname];

byte[$EAX];
byte[$EBX];
byte[$ECX];
byte[$EDX];
byte[$ESI];
byte[$EDI];
byte[$ESP];
byte[$EBP];

word[$EAX];
word[$EBX];
word[$ECX];
word[$EDX];
word[$ESI];
word[$EDI];
word[$ESP];
word[$EBP];

dword[$EAX];
dword[$EBX];
dword[$ECX];
dword[$EDX];
dword[$ESI];
dword[$EDI];
dword[$ESP];
dword[$EBP];

...etc...


byte[$AX+somenumber];
byte[$EBX+somenumber];
byte[$CX+somenumber];
byte[$EDX+somenumber];
byte[$SI+somenumber];
byte[$EDI+somenumber];
byte[$SP+somenumber];
byte[$EBP+somenumber];

word[$EAX+somenumber];
word[$BX+somenumber];
word[$ECX+somenumber];
word[$DX+somenumber];
word[$ESI+somenumber];
word[$DI+somenumber];
word[$ESP+somenumber];
word[$BP+somenumber];

dword[$AX+somenumber];
dword[$EBX+somenumber];
dword[$CX+somenumber];
dword[$EDX+somenumber];
dword[$SI+somenumber];
dword[$EDI+somenumber];
dword[$SP+somenumber];
dword[$EBP+somenumber];



Assigning label values, addresses:

$EAX=label_or_pointer_to_variable;





Object code:

org ?
bits 16
bits 32


File inclusion:
incbin



Global definitions:
equ

_________________________
Preprocessor:

#include -- translated to %include
#define -- translated to equ


_________________________
Special operators


>>> -- rotate right
<<< -- rotate left


_________________________
Numbers

Unlike in standard C, you can use NASM-style binary numbers
like 010100101010101b, very useful for bitmasks.


_________________________
C instructions


goto -- very important because it translates into
a standard jmp instruction pointing to a label or a
number for offset.


while, do-while, if, else if, switch, break, continue, among other common ones, as well as standard bitwise, logical and arithmetic operators.
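
The rotate operators in the list above have no direct counterpart in C, so a C programmer would normally spell them out with shifts. A minimal sketch for comparison, assuming 32-bit operands; the names rotl32 and rotr32 are made up here, and mapping <<< and >>> to the x86 ROL and ROR instructions is my assumption, not something the spec above states.

```c
#include <stdint.h>

/* 32-bit rotate left, the operation the spec writes as <<<.
   Masking the count keeps every shift below 32, so n == 0 is safe. */
uint32_t rotl32(uint32_t value, unsigned n)
{
    n &= 31u;
    return (value << n) | (value >> ((32u - n) & 31u));
}

/* 32-bit rotate right, the operation the spec writes as >>>. */
uint32_t rotr32(uint32_t value, unsigned n)
{
    n &= 31u;
    return (value >> n) | (value << ((32u - n) & 31u));
}
```

Having the operator in the language saves writing this two-shift idiom by hand every time a bit pattern needs to wrap around.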

_________________________

An example to convert an ASCII string to a binary value (see here for some more examples of syntax in .CSM files).

Parameters (set up by the caller):

$BL takes the numeric base (2 for binary, 10 for decimal, 8 for octal, 16 for hexadecimal, etc.)

$ESI is the location of the zero-terminated string


Return values:

$EAX returns the numeric (binary) value of the string, ready to be used normally.

Code:

function /*$EAX */str2num(/*$BL numbase, $ESI strbuff*/)
{
 asm push ebx
 asm push ecx
 asm push edx
 asm push esi

 $EAX = 0;
 $ECX = 0;



 while( byte[$ESI] != 0x00 )
 {
  $BH = byte[$ESI];   ;//get the character
  $ESI++;             ;//advance string pointer


  ;//Convert ASCII to binary value:
  ;;
  if($BH >= '0' && $BH <= '9')
    $BH -= 0x30;

  else if($BH >= 'a' && $BH <= 'z')
    $BH -= (0x61-10);

  else if($BH >= 'A' && $BH <= 'Z')
    $BH -= (0x41-10);



  $CL = $BL;          ;//take base in ECX

;  $EAX *= $ECX;       ;//get result in EDX:EAX
  asm mul ecx

  $EDX = $BH;         ;//put the binary value in EDX

  $EAX += $EDX;       ;//add it to the value
  
 }


 asm pop esi
 asm pop edx
 asm pop ecx
 asm pop ebx
}

Now see how it gets translated. It looks confusing, but translating it from RealC is much easier than it looks, and when a basic compiler is finished it should be even easier:

Code:

;$EAX str2num($BL numbase, $ESI strbuff)
;;
str2num:
push ebx
push ecx
push edx
push esi;


 xor eax,eax;
 xor ecx,ecx;


 .while0:
 cmp byte[esi],0
 jz .while0_end;

   mov bh,[esi];   ;//get the character
   inc esi;        ;//advance string pointer



 ;//Convert ASCII to binary value:
 ;;
  cmp bh,'0'
  jb .not_if_ln47;
  cmp bh,'9'
  ja .not_if_ln47;
    sub bh,0x30;

  jmp .if_ln47__end;
  .not_if_ln47:


  cmp bh,'a'
  jb .not_if_ln50;
  cmp bh,'z'
  ja .not_if_ln50;
    sub bh,(0x61-10);

  jmp .if_ln47__end;
  .not_if_ln50:


  cmp bh,'A'
  jb .not_if_ln53;
  cmp bh,'Z'
  ja .not_if_ln53;
    sub bh,(0x41-10);

  jmp .if_ln47__end;
  .not_if_ln53:
  .if_ln47__end:



   mov cl,bl;     ;//take base in ECX
   mul ecx;       ;//get result in EDX:EAX
                  ;EAX*ECX == EDX:EAX

   movzx edx,bh;  ;//put the binary value in EDX
   add eax,edx;   ;//add it to the value


 jmp .while0;
 .while0_end:


pop esi
pop edx
pop ecx
pop ebx;
ret
Note that ending assembly instructions with semicolons in the ASM source, while not necessary, makes it easier to see where a single instruction ends or, more importantly, where a contiguous set of related instructions for one operation ends.
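
For comparison, the same conversion can be written in ordinary C. This is only a reference sketch of the algorithm (multiply the running result by the base, then add the digit), not output of any RealC tool. The helper digit_value is a name invented here, and stopping at the first invalid character is an addition of this sketch; the listings above do not check for that.

```c
#include <stdint.h>

/* Convert one ASCII character to its digit value, or -1 if it is not a
   digit in any base up to 36. Mirrors the three if/else-if branches. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/* result = result * base + digit for each character of the string.
   Stops at the first character that is not a valid digit for `base`. */
uint32_t str2num(unsigned base, const char *s)
{
    uint32_t result = 0;
    for (; *s != '\0'; s++) {
        int d = digit_value(*s);
        if (d < 0 || (unsigned)d >= base)
            break;
        /* wraps modulo 2^32, like keeping only EAX after MUL */
        result = result * base + (uint32_t)d;
    }
    return result;
}
```

Calling str2num(16, "ff") gives 255 and str2num(2, "1010") gives 10, which is handy for checking a hand translation of the RealC version against known values.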

You can still use all of the instructions and directives that NASM/YASM understand inside an asm{} block or asm single-line statement.

As you can see, the source code is translated into assembly files, so you can assemble it and later link it with object code from any other language whose toolchain can link this way, like GCC.

This allows you to truly have full control over your low-level source code and your own optimizations, and to know exactly what you are doing. That's very valuable, especially when you are starting to sort out the right algorithm implementations for your code. The RealC code is almost a mirror of the instructions that will actually be generated. Only loops, ifs, switches and the like generate the extra code necessary to make them work.

You can even place code outside functions and you don't need a main(), so, like in assembly, you have to be careful where you put code and data so that execution doesn't run into something it shouldn't.
Last edited by ~ on Sat Feb 27, 2010 9:35 am, edited 1 time in total.

Post by Cjreek »

This sounds like almost exactly what I'm looking for! :shock: :)

I even thought about making something like this myself some time ago. But it's a lot of work, so I first wanted to look around again for something that already exists.

So... Where can I download it? :)
I really need "RealC"!

Post by ~ »

There's no compiler yet.

For now there are only these language specifications; I still need to create a basic compiler. I think I will be able to write it once I work out an efficient way to interpret the expressions in the code, maybe in some months (for me, spending some months on this project is worth it in exchange for years of wasteful and inefficient ASM programming). The compiler is something I also need in order to avoid wasting too much time on routine assembly, so it's a priority project for me, and I guess that this year I will be able to come up with a basic working Javascript implementation for individual files. So far I only have a stage 1 that removes all comments and nothing else (to make the source easier to compile, as suggested by some compiler books) and a stage 2 that looks for all tokens (things like numbers, keywords, multicharacter operators, nests), records each one with a token type and a position, and reorders them.

I think the next step, for a start at least, is a brute-force parser that looks for certain predefined sequences of basic expressions to be translated to assembly; that would be a huge functional step toward completion.
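
To give an idea of what a "stage 2" token pass might look like, here is a rough sketch in C (the real implementation is described as Javascript and is not shown in this thread). Everything in it is an assumption for illustration: the token categories, the operator table, and the names tokenize and match_operator. It records a type and a position for each token and does no reordering.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token_type { TOK_NUMBER, TOK_WORD, TOK_REGISTER, TOK_OPERATOR, TOK_PUNCT };

struct token {
    enum token_type type;
    size_t pos;   /* offset of the first character in the source */
    size_t len;   /* length in characters */
};

/* Multi-character operators, longest first so "<<=" wins over "<<". */
static const char *const OPERATORS[] = {
    "<<<", ">>>", "<<=", ">>=", "<<", ">>", "==", "!=", "<=", ">=",
    "+=", "-=", "&&", "||", "++", "--", NULL
};

/* Length of the operator starting at s, or 0 if there is none. */
static size_t match_operator(const char *s)
{
    for (size_t i = 0; OPERATORS[i] != NULL; i++) {
        size_t n = strlen(OPERATORS[i]);
        if (strncmp(s, OPERATORS[i], n) == 0)
            return n;
    }
    return (*s != '\0' && strchr("+-*/=<>&|^~!", *s) != NULL) ? 1 : 0;
}

/* Split `src` (comments already stripped) into at most `max` tokens.
   Returns the number of tokens written to `out`. */
size_t tokenize(const char *src, struct token *out, size_t max)
{
    size_t count = 0, i = 0;
    while (src[i] != '\0' && count < max) {
        unsigned char c = (unsigned char)src[i];
        size_t start = i, n;
        enum token_type type;

        if (isspace(c)) { i++; continue; }

        if (isdigit(c)) {
            /* digits plus letters, so 0x30 and 0101b stay one token */
            while (isalnum((unsigned char)src[i])) i++;
            type = TOK_NUMBER;
        } else if (c == '$' || isalpha(c) || c == '_') {
            i++;
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            type = (c == '$') ? TOK_REGISTER : TOK_WORD;
        } else if ((n = match_operator(src + i)) > 0) {
            i += n;
            type = TOK_OPERATOR;
        } else {
            i++;                      /* ; ( ) { } [ ] , and anything else */
            type = TOK_PUNCT;
        }
        out[count].type = type;
        out[count].pos = start;
        out[count].len = i - start;
        count++;
    }
    return count;
}
```

Fed the statement "$EBP <<= 3;", this yields four tokens: a register, an operator, a number and the terminating semicolon. A brute-force parser can then match such type sequences against a table of known statement shapes.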

But at least now you have several options to choose from to achieve more clarity, and you can use these ideas to make your source code easier to understand in a rather standardized way.

Post by Cjreek »

~ wrote:There's no compiler yet.

:(

Do you know if there's anything like the language in your concept, but already implemented?
I've been looking for something like this for a long time...

Post by ~ »

Cjreek wrote:
~ wrote:There's no compiler yet.

:(

Do you know if there's anything like the language of your concept, but already implemented?
I'm looking for something like this for a long time...
Maybe C-- (here), although you might find its syntax more complex than RealC's if you are mainly an assembly programmer. C-- seems to lean more towards the C side than the low-level assembly side, whereas RealC joins C and assembly in a balanced and uncomplicated manner. You would also probably need some time to get used to C-- (and its backwards compatibility is not guaranteed), while if you know NASM assembly you already know practically everything about RealC.

I don't know of any language that is designed to make low-level programming as trivial, clear and hopefully standardized and fully backwards compatible as this one (as much as NASM sources are). I'm not saying one doesn't exist; maybe it does.

I have uploaded what I have so far here and the HTML/Javascript source code here. The logical branches implemented so far can handle the following sample, but functional, instruction:

{general_purpose_reg} << {number} {;}

And the model of this initial branch can easily be repeated for the rest of the instructions of that kind.


_______________________
Within 1 week I think I will be able to implement the logic branches to process inline assembly and expressions like $EBP << 3; $EBX = sys_argv; $EBX += $EBP; and maybe $ESI = [$EBX]; $ECX = [$EBX+4]; among other bitwise and arithmetic "binary" operators. You would then be able to use a list of supported expressions (which would be published) mixed with inline assembly. That would ease your assembly programming a little, and you could still update and/or recompile those sources without changes with future compilers.

Or you could help me get this work done (and the compiler finished and fully functional). You could give me some brief assembly or standard C snippets featuring the kind of operations you need, so that I have something to solve and implement, or send RealC source code, feature requests (you can see sample .CSM source files on my site, posted previously), suggestions and testing. I could help by pointing out syntax errors and translating your code for you to use, and at the same time I would be implementing these features in the automatic compiler, which would be available for download.

In this way, in 1 week you would have what I already talked about, and maybe in 1 month you could already use switches, ifs or loops, and could optionally update or create new and more efficient source files without disrupting anything. You could even write the code for unimplemented features yourself if you don't want to wait too long (again, there are the examples, you can ask, and it will be further documented to make for more complete software).

That's another advantage of this language: you can easily translate it manually if you need to, but the ultimate intention is to get the automatic product done through this gained experience. I already have the algorithms on paper to code basic ifs, whiles, ands, etc., and can discuss them as much as needed; you can learn them (which will readily improve your assembly), and I just need to find the proper ways to implement them, mainly things like ifs and loops. Calls to routines are already very easy to do (they have no specific type or parameters, but you can adjust them just like in assembly, only a little more easily).

At the same time you won't get distracted from your true goal of actually writing something while the compiler gets more and more finished along the way.


__________
By the way, the operation above should be something like:

$EBP <<= 3;

and $EBP << 3; would be wrong in this context, unless it can be shown that it works both as a shorthand for simple shifts and as part of nested expressions, which aren't really implemented yet (this should be pointed out in the documentation). But I would prefer to treat it as an "error", so that the compiler engine itself stays as simple and straightforward as possible, with unambiguous operations.
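
As an illustration of how one such "logic branch" could work, here is a small C sketch that accepts exactly the corrected form, $REG <<= N;, and emits the matching NASM instruction, while rejecting the $REG << N; form. It is not the actual RealC implementation (which is described as Javascript); the function name translate_shift, the register table and the choice of shl as the output are assumptions made for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* General-purpose registers from the spec, lowercased as NASM writes them. */
static const char *const GP_REGS[] = {
    "al", "ah", "ax", "eax", "bl", "bh", "bx", "ebx",
    "cl", "ch", "cx", "ecx", "dl", "dh", "dx", "edx",
    "esi", "si", "edi", "di", "ebp", "bp", "esp", "sp", NULL
};

/* Translate one statement of the form "$REG <<= N;" into "shl reg, N".
   Returns 1 on success and writes the NASM line to `out`; returns 0 when
   the statement does not match (including the "$REG << N;" form). */
int translate_shift(const char *stmt, char *out, size_t out_size)
{
    char reg[8];
    unsigned count;
    char semi;

    if (sscanf(stmt, " $%7[A-Za-z] <<= %u %c", reg, &count, &semi) != 3
        || semi != ';')
        return 0;

    for (char *p = reg; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);

    for (size_t i = 0; GP_REGS[i] != NULL; i++) {
        if (strcmp(reg, GP_REGS[i]) == 0) {
            int n = snprintf(out, out_size, "shl %s, %u", reg, count);
            return n > 0 && (size_t)n < out_size;
        }
    }
    return 0;   /* not a known general-purpose register */
}
```

For "$EBP <<= 3;" the function writes "shl ebp, 3". Other binary operators such as += or = could follow the same pattern with their own format string and instruction.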
Last edited by ~ on Sat Feb 27, 2010 1:58 pm, edited 4 times in total.

Post by i586coder »

Well friends, I have a fair amount of experience with 16-bit programming and 16-bit compilers. In your case, producing .com files with Turbo C requires the tiny memory model (refer to the wiki for more details). Alternatively, you can use the Digital Mars compiler; it's a powerful 16-bit compiler that is still around.

In closing, 16-bit programming is painful, especially type conversions (casting) in arithmetic, so avoid it unless you have a good reason to use a 16-bit compiler.

good luck;

a.T.d