Non-PC OS Development and Different Character Sets

SpyderTL
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Non-PC OS Development and Different Character Sets

Post by SpyderTL »

I'm taking a break from x86 development to work on an "as compatible as possible" OS for the C64.

I wanted to clear up as much available RAM as possible, so the first thing I'm currently doing is disabling ALL of the ROM bank overlays, so that the CPU has access to the full 64k of RAM. This means that I have to (or "get to") write my own kernel routines for the video display, keyboard handler, disk drive I/O, etc.

So, of course, my first task is to read events from the keyboard and write characters to the screen. My first approach was to create a table mapping the keyboard logical positions to the default screen character set. This works fairly well for a simple proof-of-concept, but now I'm starting to think about handling string data in memory, and rendering embedded text to the screen.
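A minimal sketch of such a position table, mapping to ASCII instead of screen codes. It assumes the keyboard is scanned through CIA #1 (port A at $DC00 selecting a line, port B at $DC01 read back) and that the table index is 8 * port-A line + port-B line, which is the ordering found in commonly published C64 key matrix tables. The names `keymap` and `key_to_ascii` are made up for illustration, and the entries should be checked against the hardware documentation before relying on them.

```c
#include <stdint.h>

/* Unshifted keys only. 0 marks keys with no printable ASCII equivalent
 * (cursor keys, function keys, SHIFT, CTRL, the pound key, and so on).
 * Index = 8 * port-A line + port-B line (assumed ordering, see above). */
static const char keymap[64] = {
    '\b', '\n', 0,   0,   0,   0,   0,   0,    /* DEL RETURN CRSR F7 F1 F3 F5 CRSR */
    '3',  'w',  'a', '4', 'z', 's', 'e', 0,    /* last entry: left SHIFT */
    '5',  'r',  'd', '6', 'c', 'f', 't', 'x',
    '7',  'y',  'g', '8', 'b', 'h', 'u', 'v',
    '9',  'i',  'j', '0', 'm', 'k', 'o', 'n',
    '+',  'p',  'l', '-', '.', ':', '@', ',',
    0,    '*',  ';', 0,   0,   '=', '^', '/',  /* pound, HOME, right SHIFT are 0 */
    '1',  0,    0,   '2', ' ', 0,   'q', 0     /* left arrow, CTRL, C=, RUN/STOP are 0 */
};

/* Translate a matrix position to ASCII; returns 0 for out-of-range
 * positions and for keys that have no ASCII equivalent. */
char key_to_ascii(uint8_t pa_line, uint8_t pb_line)
{
    if (pa_line > 7 || pb_line > 7)
        return 0;
    return keymap[pa_line * 8 + pb_line];
}
```

Keeping the table in ASCII means the keyboard handler and the screen writer each own exactly one translation step, and everything in between can treat text as plain ASCII.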

Now, the problem is that the default character set does not line up with the ASCII character set. The lower and upper case characters are "swapped", and the other symbols are scattered about, meaning that a simple formula to convert from one to the other isn't really an option.

So, the question is, how do (or would) you guys handle the situation where you are working on different platforms that have different keyboards, and the default screen character map is different?

I noticed that the C64 emulator that I'm using (WinVICE, I think) has a built-in Debug Monitor, but it displays characters in memory as if they were ASCII encoded, so viewing the C64 video RAM does not match the characters on the screen. Also, when using the cc65 compiler, I noticed that when I put string literals into the source file, they ended up case inverted on the screen, so it appears to have the same "problem" that I have in my own compiler.

So, should I just pick an encoding, and convert everything from/to that at run time? Or should I add the functionality to my compiler to allow me to specify what "encoding" I want strings to be output in the binary file?

And what about (and yes I'm being serious...) communicating on the internet from a C64? If I use ASCII as my "native" memory string format, that would make communication over the internet easier, and it would allow me to use the same compiler logic as other platforms as well.

I've also just started looking into the disk drive format to see how it is encoded, but I'm sure that it is going to be the same PETSCII encoding that the C64 uses (which is different from the default character set, btw.)

Also, and I'm pretty sure I know what the answer is going to be, but if I end up storing text as ASCII in memory, should I reprogram the character set to be an ASCII character set as well?

What do you guys think?

Thanks.
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
Geri
Member
Posts: 442
Joined: Sun Jul 14, 2013 6:01 pm

Re: Non-PC OS Development and Different Character Sets

Post by Geri »

I would handle it with a lookup table: unsigned char charsetx86equivalent[256] = {45, 6, 2, 6, 2, /* whatever */};

About the rest of your topic, I already explained it to somebody just today: http://forum.osdev.org/viewtopic.php?f= ... 44#p278144
Operating system for SUBLEQ cpu architecture:
http://users.atw.hu/gerigeri/DawnOS/download.html
Octocontrabass
Member
Posts: 5568
Joined: Mon Mar 25, 2013 7:01 pm

Re: Non-PC OS Development and Different Character Sets

Post by Octocontrabass »

SpyderTL wrote:So, the question is, how do (or would) you guys handle the situation where you are working on different platforms that have different keyboards, and the default screen character map is different?
I'd handle everything internally as Unicode (probably UTF-8), and translate the encoding as necessary.

I think all of my target platforms are more powerful than a C64.
SpyderTL wrote:Also, when using the cc65 compiler, I noticed that when I put string literals into the source file, they ended up being case inverted on the screen, so it appears to have the same "problem" that I have in my own compiler.
It sounds like you're doing run-time PETSCII to character ROM layout translation without cc65's compile-time ASCII to PETSCII translation.
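To make the two translation steps concrete, here is a rough sketch of both. The PETSCII to screen code ranges follow the usual C64 conversion rule; the ASCII to PETSCII step uses one common convention (lowercase to $41-$5A, uppercase to $C1-$DA) and is not claimed to be cc65's exact table. The function names are hypothetical, and mapping control codes to a space is just a placeholder choice.

```c
#include <stdint.h>

/* One common ASCII -> PETSCII convention (an assumption, see above):
 * lowercase letters become the unshifted PETSCII letters, uppercase
 * letters become the shifted ones, everything else passes through. */
uint8_t ascii_to_petscii(uint8_t c)
{
    if (c >= 'a' && c <= 'z') return (uint8_t)(c - 0x20);  /* $41-$5A */
    if (c >= 'A' && c <= 'Z') return (uint8_t)(c + 0x80);  /* $C1-$DA */
    return c;
}

/* PETSCII -> screen code (character ROM layout). Control codes have
 * no glyph, so this sketch simply returns a space for them. */
uint8_t petscii_to_screen(uint8_t c)
{
    if (c < 0x20)  return 0x20;                  /* control codes */
    if (c < 0x40)  return c;                     /* punctuation, digits */
    if (c < 0x60)  return (uint8_t)(c - 0x40);   /* @, unshifted letters */
    if (c < 0x80)  return (uint8_t)(c - 0x20);
    if (c < 0xA0)  return 0x20;                  /* control codes */
    if (c < 0xC0)  return (uint8_t)(c - 0x40);
    if (c < 0xFF)  return (uint8_t)(c - 0x80);   /* shifted letters */
    return 0x5E;                                 /* $FF */
}
```

Feeding a raw ASCII 'A' ($41) straight into `petscii_to_screen` yields screen code 1, which is a lowercase "a" when the mixed-case character set is selected, while ASCII 'a' ($61) yields $41, an uppercase "A". Skipping the first step is therefore enough to produce the case inversion described above.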
SpyderTL wrote:Or should I add the functionality to my compiler to allow me to specify what "encoding" I want strings to be output in the binary file?
Incidentally, cc65 allows you to be very specific about the translation.
SpyderTL wrote:Also, and I'm pretty sure I know what the answer is going to be, but if I end up storing text as ASCII in memory, should I reprogram the character set to be an ASCII character set as well?
Maybe. Do you have the RAM to spare for that? If not, you're better off translating from ASCII (or PETSCII or whatever in-memory format you choose) to the character ROM layout.
SpyderTL
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: Non-PC OS Development and Different Character Sets

Post by SpyderTL »

Octocontrabass wrote:Maybe. Do you have the RAM to spare for that? If not, you're better off translating from ASCII (or PETSCII or whatever in-memory format you choose) to the character ROM layout.
I think the Character Data RAM is actually a separate overlay that is switched in and out with the I/O bank, so it doesn't really use up any system RAM once it has been loaded and the bank has been disabled. But, I'll have to store it somewhere on cartridge, or disk, until it is loaded into Character RAM.

Or, I could write code that will just "move" the characters in Character RAM from their default VIC-II locations to their ASCII locations.
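The "move" idea could look roughly like the sketch below. Each glyph is 8 bytes, so one 256-glyph set is 2 KB. The function copies glyphs from a source set (on a C64 this would be a copy of the character ROM, which the CPU can see at $D000 when it is banked in) into a separate 2 KB destination buffer, ordered by ASCII code. The name `rearrange_charset` and the caller-supplied `ascii_to_screen` table are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define GLYPH_BYTES 8     /* 8 rows of 8 pixels per character */
#define GLYPH_COUNT 256   /* one full character set = 2048 bytes */

/* Build an ASCII-ordered character set in dst from the screen-code-ordered
 * set in src. ascii_to_screen[c] gives the source glyph index for ASCII
 * code c. src and dst must not overlap: rearranging in place would
 * overwrite glyphs that are still needed. */
void rearrange_charset(const uint8_t *src, uint8_t *dst,
                       const uint8_t *ascii_to_screen)
{
    size_t code, row;

    for (code = 0; code < GLYPH_COUNT; code++) {
        const uint8_t *glyph = src + (size_t)ascii_to_screen[code] * GLYPH_BYTES;
        for (row = 0; row < GLYPH_BYTES; row++)
            dst[code * GLYPH_BYTES + row] = glyph[row];
    }
}
```

Once the rearranged set is in use, writing a string to the screen no longer needs a per-character lookup, at the cost of the 2 KB buffer.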

For now, I've just decided to store everything as ASCII and look up the VIC-II default characters in a table. I did one version that stored everything as PETSCII codes, and then looked up the VIC-II default characters in a table, and then I copied that and swapped out all of the PETSCII values with ASCII values, and it worked like a charm. And the WinVICE monitor window now matches what is in memory, which helps.
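A minimal sketch of that kind of table, assuming the lower/upper-case half of the character ROM is selected rather than the power-on upper-case/graphics half. The names `ascii_to_screen`, `init_ascii_table`, and `put_string` are hypothetical, and ASCII codes without an obvious glyph simply fall back to '?'. On real hardware the `screen` pointer would point into screen RAM (at $0400 by default); here it is an ordinary buffer so the code can be tried on any host.

```c
#include <stdint.h>

static uint8_t ascii_to_screen[256];

/* Fill the ASCII -> screen code table for the mixed-case character set. */
void init_ascii_table(void)
{
    int c;

    for (c = 0; c < 256; c++)
        ascii_to_screen[c] = 0x3F;               /* default: '?' */
    for (c = 0x20; c <= 0x3F; c++)
        ascii_to_screen[c] = (uint8_t)c;         /* space, digits, punctuation */
    for (c = 'A'; c <= 'Z'; c++)
        ascii_to_screen[c] = (uint8_t)c;         /* upper case sits at $41-$5A */
    for (c = 'a'; c <= 'z'; c++)
        ascii_to_screen[c] = (uint8_t)(c - 0x60); /* lower case sits at $01-$1A */
    ascii_to_screen['@'] = 0x00;
    ascii_to_screen['['] = 0x1B;
    ascii_to_screen[']'] = 0x1D;
}

/* Write a NUL-terminated ASCII string to screen memory as screen codes. */
void put_string(uint8_t *screen, const char *s)
{
    while (*s != '\0')
        *screen++ = ascii_to_screen[(uint8_t)*s++];
}
```

Because the table is the only place that knows about the character ROM layout, dropping it later in favour of a reprogrammed character set only touches `put_string`.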

It should be good enough for now, and fairly easy to just get rid of the VIC-II lookup table if I ever decide to reprogram the Character RAM.

EDIT: Nope. After looking it up, the character set is stored in a ROM overlay, so it would take up 2k of system RAM to use my own character set, or to reorganize the existing characters into their corresponding ASCII positions. I was thinking about the Color RAM, which is in its own 1k RAM overlay...