Implementing non-English language in OS
My friend tells me that Russians are more interested in my OS, so I need to implement the Russian language in it. I don't agree: who needs Russian in an alpha-stage OS? So the question is, who should I listen to: myself or my friend?
Developing U365.
Source:
only testing: http://gitlab.com/bps-projs/U365/tree/testing
OSDev newbies can copy any code from my repositories, just leave a notice that this code was written by U365 development team, not by you.
Re: Implementing non-English language in OS
Listen to yourself.
Re: Implementing non-English language in OS
OK, thanks.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Implementing non-English language in OS
It can be a pain to introduce internationalisation later in development if precautions aren't taken early, and the alpha stage might be a good moment to try it out. But in the end it's just as iansjack says: Your project, your rules.
Psst, you can also look at it differently: it's worth several invisible tokens to beat the master on a niche subject and be really bilingual - who here can say they have better user support than for instance Brendan's OS? ; ) - of course, only if you're interested
Re: Implementing non-English language in OS
I have a UTF-8 font and a Russian font now, but the second isn't used, and the first is very glitchy.
Re: Implementing non-English language in OS
I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then write only an English implementation for now? This way you can always implement Russian when you feel like it.
My blog: http://www.rivencove.com/
Re: Implementing non-English language in OS
dseller wrote: I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then write only an English implementation for now? This way you can always implement Russian when you feel like it.
If you're not really careful, the problem with that is that you'll only notice the problems in your framework when it's too late. Implementing only one language means that you're prone to write your code as if other languages were just English with different words. But they aren't.
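One concrete case of "not just English with different words" is plural selection. The sketch below is only an illustration: the rules follow the widely used gettext-style plural formulas for English and Russian, while the message table and the function names are hypothetical and not taken from any OS in this thread.

```python
# Plural-form selection. English needs two forms, Russian needs three,
# so a framework designed around "singular/plural" cannot hold Russian.

def plural_index_en(n: int) -> int:
    """English: one form for exactly 1, another for everything else."""
    return 0 if n == 1 else 1


def plural_index_ru(n: int) -> int:
    """Russian: the form depends on the last one and two digits of n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0          # 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1          # 2-4, 22-24, ...
    return 2              # 0, 5-20, 25-30, ...


# Hypothetical message table: (rule, list of complete templates per form).
MESSAGES = {
    "en": (plural_index_en, ["{n} file", "{n} files"]),
    "ru": (plural_index_ru, ["{n} файл", "{n} файла", "{n} файлов"]),
}


def format_file_count(lang: str, n: int) -> str:
    rule, forms = MESSAGES[lang]
    return forms[rule(n)].format(n=n)
```

A framework that only stores a singular and a plural string per message has no slot for the third Russian form, which is the kind of problem that tends to show up only once a second language is actually implemented.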
Re: Implementing non-English language in OS
dseller wrote: I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then write only an English implementation for now? This way you can always implement Russian when you feel like it.
That was done a long time ago.
Re: Implementing non-English language in OS
I picked the name for my OS project to be specifically impossible to write until UTF-8 support was properly added. It's Rødvin. That said, I've also implemented UTF-8 decoding and glyph drawing, so that everything will support any non-English language encoded in UTF-8.
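For readers whose UTF-8 path is still glitchy, here is a minimal decoder sketch. It is written in Python for brevity rather than in kernel C, it is not the decoder of any OS mentioned here, and the policy of emitting U+FFFD for malformed input is an assumption chosen for the example.

```python
REPLACEMENT = 0xFFFD  # U+FFFD, emitted for any malformed sequence (assumed policy)


def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of code points for glyph lookup."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:                      # 0xxxxxxx: plain ASCII
            out.append(b0)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:             # lead byte of a 2-byte sequence
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:           # lead byte of a 3-byte sequence
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:           # lead byte of a 4-byte sequence
            need, cp, lowest = 3, b0 & 0x07, 0x10000
        else:                              # stray continuation or invalid lead
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) < need or any((b & 0xC0) != 0x80 for b in tail):
            out.append(REPLACEMENT)        # truncated or broken sequence
            i += 1
            continue
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if cp < lowest or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            cp = REPLACEMENT
        out.append(cp)
        i += 1 + need
    return out
```

Glitches in hand-written decoders often come from skipping the validity checks, so that one bad byte shifts every following character; resynchronising one byte at a time, as above, keeps the damage local.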
Re: Implementing non-English language in OS
There is more to non-English languages than just rendering UTF-8 glyphs: layout and shaping can get pretty complex.
Re: Implementing non-English language in OS
Right-to-left and bidirectional writing. Characters that need to be larger than Latin glyphs. Different digit separators, date string formats, currency formats. Glyphs that are digits but are not in the 0-9 range. Strings taking much more space than in English. Combining characters. The list goes on.
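Two items from that list, digits outside 0-9 and combining characters, can be shown in a few lines. This is a rough sketch using Python's standard `unicodedata` module; the function names are hypothetical, and `visible_length` deliberately ignores wide characters and other layout rules.

```python
import unicodedata


def parse_decimal(text: str) -> int:
    """Parse a run of decimal digits from any script into an int."""
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-digit characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value


def visible_length(text: str) -> int:
    """Rough cell count: characters with a combining class add no width."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)
```

For example, `parse_decimal("٣٤")` reads two Arabic-Indic digits, and `visible_length("e\u0301")` counts an "e" followed by a combining acute accent as one cell even though the string holds two code points. A kernel would need its own copy of the relevant Unicode tables to do the same.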
Every good solution is obvious once you've found it.
-
- Member
- Posts: 1146
- Joined: Sat Mar 01, 2014 2:59 pm
Re: Implementing non-English language in OS
Rusky wrote: There is more to non-English languages than just rendering UTF-8 glyphs: layout and shaping can get pretty complex.
And of course with multi-language support your OS also needs to have a proper localisation framework.
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.
Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
- bellezzasolo
- Member
- Posts: 110
- Joined: Sun Feb 20, 2011 2:01 pm
Re: Implementing non-English language in OS
In my experience, using UTF-16 as the internal character format eased many of the issues. I used a simple, freely available Unicode bitmap font, which I converted to C format with a utility. Then you want to load language packs. My approach was to place one in the initrd and have the kernel use that. The format doesn't have to be complex: to begin with, you could just have a list of strings, each separated by a newline (though you will probably want to move on to something like XML later).
If you're mainly aiming at supporting Russian, RTL isn't so important. But, since I was attempting Hebrew, this was the nightmare bit. Enjoy!
Needless to say, all of this needs a graphics mode.
Of course, there is a difference between a language pack and a good language pack. Generally, you will want the strings to be complete sentences so you don't get a complete grammatical screw-up.
Hope this helps.
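The newline-separated language pack described above could look roughly like this. It is a sketch only: the pack contents, the `Catalog` class, the `{name}` placeholder syntax and the idea of addressing messages by line index are all assumptions made for the example, not the format actually used in ChaiOS.

```python
def load_language_pack(raw: bytes) -> list:
    """Split a UTF-8 language pack into strings, one message per line."""
    text = raw.decode("utf-8")
    # Drop trailing newlines so the pack does not end with an empty message.
    return text.rstrip("\n").split("\n")


class Catalog:
    def __init__(self, fallback: list, active: list):
        self.fallback = fallback   # e.g. an English pack built into the kernel
        self.active = active       # e.g. a pack loaded from the initrd

    def message(self, msg_id: int, **values) -> str:
        # Use the fallback when the active pack is shorter or has a blank line.
        if msg_id < len(self.active) and self.active[msg_id]:
            template = self.active[msg_id]
        else:
            template = self.fallback[msg_id]
        # Each template is a complete sentence; only values are substituted.
        return template.format(**values)


# Hypothetical packs: line 0 is a greeting, line 1 an error message.
english = load_language_pack(b"Welcome to {os}!\nOut of memory.\n")
russian = load_language_pack("Добро пожаловать в {os}!\n".encode("utf-8"))
catalog = Catalog(fallback=english, active=russian)
```

Because every entry is a whole sentence with named placeholders, a translator can reorder words freely, which is the difference between a language pack and a good one that the post points at.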
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS
Re: Implementing non-English language in OS
You can encode all languages efficiently in UTF-8, and even SQLite3 supports it fully (that's why there are so many end-user programs that make internal use of SQLite3).
You could port SQLite3 to your OS to add indexing, query and even "registry" capabilities for installed programs, configuration values, etc.
I prefer to use UTF-8. It's widely supported on the Web, which is why I should learn to encode it with my own code. UTF-16 as well.
I would recommend using a database engine like SQLite3 and making a database that contains all words in all languages. Then put all synonyms, antonyms, paronyms, combinations, etc., in the same row for all languages. It will help you greatly with searching and indexing (it will help you create automatic translations even for the GUI of your programs, search the documentation and code more efficiently, etc.). It will make it possible to search in one language, or search for a word, and find what you searched for in all languages, in all related variants of the word, and for all of its synonyms, antonyms, etc.:
This is a simple table definition for that:
Code: Select all
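-- Set the encoding before any table exists; it cannot be changed afterwards.
PRAGMA encoding = 'UTF-8';
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);
-- sqlite3 shell commands: import the tab-separated text file into the table.
.mode tabs
.import multilanguage_words.txt multilanguage_words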
This is sample text showing how to index all words in all languages in the same row (text line). Note that each word has a header between {} that contains the classification of the word (synonym, antonym, name, etc.) and the language ID. Numbers like 50 are the percentage of positive or negative emotion that I felt for each word by default when I wrote the text, and they could be updated with A.I. The last field, like "skill", is an attempt to classify the words from the most basic existing human concept to the most complex and emotive/subjective one. All parameters after the language ID are optional and are parsed for their presence:
Code: Select all
{:synonym:es:50:skill}programador+{:synonym:en}programmer+{:synonym:fr}programmeur+{:synonym:it}programmatore+{:synonym:eo}programisto {:synonym:es}Persona que diseña los procedimientos a seguir por un dispositivo automatizado.
{:synonym:es:80:quality}listo+{:synonym:en:70}smart+{:homonym:es}listo,{:synonym:es}sagaz,{:synonym:es}astuto,{:synonym:es}ducho,{:synonym:es}despabilado,{:synonym:es}avivado,{:synonym:es}avezado+{:typo:es}avesado+{:synonym:es}avispado,{:synonym:es}perspicaz,{:synonym:es:75}vivo {:synonym:es}Persona con gran agudeza y agilidad mental y práctica.
{:synonym:es:60:status}listo+{:synonym:en}ready,{:synonym:es}preparado+{:synonym:en}prepared,{:synonym:es}dispuesto+{:synonym:en}willing,{:synonym:es}complaciente {:synonym:es}Estado de espera y disposición para llevar a cabo una tarea.
{:synonym:es:50:concept}palabra,{:synonym:es}fonema,{:synonym:es}vocablo,{:synonym:es}término,{:synonym:es}verbo,{:synonym:es}dicción,{:synonym:es}expresión,{:synonym:es}lengua,{:synonym:es}lenguaje,{:synonym:es}habla,{:synonym:es}promesa,{:synonym:es}pacto,{:synonym:es}oferta,{:synonym:es}juramento,{:synonym:es}ofrecimiento,{:synonym:es}compromiso {:synonym:es}Elemento de todo lenguaje que comunica ideas, intenciones y acciones.
{:synonym:es:-60:action}desaparecer+{:synonym:es:-35:action}desaparecerse,{:synonym:es}esfumar+{:synonym:es}esfumarse,{:synonym:es}retirar+{:synonym:es}retirarse {:synonym:es}Alejar algo de nuestra percepción de modo que no se pueda encontrar.
{:name:es:0:male}Rodolfo+{:name:en:0:male}Rudolph
{:synonym:*:medication}Panadol+{:synonym:*:medication}Paracetamol
{:synonym:en:65}conversely+{:synonym:es:65}al contrario de+{:synonym:es:65}a diferencia de
{:synonym:es}amalgama+{:synonym:en}amalgam+{:synonym:en}amalgamation
{:synonym:es}natural+{:synonym:en}natural,{:synonym:es}sincero+{:synonym:en}sincere,{:synonym:es}espontáneo+{:synonym:en}spontaneous,{:synonym:es}genuino+{:synonym:en}genuine
{:synonym:es}calibración+{:synonym:en}calibration+{:synonym:es}calibrar+{:synonym:en}calibrate,{:synonym:es}equilibrio+{:synonym:es}equilibrar+{:synonym:en}equilibrium+{:synonym:en}equilibrate,{:synonym:es}balance+{:synonym:es}balancear+{:synonym:en}balance
{:surname:en}Sonnenreich+{:typo:en}Sonnereich
{:synonym:es}correspondiente+{:synonym:es}coincidente+{:synonym:es}concordante+{:synonym:pt}concorda+{:synonym:pt}concorde+{:synonym:es}que concuerde+{:synonym:es}acierto+{:synonym:es}coincidencias+{:synonym:es}concordancias+{:synonym:es}acierto+{:synonym:es}aciertos+{:synonym:pt}de acordo+{:synonym:pt}concerta+{:synonym:pt}concertar+{:synonym:pt}concerte
{:synonym:es}endurar+{:synonym:es:0:verb}endurecer+{:synonym:es:0:verb}endurezco+{:synonym:es:0:verb}endureces+{:synonym:es:0:verb}endurece+{:synonym:es:0:verb}endurecemos+{:synonym:es:0:verb}endurecéis+{:typo:es:0:verb}endureceis+{:synonym:es:0:verb}endurecen+{:synonym:en:0:verb}harden+{:synonym:en}hard+{:synonym:en:0:verb}make hard+{:synonym:en:0:verb}to make hard+{:synonym:en:0:verb}make it hard+{:synonym:en:0:verb}making it hard+{:synonym:es}endura+{:synonym:es}endurece+{:synonym:es}durar
{:name:*:0:font-face}Calibri
{:synonym:en}keep+{:synonym:en}keeping+{:synonym:en}kept+{:synonym:es:0:verb}mantener+{:synonym:es}mantén+{:typo:es}manten+{:synonym:es}mantengo+{:synonym:es}mantienes+{:synonym:es}mantiene+{:synonym:es:0:verb}mantienen+{:synonym:es}mantenemos+{:synonym:es}mantenéis+{:typo:es:verb}manteneis+{:synonym:es}mantén
{:word:es}tu+{:word-plural:es}tus+{:word:en}your
{:name:en:50:organism}eye+{:name-plural:en:50:organism}eyes+{:name:es:50:organism}ojo+{:name-plural:es:50:organism}ojos
{:name:es:100:math}Álgebra+{:name:en:100:math}Algebra
{:name:es:100:math}Aritmética+{:name:en:100:math}Arithmetic
{:name:es:100:math}Cálculo+{:name:en:100:math}Calculus
{:name:en:100:artificial-intelligence}Situation Calculus+{:name:es:100:artificial-intelligence}Cálculo Situacional
{:name:en}dynamical domain+{:name:es}dominio dinámico+{:name:en}dynamical domains+{:name:es}dominios dinámicos
{:name:en}vedic math+{:name:es}matemática védica
{:name:en}Pizza Hut
{:name:en}Toto's Pizza
{:word:es}tan+{:word:en}as+{:word:es}tanto
{:word:es}también+{:typo:es}tambien+{:chat:es}tmb+{:word:en}as well+{:word:en}as well as
{:synonym:en:0:verb}close+{:antonym:en:0:verb}open+{:synonym:en}closed+{:synonym:es}cerrado+{:antonym:es}abierto+{:synonym-plural:es}cerrados+{:antonym-plural:es}abiertos
{:word:en}and+{:word:es}y+{:word:pt}e+{:word:fr}et
{:word:en}of+{:word:es}de
{:word:en}you+{:word:es}tú
{:word:en}state+{:word:es}estado+{:word-plural:en}states+{:word-plural:es}estados
{:name:en}day+{:name:es}día+{:name-plural:en}days+{:name-plural:es}días
{:synonym:en}ensure+{:synonym:en}make sure+{:synonym:en}making sure+{:synonym:es}asegurándose+{:synonym:es}asegurar+{:synonym:es}asegurarse
{:synonym:en}high+{:synonym-male:es}alto+{:synonym-female:es}alta+{:synonym-plural-male:es}altos+{:synonym-plural-female:es}altas
{:synonym:en}level+{:synonym-plural-male:en}levels+{:synonym-male:es}nivel+{:synonym-plural:es}niveles+{:synonym:es}nivelación+{:synonym-plural:es}nivelaciones
{:synonym-male:en}channel+{:synonym-plural-male:en}channels+{:synonym-male:es}canal+{:synonym-plural-male:es}canales
{:word:en}the+{:word-male:es}el+{:word-female:es}la+{:word-female:es}las+{:word:es}lo+{:word-plural:es}los
{:word:en}to+{:word:es}para+{:word:es}a
{:synonym:en:0:verb}deepen+{:synonym:es:0:verb}profundizar
{:synonym:en}least+{:synonym:es}menos+{:synonym:es}menor
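The header format in the sample above can be parsed mechanically. The following is a sketch under assumptions, not the author's own parser: it handles only the wordlist column (not the tab-separated definition), treats the leading colon as optional, and decides which optional field is which by its shape. The `Entry` type and `parse_wordlist` name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# One entry is "{:kind:lang[:score][:category]}word", joined by "+" or ",".
ENTRY = re.compile(r"\{:?([^{}]*)\}([^{}]*)")


class Entry(NamedTuple):
    kind: str                 # e.g. "synonym", "typo", "name-plural"
    lang: str                 # language ID, or "*" for any language
    score: Optional[int]      # optional emotion percentage
    category: Optional[str]   # optional concept class, e.g. "skill"
    word: str


def parse_wordlist(wordlist: str) -> list:
    entries = []
    for match in ENTRY.finditer(wordlist):
        fields = match.group(1).split(":")
        if len(fields) < 2:
            continue          # header without a language ID: skip it
        kind, lang = fields[0], fields[1]
        score, category = None, None
        # Everything after the language ID is optional: a signed integer
        # is taken as the score, any other non-empty field as the category.
        for extra in fields[2:]:
            if re.fullmatch(r"-?\d+", extra):
                score = int(extra)
            elif extra:
                category = extra
        word = match.group(2).strip("+, \t")
        entries.append(Entry(kind, lang, score, category, word))
    return entries
```

Note that every lookup against the `wordlist` column still has to scan and re-parse whole rows like this, since SQLite cannot index the individual words inside one TEXT value.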
Remember that relating the same word and its synonyms/antonyms/etc. in the same row/record in all existing languages (including typos, abbreviations and phrases) lets you search and find in terms of the core concepts of the words, rather than searching for, finding and processing one specific word.
It's a very good basic A.I. filter for understanding and processing natural language, but it needs a massive database containing all existing words of humankind, related to each other (the same word in all languages = one database record).
Why hasn't even Google released such a vital language database?
YouTube:
http://youtube.com/@AltComp126
My x86 OS/software:
https://sourceforge.net/projects/api-simple-completa/
Donate to get more food/programming resources/computers:
https://www.paypal.com/donate/?hosted_b ... QS2YTW3V64
- max
- Member
- Posts: 618
- Joined: Mon Mar 05, 2012 11:23 am
- Libera.chat IRC: maxdev
- Location: Germany
- Contact:
Re: Implementing non-English language in OS
~ wrote: Then put all synonyms, antonyms, paronyms, combinations, etc., in the same row for all languages. It will help you greatly with searching and indexing (it will help you create automatic translations even for the GUI of your programs, search the documentation and code more efficiently, etc.). It will make it possible to search in one language, or search for a word, and find what you searched for in all languages, in all related variants of the word, and for all of its synonyms, antonyms, etc.:
This is a simple table definition for that:
Code: Select all
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT "", dictionary_definition TEXT DEFAULT "", rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);
pragma encoding="UTF-8";
.mode tabs
.import multilanguage_words.txt multilanguage_words
[...]
This database structure is terrible. Putting all translations of one word in one row is exactly how you should not do it. A database must be properly normalized so you can work with it, index it and search through it effectively.
~ wrote: Why hasn't even Google released such a vital language database?
You need more than a massive database to do natural language processing. Why should Google release a giant file that is basically only a dictionary? Google has its AI, which properly translates from and to a lot of languages and keeps learning new things. Natural languages are very complex, and so are the algorithms to process them.
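To make the normalization point from this post concrete, here is one possible layout with one row per word and a separate row per concept. It is a sketch using Python's standard `sqlite3` module; the table names, column names and the index are hypothetical, and the sample rows are taken from the first line of the quoted word list.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concept (
    id         INTEGER PRIMARY KEY,
    definition TEXT NOT NULL DEFAULT ''
);
CREATE TABLE word (
    id         INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    lang       TEXT NOT NULL,      -- language ID such as 'en' or 'es'
    kind       TEXT NOT NULL,      -- 'synonym', 'antonym', 'typo', ...
    spelling   TEXT NOT NULL
);
CREATE INDEX word_by_spelling ON word (spelling, lang);
"""


def build_demo_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute(
        "INSERT INTO concept (id, definition) VALUES (1, ?)",
        ("Persona que diseña los procedimientos a seguir por un "
         "dispositivo automatizado.",))
    db.executemany(
        "INSERT INTO word (concept_id, lang, kind, spelling) "
        "VALUES (1, ?, ?, ?)",
        [("es", "synonym", "programador"),
         ("en", "synonym", "programmer"),
         ("fr", "synonym", "programmeur")])
    return db


def translations(db: sqlite3.Connection, spelling: str, target_lang: str) -> list:
    """Words in target_lang that share a concept with the given word."""
    rows = db.execute(
        """SELECT other.spelling
             FROM word AS src
             JOIN word AS other ON other.concept_id = src.concept_id
            WHERE src.spelling = ? AND other.lang = ?
            ORDER BY other.spelling""",
        (spelling, target_lang))
    return [row[0] for row in rows]
```

With this layout a lookup such as `translations(db, "programador", "en")` can use the index on `spelling` instead of scanning and re-parsing a packed text column, which is what "index it and search through it" needs in practice.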