Implementing non-English language in OS

~
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: Implementing non-English language in OS

Post by ~ »

max wrote:
~ wrote:Then put all synonyms, antonyms, paronyms, combinations, etc., in the same row in all languages. It will help you greatly in searches and indexing (it will help you create automatic translations even for the GUI of your programs, search the documentation and code more efficiently, etc.). It will make it possible to search in one language, or search for a word, and find what you searched for in all languages, in all related variants of the word, and for all of its synonyms, antonyms, etc.:

This is a simple table definition for that:


-- The text encoding can only be chosen before the first table is created.
PRAGMA encoding = 'UTF-8';
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);

.mode tabs
.import multilanguage_words.txt multilanguage_words

[...]

Why hasn't even Google released such a vital language database?
This database structure is terrible. Putting all translations to one word in one row is exactly how you should not do it. A database must be properly normalized so you can effectively work with it, index it and search through it.

You need more than a massive database to do natural language processing. Why should Google release a giant file that is basically only a dictionary? Google has its AI that properly translates from/to a lot of languages and always learns new stuff. Natural languages are very complex, and the algorithms to process them are as well.
Some things in A.I. need to be taught to the machine explicitly, by giving it concepts and information that a human has checked and would use themselves. Maybe that's what publicly available A.I. is lacking: people giving it all the data clearly, in the format it expects, instead of leaving the machine to learn and deduce everything on its own. It would probably result in more intelligent machines, and in programs that no longer talk and sound as if they were reading spam. But it needs people to describe every human aspect, on a massive scale, in a form an A.I. can use, and to make sure that each major knowledge component (like this special multi-language database) is as simple as possible and formatted in a standard way for all A.I.s.

Such a database can immediately help you detect the language of a document without making mistakes. Anyway, the brain also contains all words in one way or another, so the computer should have them too to get nearer to our intellectual/verbal capability level. You'll see what happens when I get to distinguish another fundamental component for filtering natural language and use this database and other simple components together.

Don't think that the header format used for each word isn't well thought out. That format won't let you confuse the header with the word, and querying the database with LIKE "}"+word or LIKE "}"+word+"+" finds only actual words instead of metadata. Then you can count the occurrences of words, count how many times each LANGUAGE ID appears among the words present, and even detect the percentage of each language in the text.

I still work based on the simplest concepts, and it's obvious that such a database could help you find many more things with a single search: you search for "juice" and it also finds documents that don't contain that word but contain "drink" or "drinks", because those are synonyms.

ALSO GROUP DIRECT SYNONYMS WITH +

It's a very good language filter. If you have all the words for the same word in all languages (with related variants and antonyms/synonyms), and you know which language every word belongs to, you stop looking at the natural language processing problem upside down: you filter it and get back a sort of ID for a single unique concept in return for a word. If you do that for every language, it becomes much easier to start from there and look at a document from many different perspectives, both programmed and deduced with A.I.


More than complaints about the structure of a database, I'd like to hear a better and more complete implementation than the one I wrote above for such a language simplification engine, one that lets you focus on concepts no matter which words or language are used in a program, a text search or an index. If there is only the complaint and no such explanation, it may just reflect a wish to have that resource built immediately and tested in real programs, perhaps in a more personal style that the other person has not yet defined and is still formulating in the back of their mind.

It's a very good structure. If you look, it even has an auto-incrementing ID field at the very end of each record. That means you can insert values into the table at any time, in any order, without having to specify an ID, which is a dummy value anyway but helps indexing internally at runtime. The field no longer gets in the way of the actual data: since it's at the end of each record, it doesn't need to fill a void in a CSV file.


The effect of putting all words in all languages in a single row is that you look at language from the perspective of its language-independent concepts.

But you have to make sure that you have absolutely all existing words in record, one word with its synonyms, antonyms, male/female counterparts, etc., etc., etc. Humanly, it would also help learn any other language. But this database structure is ideal for being able to practically find all words for a single-word concept.

It might be bad as, say, a standardized generic design among thousands of database tables, but it serves well a particular database that doesn't seem to exist, and whose absence makes computers much dumber in terms of human language. Remember that you have to communicate even the most obvious tidbit of human reasoning and knowledge to the machine through software. I still can't believe that there isn't a properly formatted, publicly available database that contains all human language grouped by word, with its variants, antonyms, etc., in a single row so that concepts can be accessed.

But it would be an incredibly valuable product because it would have human grade, and would be about something that will never disappear or become nonstandard given that it is purely about something human.

As you can see, it looks like that when you add absolutely everything that exists about some knowledge or human capability like human language, in a format that is usable by a machine and that is human-readable and easy to understand and update/maintain, you actually get the capability to implement that human-grade skill or sense, you get to synthesize and make it useful in practice in a machine.




I guess that it must be a joke. Human language doesn't have a rigid structure so a database structure must be chosen that will always detect concepts in any order or with any level of informality. Shouldn't be done according to what objective? If you want to translate human language into words, you certainly need all existing words to detect that concept. In this way no matter what language or synonym is used, you will always find what is being referenced and will always understand what is being said at the word level.

Then you can start classifying what was said or written by giving more importance to the most unique distinctive words like names. Also the first things that are mentioned would be the ones that would define the topic of a document the most, and the ones at the end too but maybe in less percentage or in a different perspective.

Then you can classify the rest of the words in the order of occurrences. Then use verbs to try to detect tasks as to what is being described to be done in a document, be it practical or theoretical in content.


This is the most concise way to turn a word into a concept in programming. It would really result in dramatically more results in searches, for example.

With this, now you only need a SELECT wordlist FROM multilanguage_words WHERE wordlist LIKE $_your_word_$

Now with a single searched or translated word, you only need to search and parse the headers and the words you got.
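
A minimal sketch of that lookup using Python's sqlite3 module, reusing the table definition quoted earlier. The row layout ({language}word+synonym, groups separated by spaces) is only a guess made up for illustration, since the actual header format is not shown in this post, and the helper name find_concept_rows is hypothetical too.

```python
import sqlite3

def find_concept_rows(conn, word):
    """Return (rowid, wordlist) for every row whose wordlist contains '}word'."""
    # Bind the pattern as a parameter instead of pasting the word into the SQL text.
    pattern = "%}" + word + "%"
    cur = conn.execute(
        "SELECT rowid, wordlist FROM multilanguage_words WHERE wordlist LIKE ?",
        (pattern,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE multilanguage_words("
    "wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', "
    "rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL)"
)
# Hypothetical rows: one concept per row, all languages together.
conn.executemany(
    "INSERT INTO multilanguage_words(wordlist) VALUES (?)",
    [("{en}juice+drink {es}jugo+zumo",), ("{en}house+home {es}casa+hogar",)],
)

rows = find_concept_rows(conn, "jugo")
for rowid, wordlist in rows:
    print(rowid, wordlist)
```

Note that the pattern also matches longer words that merely start with the searched letters, so the returned wordlist still has to be split and compared word by word afterwards.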
Octocontrabass
Member
Posts: 5591
Joined: Mon Mar 25, 2013 7:01 pm

Re: Implementing non-English language in OS

Post by Octocontrabass »

~ wrote:Don't think that the header format used for each word isn't well thought out. That format won't let you confuse the header with the word, and querying the database with LIKE "}"+word or LIKE "}"+word+"+" finds only actual words instead of metadata. Then you can count the occurrences of words, count how many times each LANGUAGE ID appears among the words present, and even detect the percentage of each language in the text.
You shouldn't need to make such queries in the first place! Metadata belongs in separate rows from the words themselves, and may even belong in separate tables from the words.

The point of a database is to be able to select exactly the data I want and absolutely nothing else. If I want a list of Spanish synonyms for an English word, I don't need words that aren't Spanish, I don't need definitions, and I certainly don't need antonyms. The database should be using a structure that lets me filter out irrelevant information. If the database cannot remove irrelevant information from its results, the database structure is wrong.
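
One possible normalized layout, sketched with Python's sqlite3 module. The table and column names (concept, word, spelling, lang) and the helper synonyms are assumptions chosen for illustration, not a schema anyone in this thread proposed; antonyms and other relations would go in a further table that is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept(
    id INTEGER PRIMARY KEY,
    definition TEXT NOT NULL DEFAULT ''
);
CREATE TABLE word(
    id INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    lang TEXT NOT NULL,      -- language code such as 'en' or 'es'
    spelling TEXT NOT NULL   -- one word per row, no embedded metadata
);
CREATE INDEX word_lookup ON word(lang, spelling);
""")
conn.execute("INSERT INTO concept(id, definition) VALUES (1, 'liquid pressed from fruit')")
conn.executemany(
    "INSERT INTO word(concept_id, lang, spelling) VALUES (?, ?, ?)",
    [(1, "en", "juice"), (1, "es", "jugo"), (1, "es", "zumo"), (1, "de", "Saft")],
)

def synonyms(conn, spelling, from_lang, to_lang):
    """Words in to_lang that share a concept with the given from_lang word."""
    cur = conn.execute(
        """
        SELECT target.spelling
        FROM word AS source
        JOIN word AS target ON target.concept_id = source.concept_id
        WHERE source.lang = ? AND source.spelling = ? AND target.lang = ?
        ORDER BY target.spelling
        """,
        (from_lang, spelling, to_lang),
    )
    return [row[0] for row in cur]

spanish = synonyms(conn, "juice", "en", "es")
print(spanish)
```

The query returns only Spanish words for the concept, with no definitions and no other languages to strip out afterwards, and the index lets the lookup use an exact match instead of LIKE.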
~
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: Implementing non-English language in OS

Post by ~ »

Octocontrabass wrote:
~ wrote:Don't think that the header format used for each word isn't well thought out. That format won't let you confuse the header with the word, and querying the database with LIKE "}"+word or LIKE "}"+word+"+" finds only actual words instead of metadata. Then you can count the occurrences of words, count how many times each LANGUAGE ID appears among the words present, and even detect the percentage of each language in the text.
You shouldn't need to make such queries in the first place! Metadata belongs in separate rows from the words themselves, and may even belong in separate tables from the words.

The point of a database is to be able to select exactly the data I want and absolutely nothing else. If I want a list of Spanish synonyms for an English word, I don't need words that aren't Spanish, I don't need definitions, and I certainly don't need antonyms. The database should be using a structure that lets me filter out irrelevant information. If the database cannot remove irrelevant information from its results, the database structure is wrong.
But if you don't even mention the start of another actual algorithm that I can implement and play with, instead of just complaining about the file format used to store the word data, I will be convinced that you probably hadn't thought about this problem before, or about how to solve it (translating human language in a single pass into concepts the machine can use and classify).

You can use SELECT and LIKE, then use a string split function on the specially formatted text row. It isn't a random format, it's well thought out, and given that we will be dealing with all of the text available, it's the most practical way. A single query finds all of the words, and with that you get the words but, more importantly, a virtual ID for a concept that you can use to even interpret a document and have a very tiny program tell you what it is about (possible topics in order of importance, with what is mentioned first more likely to be the main topic of the text).

It would require too many tables in any other way. The intention here is to select the concept of a word. You will certainly need to group together all of the related words in a row in all languages. Look at it as the actual code to a human concept from the string of words, that a machine can finally start understanding and using in real human tasks that are extremely important, at its own level of simplicity. There's no need to fragment the code again (it already is in the different languages, but can be put together for a vital A.I. use that is probably already done so privately) just to satisfy a naive database structure. It's much more important being able to learn how to handle Natural Human Data and satisfy its needs, complexity, informality and the structure it really has, even if it looks ugly to traditional programmers (being a human component it will not look very technical in one way or another but it needs to be handled as is in an optimum way, or we will get away again from being able to directly handle human expression with computers and will get to a technical yet naive context which is useless in this task at this level).

How can you tell a machine concisely what a word is or which language it belongs to? The best way is to use a header, as if it were an operand for an opcode (as if each word were an Assembly instruction and you were programming it).

The fact is that you need to promote human language informality from the start, and this is the easiest way.

And by the way, I see that for each concept, we would need a way to transmit effectively and practically a set of instructions to apply such concept in actions more than theoretical virtual concepts. We just need to find the correct format.

As for words, a database that you can query for a word and that in exchange returns all related words in all languages seems to be the way to communicate, to make words humanly useful, and to translate and turn them into raw concepts and their performable tasks. The machine would need to be taught everything with human quality in a way where it can replicate it based on knowledge and general/broad patterns that are actually easy to exchange with normal people. That technology will no doubt be developed at a simple level that can be made complex with automatic A.I. layers.


It could help you learn other languages by constantly exposing you to those words alongside your own language and the ones you understand.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Implementing non-English language in OS

Post by Brendan »

Hi,

Once upon a time humans couldn't figure out how various things could be done, and so they just called these things "magic". As science progressed, people understood more, and things that were considered "magic" became physics, or chemistry, or biology, or some other branch of science with well defined rules.

"AI" is the software developer's equivalent of "magic". When someone says "Let's use AI for internationalisation" what they really mean is they don't have a clue how it'd actually be implemented. If they ever figure out what they're talking about, it suddenly ceases to be AI and becomes a well defined algorithm (e.g. possibly consisting of calculating probabilities and using a brute force search to find the most probable).

Of course "magic" does have 2 meanings. It can also mean things like sleight of hand, where the goal is to deceive/trick people into believing nonsense. In the same way "AI" also has 2 meanings - it can also be used for marketing purposes, where the goal is to deceive/trick people into believing nonsense.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Implementing non-English language in OS

Post by Rusky »

...which is why the people who are actually working on AI in its original sense have started using phrases like "machine learning" or "data mining." There's tons of real stuff there besides incompetence or marketing buzzwords.
DavidCooper
Member
Posts: 1150
Joined: Wed Oct 27, 2010 4:53 pm
Location: Scotland

Re: Implementing non-English language in OS

Post by DavidCooper »

I can't see the advantage of having all languages in one single database unless you're dealing with documents written in a mixture of languages where each word is chosen from a randomly selected language just for the hell of it. There are over 4000 languages out there, and AGI will soon be able to help people make up thousands more (for private conversations with machines), so do you really want to fill the machine with a gigabyte or more of unnecessary bloat which will likely take tens or hundreds of times longer to process for each word than a small language-specific database? In the real world, you typically only need one or two languages in the system at a time.

To identify languages, you can simply use a small file containing the most common hundred words of all known languages (unless you're dealing with a short line of text with no common words in it, but even then you can try loading in databases for the most likely languages first based on how common they are, the writing system used for the text or various other clues that will likely be there such as the source).
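
A minimal sketch of that idea, counting hits against a small per-language list of common words. The word lists below are tiny made-up samples rather than real frequency data, and the names COMMON_WORDS and guess_language are hypothetical.

```python
from collections import Counter

# Tiny illustrative samples; a real file would hold about a hundred words per language.
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "es": {"el", "la", "y", "de", "que", "en"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}

def guess_language(text):
    """Return the language whose common words occur most often, or None."""
    tokens = text.lower().split()
    hits = Counter()
    for lang, words in COMMON_WORDS.items():
        hits[lang] = sum(1 for token in tokens if token in words)
    lang, count = hits.most_common(1)[0]
    # No common word at all: fall back to other clues (script, source, ...).
    return lang if count > 0 else None

print(guess_language("the cat is in the house and the dog is out"))
print(guess_language("el perro y la casa de mi madre"))
```

When the function returns None, that is the short-text case described above, where the writing system or the source of the text has to decide instead.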

Where you're right though is that you do want to convert all the words that mean the same thing to the same coding and for that coding to be used across all languages, though it's hard to do it properly due to subtle differences and overlaps in meaning which mess things up, so you'll find you need to convert everything to pure concepts, and in many cases there are no exact words for those pure concepts in any spoken language. It's really a task that's best left for AGI to do: our job is to program AGI to handle just one language and then let it work out how to do the rest for us.

In the meantime though, I can see that there are still uses for databases of the kind you've shown us, but ideally sticking to just a few languages at a time: if you then feel the need to take many more and merge them together into a bigger database covering dozens of languages, then that would be easy enough, and a multilinguist might well want to have fifty of them in one package. But what you should really be asking for now is databases covering just one or two languages rather than expecting to get the whole lot in one massive download. I'd like to find some databases of that kind myself, or indeed any bilingual dictionary in electronic form which is free to pinch and rework into something else without having to pay anyone anything to do so, but an AGI system will be able to produce its own dictionaries simply by working with texts and translations of the same texts in other languages, quickly working out which words are equivalent and narrowing down their exact meanings with greater precision over time as it processes more documents. So, if you actually have the ability to produce intelligent software which is capable of reading and understanding natural language, you should just concentrate on one language and get it built, because everything else is just a distraction which will only serve to slow you down. Having said that, you will likely need to study the workings of many languages before you can get your mind around what goes on under their surface: you need to work out how to represent thoughts and translate between the same or related thoughts expressed in different forms, and the clues as to the best ways to do this are spread across many languages.
Given that you already know Spanish and English, I would recommend that you look carefully at Esperanto, Japanese, Chinese to get some idea as to how language could be simplified (Esperanto only plays with the idea, but it gives you a few useful leads) and to see how radically different systems of grammar line up against each other. Your challenge then is to work out the grammar of thought.
Help the people of Laos by liking - https://www.facebook.com/TheSBInitiative/?ref=py_c

MSB-OS: http://www.magicschoolbook.com/computing/os-project - direct machine code programming
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Implementing non-English language in OS

Post by Brendan »

Hi,
Rusky wrote:...which is why the people who are actually working on AI in its original sense have started using phrases like "machine learning" or "data mining." There's tons of real stuff there besides incompetence or marketing buzzwords.
There's nothing unreal or incompetent about AI research (or machine learning research, or data mining research); in the same way there's nothing unreal or incompetent about researching how "magic" could actually be achieved in practice.

What is incompetent is suggesting that "AI" (or "magic") can be implemented and used in place of an algorithm. As soon as AI research leads to something that can be implemented in practice, it's no longer AI and is just a system of rules with no intelligence at all. In the same way, as soon as "research into magic" leads to something that can be used in practice, it ceases to be magic and becomes something that follows scientific rules.


Cheers,

Brendan
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Implementing non-English language in OS

Post by Brendan »

Hi,
DavidCooper wrote:I can't see the advantage of having all languages in one single database unless you're dealing with documents written in a mixture of languages where each word is chosen from a randomly selected language just for the hell of it. There are over 4000 languages out there, and AGI will soon be able to help people make up thousands more (for private conversations with machines), so do you really want to fill the machine with a gigabyte or more of unnecessary bloat which will likely take tens or hundreds of times longer to process for each word than a small language-specific database? In the real world, you typically only need one or two languages in the system at a time.
For translating text between different languages; I'd probably start by:
  • Identify the type of each word (verb, noun, etc), to construct a pattern of word types describing the sentence structure. This would require some repetition to deal with things like homophones - essentially, where a word's type is ambiguous without context, construct every possible pattern of word types, then use probabilities to determine which pattern is most likely (then go back and add the "decided word type hint" to each word to remove the ambiguity).
  • Encode the pattern of word types as a number with a trivial "number = (number << MAX_TYPES) + next_type;" loop. Use that number and a hash table to find an entry that describes how to convert the sentence structure (e.g. word1 gets moved to position3, word3 gets moved to position2, etc).
  • Use the original words and their types from the source language to find equivalent words of the correct type for the destination language.
Of course this would take a huge amount of work to implement - with N languages you'd need N*(N-1) sets of hash tables for converting sentence structure; in addition to N dictionaries. I'd also expect many cases where it doesn't work and has to be assisted with some "special case" code.
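
The second and third steps above can be sketched roughly as follows, assuming the words have already been tagged with their types (the probability-based tagging of the first step is not shown). The type list, the single reordering rule and the three-word dictionary are made-up sample data, and all names are hypothetical. This sketch multiplies by MAX_TYPES where the loop in the post shifts; either way each pattern gets its own number.

```python
WORD_TYPES = ["noun", "verb", "adjective", "article"]
MAX_TYPES = len(WORD_TYPES)

def encode_pattern(types):
    """Pack a sequence of word types into one integer (base MAX_TYPES)."""
    number = 1  # assumption: a leading 1 keeps patterns of different lengths distinct
    for word_type in types:
        number = number * MAX_TYPES + WORD_TYPES.index(word_type)
    return number

# Hypothetical English -> Spanish data, just enough for one sentence shape.
# Each rule lists, for every output position, which input position to take.
REORDER_EN_ES = {
    encode_pattern(["article", "adjective", "noun"]): [0, 2, 1],
}
DICT_EN_ES = {
    ("the", "article"): "la",
    ("red", "adjective"): "roja",
    ("house", "noun"): "casa",
}

def translate(tagged_words):
    """Reorder (word, type) pairs by pattern, then swap in dictionary words."""
    code = encode_pattern([word_type for _, word_type in tagged_words])
    order = REORDER_EN_ES.get(code)
    if order is None:
        raise KeyError("no reordering rule for this sentence pattern")
    reordered = [tagged_words[i] for i in order]
    return " ".join(DICT_EN_ES[(word, word_type)] for word, word_type in reordered)

result = translate([("the", "article"), ("red", "adjective"), ("house", "noun")])
print(result)
```

Agreement between words (here the article and adjective are simply stored in their feminine forms) is exactly the kind of thing that would need the "special case" code mentioned above.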

Also note that for the purpose of internationalisation, translating between languages is completely unnecessary. It would be far better to use a "format string ID" used with a (per language) table of format strings. E.g. for "printInternationised(12, date, size)" if the language is US-English the format string might be "On {1:shortDate} the average was {2:simpleFloatingPointNumber} bytes.", and (based on locale settings for things like number and date formats) this might end up as "On 16/5/2016 the average was 1.234,56 bytes." or "On 5/16/2016 the average was 1,234.56 bytes." or something.
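
A rough sketch of such a format-string table, assuming a placeholder syntax of {position:kind} as in the example above. The German string, the locale settings and every function name here are assumptions for illustration only.

```python
import re
from datetime import date

# Per-language table of format strings, indexed by format string ID.
FORMAT_STRINGS = {
    "en": {12: "On {1:shortDate} the average was {2:simpleFloatingPointNumber} bytes."},
    "de": {12: "Am {1:shortDate} lag der Durchschnitt bei {2:simpleFloatingPointNumber} Bytes."},
}

# Locale settings decide how dates and numbers are written.
LOCALES = {
    "en-US": {"lang": "en", "date": "{m}/{d}/{y}", "thousands": ",", "decimal": "."},
    "de-DE": {"lang": "de", "date": "{d}.{m}.{y}", "thousands": ".", "decimal": ","},
}

PLACEHOLDER = re.compile(r"\{(\d+):(\w+)\}")

def format_number(value, loc):
    text = f"{value:,.2f}"  # always "1,234.56" style at this point
    # Swap the separators through a temporary marker so they do not clash.
    return text.replace(",", "\0").replace(".", loc["decimal"]).replace("\0", loc["thousands"])

def format_value(kind, value, loc):
    if kind == "shortDate":
        return loc["date"].format(d=value.day, m=value.month, y=value.year)
    if kind == "simpleFloatingPointNumber":
        return format_number(value, loc)
    raise ValueError("unknown placeholder kind: " + kind)

def format_internationalised(locale_name, string_id, *args):
    """Look up the format string by ID and fill in locale-formatted arguments."""
    loc = LOCALES[locale_name]
    template = FORMAT_STRINGS[loc["lang"]][string_id]

    def substitute(match):
        position, kind = int(match.group(1)), match.group(2)
        return format_value(kind, args[position - 1], loc)

    return PLACEHOLDER.sub(substitute, template)

print(format_internationalised("en-US", 12, date(2016, 5, 16), 1234.56))
print(format_internationalised("de-DE", 12, date(2016, 5, 16), 1234.56))
```

The program itself only ever passes the ID 12 and the raw values; adding a language means adding one table of strings, with no translation step at run time.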


Cheers,

Brendan
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Implementing non-English language in OS

Post by Rusky »

Brendan wrote:As soon as AI research leads to something that can be implemented in practice, it's no longer AI and is just a system of rules with no intelligence at all.
That's exactly what I'm disagreeing with. Recent (and not-so-recent) advances in these fields are producing programs without clearly-defined systems of rules that are implemented and used in place of algorithms.

Of course, I'm assuming that by "algorithm" you mean something like your gigantic translation hash table that a human could reasonably understand and implement, or "A step-by-step problem-solving procedure, especially an established, recursive computational procedure for solving a problem in a finite number of steps."

If instead you mean "literally anything a CPU can execute," then I'd ask you to define what you mean by "intelligence," and in particular what makes it impossible for a program to be intelligent (or conversely, what makes it impossible for intelligence to be seen as a program).
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Implementing non-English language in OS

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:As soon as AI research leads to something that can be implemented in practice, it's no longer AI and is just a system of rules with no intelligence at all.
That's exactly what I'm disagreeing with. Recent (and not-so-recent) advances in these fields are producing programs without clearly-defined systems of rules that are implemented and used in place of algorithms.
Bullshit. Show me a disassembly of these "advances that don't follow rules" and I'll show you a CPU manual containing detailed descriptions of the rules they follow.
Rusky wrote:Of course, I'm assuming that by "algorithm" you mean something like your gigantic translation hash table that a human could reasonably understand and implement, or "A step-by-step problem-solving procedure, especially an established, recursive computational procedure for solving a problem in a finite number of steps."

If instead you mean "literally anything a CPU can execute," then I'd ask you to define what you mean by "intelligence," and in particular what makes it impossible for a program to be intelligent (or conversely, what makes it impossible for intelligence to be seen as a program).
You're asking me to choose between 2 identical options. The only things CPUs can execute are "step-by-step problem-solving procedures" (which includes step-by-step problem-solving procedures that generate step-by-step problem-solving procedures); and "literally anything a CPU can execute" is limited to things that can be described as "step-by-step problem-solving procedures".

Your argument seems to be that if a person can't reasonably understand it, then it must be magic. In that case the only thing needed for everything to be called "AI" is enough people that don't understand anything. Of course using hype to describe things makes it easier to find enough gullible fools willing to believe your step-by-step problem-solving procedures are "intelligent".


Cheers,

Brendan
~
Member
Posts: 1228
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: Implementing non-English language in OS

Post by ~ »

DavidCooper wrote:I can't see the advantage of having all languages in one single database unless you're dealing with documents written in a mixture of languages where each word is chosen from a randomly selected language just for the hell of it. There are over 4000 languages out there, and AGI will soon be able to help people make up thousands more (for private conversations with machines), so do you really want to fill the machine with a gigabyte or more of unnecessary bloat which will likely take tens or hundreds of times longer to process for each word than a small language-specific database? In the real world, you typically only need one or two languages in the system at a time.
As always, English seems to be a good language for trying to describe things at a technical level and in a redundant, consistent, unambiguous way.

It's a specific element that has its uses: determining which language a document is in; bringing any document in any language to a default language, and in turn to its virtual concepts, to then classify it and try to determine what the document is about (with several good guesses using several good methods); searching documents for a word or term and finding them even if they use a synonym instead, no matter which one; and making automatic translations of GUIs using the most significant words. Those words would be determined from the database, and synonyms could be swapped to try to make the translation more understandable, along with video help to show the effect of every menu option. That help should be accessible from an icon in the same menu item, which you could click to see what the option does. And of course it is a dictionary that relates any word in any language with the same word in any other language. It's a great product, item, or system component in itself.
DavidCooper wrote:To identify languages, you can simply use a small file containing the most common hundred words of all known languages (unless you're dealing with a short line of text with no common words in it, but even then you can try loading in databases for the most likely languages first based on how common they are, the writing system used for the text or various other clues that will likely be there such as the source).
It would be the same thing, but dealing with a tiny subset. It would probably be poorer for technical documents (to be able to classify them in a specialized way, and knowing how to explain the specialized vocabulary in a practical and simple way).
DavidCooper wrote:Where you're right though is that you do want to convert all the words that mean the same thing to the same coding and for that coding to be used across all languages, though it's hard to do it properly due to subtle differences and overlaps in meaning which mess things up, so you'll find you need to convert everything to pure concepts, and in many cases there are no exact words for those pure concepts in any spoken language. It's really a task that's best left for AGI to do: our job is to program AGI to handle just one language and then let it work out how to do the rest for us.

In the meantime though, I can see that there are still uses for databases of the kind you've shown us, but ideally sticking to just a few languages at a time: if you then feel the need to take many more and merge them together into a bigger database covering dozens of languages, then that would be easy enough, and a multilinguist might well want to have fifty of them in one package. But what you should really be asking for now is databases covering just one or two languages rather than expecting to get the whole lot in one massive download. I'd like to find some databases of that kind myself, or indeed any bilingual dictionary in electronic form which is free to pinch and rework into something else without having to pay anyone anything to do so, but an AGI system will be able to produce its own dictionaries simply by working with texts and translations of the same texts in other languages, quickly working out which words are equivalent and narrowing down their exact meanings with greater precision over time as it processes more documents. So, if you actually have the ability to produce intelligent software which is capable of reading and understanding natural language, you should just concentrate on one language and get it built, because everything else is just a distraction which will only serve to slow you down. Having said that, you will likely need to study the workings of many languages before you can get your mind around what goes on under their surface: you need to work out how to represent thoughts and translate between the same or related thoughts expressed in different forms, and the clues as to the best ways to do this are spread across many languages.
Given that you already know Spanish and English, I would recommend that you look carefully at Esperanto, Japanese, Chinese to get some idea as to how language could be simplified (Esperanto only plays with the idea, but it gives you a few useful leads) and to see how radically different systems of grammar line up against each other. Your challenge then is to work out the grammar of thought.
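The idea in the quote above, deriving a dictionary from texts and their translations, can be illustrated with a very rough sketch. It is not anything DavidCooper or an AGI system actually uses: the five-sentence English/Spanish corpus, the `build_dictionary` name, and the choice of the Dice coefficient as the co-occurrence score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (English, Spanish).
# A real system would need vastly more text than this.
PARALLEL = [
    ("the house", "la casa"),
    ("the cat", "el gato"),
    ("a house", "una casa"),
    ("a cat", "un gato"),
    ("the white house", "la casa blanca"),
]

def build_dictionary(pairs):
    """Guess a translation for each source word from sentence co-occurrence."""
    src_count, tgt_count, both_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        both_count.update(product(src_words, tgt_words))  # seen together

    dictionary = {}
    for s in src_count:
        def dice(t):
            # Dice coefficient: 1.0 when the two words always appear together.
            return 2 * both_count[(s, t)] / (src_count[s] + tgt_count[t])
        dictionary[s] = max(tgt_count, key=dice)
    return dictionary

guesses = build_dictionary(PARALLEL)
print(guesses["house"], guesses["cat"], guesses["white"])  # casa gato blanca
```

Even on this tiny corpus the guesses sharpen as more sentence pairs are added, which is the "greater precision over time" effect described in the quote. Function words, inflection, and word order are where such a naive score breaks down.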
Google might have such a database and use it privately. I still think that a database covering all human languages would be an excellent automatic filter for simplifying the classification of words and concepts, so it could be used with good A.I. to considerably improve translations and interpretations.

There will be a lot to understand and learn when studying non-Latin languages in order to relate them to the rest, including untranslatable terms, characters that combine to form another one, etc. It would at least be a good way to learn languages more easily, so the concept is still very helpful and educational at the human level, where it's intended to help.

The database would need to contain an alphabet for every specific language ID ("ar", "zh", "de", "en", "eo", "es", "fr", "he", "ja", "ko", ...), and a set of known special characters, numeric characters, etc.
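As a minimal sketch of what such per-language alphabet tables could look like, here is one possible layout using Python's built-in `sqlite3` module. The table names `languages` and `characters`, the `kind` column, and the two sample languages are assumptions for illustration, not part of the table definition posted earlier.

```python
import sqlite3

def build_db():
    """Create an in-memory database with one row per (language, character)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE languages (
            lang_id TEXT PRIMARY KEY,      -- e.g. 'en', 'es'
            name    TEXT NOT NULL
        );
        CREATE TABLE characters (
            lang_id TEXT NOT NULL REFERENCES languages(lang_id),
            glyph   TEXT NOT NULL,
            kind    TEXT NOT NULL CHECK (kind IN ('letter', 'digit', 'special')),
            PRIMARY KEY (lang_id, glyph)
        );
    """)
    conn.executemany("INSERT INTO languages VALUES (?, ?)",
                     [("en", "English"), ("es", "Spanish")])

    rows = [("en", c, "letter") for c in "abcdefghijklmnopqrstuvwxyz"]
    rows += [("es", c, "letter") for c in "abcdefghijklmnñopqrstuvwxyz"]
    rows += [("es", c, "special") for c in "¿¡"]
    rows += [(lang, d, "digit") for lang in ("en", "es") for d in "0123456789"]
    conn.executemany("INSERT INTO characters VALUES (?, ?, ?)", rows)
    return conn

def alphabet(conn, lang_id):
    """Return the letters stored for one language ID."""
    cur = conn.execute(
        "SELECT glyph FROM characters WHERE lang_id = ? AND kind = 'letter'",
        (lang_id,),
    )
    return [row[0] for row in cur]

conn = build_db()
print(len(alphabet(conn, "en")), len(alphabet(conn, "es")))  # 26 27
```

Keeping one character per row, rather than one long string per language, makes it possible to index and query individual characters across languages.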


Then, much later, once generic phrases can be understood and processed in code, the database would also need a table of such phrases that some algorithm can reshape.
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Implementing non-English language in OS

Post by Rusky »

Brendan wrote:Bullshit. Show me a disassembly of these "advances that don't follow rules" and I'll show you a CPU manual containing detailed descriptions of the rules they follow.
So your definition of "algorithm" is "literally anything a CPU can execute." That's not a very useful definition, but that's okay, we'll work with it.
Brendan wrote:Your argument seems to be that if a person can't reasonably understand it, then it must be magic.
Nope. You keep equating AI with magic, when my entire argument is that it's not.

Take, for example, the various types of neural networks Google has been using recently for image classification, voice recognition in noisy environments, more accurate natural language processing, playing Go, etc. You can relatively easily describe their structure, the process used to determine their weights, and the weights themselves, but that tells you nothing about their actual behavior, because it's got nothing to do with the problem domain.

You can still read the code and the data it uses; you can still study and quantify the program's output and decide whether it's adequate for your purposes; but that doesn't tell you anything about why those particular weights work. You couldn't meaningfully tweak the weights by hand to control its behavior, and you couldn't be sure that it wouldn't do something unexpected under the right (wrong? :P) conditions.

Note that I'm not trying to claim anything about the definition of intelligence, but rather that non-magical AI is an accurate label for actual, working programs, and that trying to treat them like a typical human-written program is not a very useful way to think about them.
Brendan wrote:Of course using hype to describe things makes it easier to find enough gullible fools willing to believe your step-by-step problem-solving procedures are "intelligent".
You keep implying that, because a CPU can do it, it can't be intelligent. So I ask you again: define "intelligence," what makes it impossible for a program to have it, and what makes it impossible for it to be viewed as a "step-by-step problem-solving procedure."

---

To stay marginally on topic, Google has been talking recently about generating a kind of vector-space database of cross-language concepts, and representing words as vectors in that space. For example, starting at the location for "king," subtracting the vector for "male," and adding the vector for "female" would lead you (near) to the location for "queen."
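To make the vector arithmetic concrete, here is a toy sketch. The three-dimensional vectors are invented by hand and the axes are labelled (royalty, maleness, femaleness) purely for readability; in a learned embedding the vectors come out of training and the axes generally have no such clear meaning. The `analogy` helper is a hypothetical name, not an API from any Google tool.

```python
import math

# Hand-made toy vectors; axes are (royalty, maleness, femaleness).
# Real embeddings are learned and have hundreds of unlabelled dimensions.
VECTORS = {
    "king":    (0.9, 0.9, 0.1),
    "queen":   (0.9, 0.1, 0.9),
    "male":    (0.0, 1.0, 0.0),
    "female":  (0.0, 0.0, 1.0),
    "peasant": (0.1, 0.5, 0.5),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(start, minus, plus):
    """Return the word nearest to start - minus + plus, excluding the inputs."""
    target = tuple(
        s - m + p
        for s, m, p in zip(VECTORS[start], VECTORS[minus], VECTORS[plus])
    )
    candidates = [w for w in VECTORS if w not in (start, minus, plus)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

print(analogy("king", "male", "female"))  # queen
```

The result lands near "queen" rather than exactly on it, which is why the lookup is a nearest-neighbour search and not an equality test.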

Building the database and defining the words would be done with machine learning, thus likely/hopefully finding relations between words that a brute-forced human-written database would miss. This would give the system a much more human-like ability to translate, rather than what Google Translate currently does, which is basically a sophisticated find-and-replace. It would also, however, lead to axes in the vector space without a clear meaning on their own, at least without reverse engineering what the training process came up with.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Implementing non-English language in OS

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:Bullshit. Show me a disassembly of these "advances that don't follow rules" and I'll show you a CPU manual containing detailed descriptions of the rules they follow.
So your definition of "algorithm" is "literally anything a CPU can execute." That's not a very useful definition, but that's okay, we'll work with it.
Brendan wrote:Your argument seems to be that if a person can't reasonably understand it, then it must be magic.
Nope. You keep equating AI with magic, when my entire argument is that it's not.

Take, for example, the various types of neural networks Google has been using recently for image classification, voice recognition in noisy environments, more accurate natural language processing, playing Go, etc. You can relatively easily describe their structure, the process used to determine their weights, and the weights themselves, but that tells you nothing about their actual behavior, because it's got nothing to do with the problem domain.
A neural network is essentially (equivalent to) a huge function with a large number of parameters, where a (potentially incomplete/partial) brute force search is used to find most of the parameters. The average piece of cheese contains more intelligence than this.
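The "function with parameters plus a search for those parameters" picture can be shown on a deliberately tiny scale. In this sketch the "network" is a single linear unit with two parameters, and the search is an exhaustive grid. Both are assumptions chosen to keep the example small; real networks are normally trained with gradient-based optimisation, not by trying every combination.

```python
def model(x, w, b):
    """A 'network' with exactly two parameters: one weight and one bias."""
    return w * x + b

# Training data generated from the hidden rule y = 2x + 1.
DATA = [(x, 2 * x + 1) for x in range(-3, 4)]

# Candidate parameter values: -3.0, -2.5, ..., 3.0.
GRID = [i * 0.5 for i in range(-6, 7)]

def loss(w, b):
    """Sum of squared errors over the training data."""
    return sum((model(x, w, b) - y) ** 2 for x, y in DATA)

def brute_force_fit():
    """Try every (w, b) pair on the grid and keep the one with lowest loss."""
    return min(((w, b) for w in GRID for b in GRID), key=lambda p: loss(*p))

best_w, best_b = brute_force_fit()
print(best_w, best_b)  # 2.0 1.0
```

With two parameters the grid has 169 points; with millions of parameters an exhaustive search is out of the question, which is where the "incomplete/partial" qualifier comes in.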
Rusky wrote:You can still read the code and the data it uses; you can still study and quantify the program's output and decide whether it's adequate for your purposes; but that doesn't tell you anything about why those particular weights work. You couldn't meaningfully tweak the weights by hand to control its behavior, and you couldn't be sure that it wouldn't do something unexpected under the right (wrong? :P) conditions.
And this is important for some reason?

What if I took a random number generator and made it generate text files containing random characters, and then fed those files (one at a time) into GCC until something not only compiles without error, but also passes a unit test. Would this hideous kludge of pure stupidity be "intelligent" to you, simply because it's difficult for you to understand the randomly generated source file?
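A scaled-down version of that thought experiment, as a sketch: instead of random text files fed to GCC, it generates random three-character strings and asks Python to compile and evaluate them as expressions in `x`. The five-character alphabet, the "doubles its input" unit test, and the function names are all assumptions made so that the search finishes quickly.

```python
import random

ALPHABET = "x12+*"  # hypothetical, tiny character set

def passes_unit_test(source):
    """True if `source` compiles and doubles its input for the test values."""
    try:
        code = compile(source, "<candidate>", "eval")
        return all(
            eval(code, {"__builtins__": {}}, {"x": x}) == 2 * x
            for x in (3, 5)
        )
    except Exception:
        # Syntax errors and undefined names just mean "try the next one".
        return False

def random_program_search(max_attempts=10_000, seed=0):
    """Generate random strings until one passes the unit test."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        source = "".join(rng.choice(ALPHABET) for _ in range(3))
        if passes_unit_test(source):
            return source, attempt
    return None, max_attempts

source, attempts = random_program_search()
print(f"found {source!r} after {attempts} attempts")
```

Only three of the 125 possible strings pass (`x+x`, `x*2`, `2*x`), so the loop usually needs a few dozen tries. The search space grows exponentially with program length, which is what makes the full-size version of the experiment impractical.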
Rusky wrote:Note that I'm not trying to claim anything about the definition of intelligence, but rather that non-magical AI is an accurate label for actual, working programs, and that trying to treat them like a typical human-written program is not a very useful way to think about them.
While unlikely to be intentional; you are saying something about your definition of intelligence. You're saying your definition includes "extreme unintelligence" as long as you can't understand it easily.
Rusky wrote:
Brendan wrote:Of course using hype to describe things makes it easier to find enough gullible fools willing to believe your step-by-step problem-solving procedures are "intelligent".
You keep implying that, because a CPU can do it, it can't be intelligent. So I ask you again: define "intelligence," what makes it impossible for a program to have it, and what makes it impossible for it to be viewed as a "step-by-step problem-solving procedure."
Fine. I define "intelligence" as something that is able to decide for itself what it wants to do; and therefore doesn't do what we want it to do; and is therefore useless to us.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Implementing non-English language in OS

Post by Rusky »

Brendan wrote:A neural network is essentially (equivalent to) a huge function with a large number of parameters, where a (potentially incomplete/partial) brute force search is used to find most of the parameters. The average piece of cheese contains more intelligence than this.

Would this hideous kludge of pure stupidity be "intelligent" to you, simply because it's difficult for you to understand the randomly generated source file?

While unlikely to be intentional; you are saying something about your definition of intelligence. You're saying your definition includes "extreme unintelligence" as long as you can't understand it easily.
That is correct: machine learning research is basically about how to improve that not-quite-brute-force search. The point is, there's not much difference between that and the "natural" intelligence produced by evolution. Genetic algorithms are even one search method that directly mimics evolution, and brains are basically huge functions with large numbers of parameters, so are brains also a "hideous kludge of pure stupidity" and "extreme unintelligence"?
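For readers who have not seen one, a genetic algorithm can be sketched in a few lines. This is a generic textbook-style example, not anything from Google's systems: the problem (maximise the number of 1 bits, often called OneMax), the population size, the mutation rate, and the `evolve` name are all assumptions for illustration.

```python
import random

def fitness(bits):
    """Number of 1 bits; the 'environment' rewards more ones."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05, seed=0):
    """Evolve bit strings by selection, crossover and mutation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # selection; the fittest half survives unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)        # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [                             # random mutation
                bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child
            ]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(history[0], "->", history[-1])
```

Nothing in the loop knows what a good solution looks like beyond the fitness score, yet the best score never decreases, because the fittest individuals are carried over unchanged.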

You're confusing the process of creating the program (or brain), which is indeed quite dumb, with the program (or brain) itself, which is an encoding of useful information, whether it's useful to the Google engineers overseeing the process or to the DNA propagated by the resulting organism. It just happens to be quite hard to manipulate directly.
Brendan wrote:Fine. I define "intelligence" as something that is able to decide for itself what it wants to do; and therefore doesn't do what we want it to do; and is therefore useless to us.
Just like your definition of "algorithm," that is neither what the term means in computer science, nor a very useful definition, but again we can work with it. Please define what it means for an entity to want something, why a biological brain can do it, and why a program can't.

However, the term "artificial intelligence" does not, and was never meant to, denote something with different desires than its creators or users. The founders of the field define AI as "a system that perceives its environment and takes actions that maximize its chances of success," sometimes including knowledge and learning in that definition.

Regardless of how you want to use the word "intelligence," my point is that, beyond your magic definition, AI does in practice refer to a useful class of programs that we ought to have a name for, and that refusing to read the term the way it's meant is rather... unintelligent.
embryo2
Member
Posts: 397
Joined: Wed Jun 03, 2015 5:03 am

Re: Implementing non-English language in OS

Post by embryo2 »

Brendan wrote:I define "intelligence" as something that is able to decide for itself what it wants to do; and therefore doesn't do what we want it to do; and is therefore useless to us.
But what if it wants to implement our wishes and fulfill our needs? It wants to do that, so it's still intelligent by your definition, but it is also very useful to us.

Add a "wish enforcer" to a neural network and you'll get "intelligence".

There's no intelligence in the world, just stupid enumeration using brute force, including the search for a wish. Do you really know what you want?
My previous account (embryo) was accidentally deleted, so I have no choice but to use a new one. But maybe it was a good lesson about software reliability :)