
The Most Common Subject Words In This Forum

Posted: Fri Apr 13, 2007 9:06 pm
by Kevin McGuire

I extracted over 8900 thread titles from the entire development forum. Each title is split into words on spaces. Only alphanumeric characters are kept in each word, and all uppercase letters are converted to lower case. The count for each word starts at zero rather than one, so each count shown to the right of a word is really one higher (+1).

1. to * 718
2. in * 649
3. os * 616
4. and * 614
5. a * 562
6. kernel * 491
7. the * 449
8. with * 442
9. problem * 398
10. help * 391
11. c * 367
12. how * 364
13. memory * 318
14. mode * 301
15. i * 281
16. for * 267
17. question * 236
18. of * 235
19. on * 213
20. bochs * 195
21. my * 191
22. floppy * 188
23. paging * 184
24. driver * 181
25. pmode * 177
26. about * 176
27. is * 175
28. code * 168
29. system * 166
30. grub * 164
31. from * 160
32. what * 159
33. problems * 152
34. an * 141
35. file * 141
36. keyboard * 133
37. not * 130
38. need * 127
39. gcc * 124
40. new * 120
41. do * 118
42. interrupt * 116
43. can * 116
44. questions * 115
45. idt * 113
46. boot * 113
47. error * 112
48. stack * 107
49. multitasking * 107
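
For anyone who wants to reproduce a list like this, the steps described above could be sketched roughly as follows. This is not the original program, just a minimal Python sketch under stated assumptions: the function name `count_title_words` and the sample titles are made up for illustration. Note that `Counter` starts counting at one, so its totals would not need the +1 adjustment.

```python
import re
from collections import Counter

def count_title_words(titles):
    """Count how often each word appears across a list of thread titles."""
    counts = Counter()
    for title in titles:
        # Break the title into words on spaces.
        for raw in title.split(" "):
            # Keep only ASCII letters and digits, then lowercase the word.
            word = re.sub(r"[^A-Za-z0-9]", "", raw).lower()
            if word:  # skip tokens that were pure punctuation
                counts[word] += 1
    return counts

# Made-up sample titles, not taken from the forum.
titles = ["Help with my Kernel!", "Kernel paging problem", "help: GRUB error"]
for rank, (word, n) in enumerate(count_title_words(titles).most_common(3), 1):
    print(f"{rank}. {word} * {n}")
```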

Posted: Fri Apr 13, 2007 10:07 pm
by AndrewAPrice
Cool!

Posted: Mon Apr 16, 2007 3:34 pm
by nick8325
I like that "kernel" is more common than "the" :)

Posted: Mon Apr 16, 2007 3:42 pm
by Alboin
nick8325 wrote: I like that "kernel" is more common than "the" :)
Well, at least we know we have our linguistic priorities straight.

Posted: Mon Apr 16, 2007 4:26 pm
by Kevin McGuire
Do you guys have any ideas about what we could do with data extracted from the forums? I got bored and did it, but I figure there could be a useful idea in it somewhere.

Posted: Mon Apr 16, 2007 5:08 pm
by chase
Filter it against a list of the most common English words to get a list of the most frequent OS development subjects. That could be used to figure out where wiki articles should be expanded or created.
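
A rough sketch of that filtering idea in Python, assuming the counts are already in a dictionary. The stop-word list here is a tiny hand-picked stand-in, and `filter_subjects` is a hypothetical name; the sample counts are the top ten from the list earlier in the thread.

```python
# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {
    "to", "in", "and", "a", "the", "with", "how", "i", "for", "of",
    "on", "my", "about", "is", "from", "what", "an", "not", "do", "can",
}

def filter_subjects(counts, stop_words=STOP_WORDS):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    kept = [(word, n) for word, n in counts.items() if word not in stop_words]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Top ten entries from the list posted earlier in the thread.
sample = {
    "to": 718, "in": 649, "os": 616, "and": 614, "a": 562,
    "kernel": 491, "the": 449, "with": 442, "problem": 398, "help": 391,
}
print(filter_subjects(sample))
```

Words like "problem" and "help" survive a plain English stop-word filter, so a second forum-specific list would probably be needed before the result points at wiki topics.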

forumdown

Posted: Tue Apr 17, 2007 8:00 pm
by Kevin McGuire
I will give it a try. It actually seems a little more complicated than you would think at first, but I am confident that it is possible.

I got an initial tool written: a program, forumdown, which will download an entire sub forum and store the linked list structures of threads and posts in a local data file that can be loaded.
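
To give an idea of the kind of structure involved, here is a minimal Python sketch of threads holding posts, saved to and loaded from a local file. It is not forumdown's actual format: the `Post` and `Thread` classes, the JSON encoding, the file name, and the sample data are all assumptions made for illustration, and plain Python lists stand in for the linked lists.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict

@dataclass
class Post:
    author: str
    posted: str
    body: str

@dataclass
class Thread:
    title: str
    posts: list = field(default_factory=list)

def save_threads(threads, path):
    """Write all threads (with their posts) to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(t) for t in threads], f)

def load_threads(path):
    """Rebuild Thread and Post objects from a file written by save_threads."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return [Thread(t["title"], [Post(**p) for p in t["posts"]]) for t in raw]

# Made-up sample data for a round trip through the file.
threads = [Thread("Kernel paging problem",
                  [Post("alice", "2007-04-13", "It triple faults.")])]
path = os.path.join(tempfile.gettempdir(), "forum_threads.json")
save_threads(threads, path)
restored = load_threads(path)
print(restored == threads)
```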

I did a little thinking and concluded that I can use a website that provides a dictionary, thesaurus, and encyclopedia. That would allow some degree of spell checking, mapping of equivalent terms such as IDT and Interrupt Descriptor Table, and some sort of primitive comprehension of sentences, to get an idea of exactly what people are talking about in the posts.
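
The term mapping part could look something like this minimal Python sketch. The alias table is hand-written and hypothetical: only the IDT pair comes from the paragraph above, the "protected mode" entry is an added illustration, and `normalize_phrases` is a made-up name. A real version would presumably build the table from the reference site.

```python
# Hypothetical alias table mapping long forms to one canonical short form.
# Only the IDT entry comes from the post; "pmode" is an added illustration.
ALIASES = {
    "interrupt descriptor table": "idt",
    "protected mode": "pmode",
}

def normalize_phrases(text, aliases=ALIASES):
    """Lowercase the text and replace each known long form with its short form."""
    text = text.lower()
    for phrase, canonical in aliases.items():
        text = text.replace(phrase, canonical)
    return text

print(normalize_phrases("Interrupt Descriptor Table setup in Protected Mode"))
```

After this step, both spellings would be counted under the same word.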

I will try to use this site as the English word database, and add a cache to keep the lookups from taking an excessive amount of time.
http://www.reference.com/browse/
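
The cache could be as simple as the following Python sketch. Everything here is an assumption: `CachedLookup` is a hypothetical class, and `fake_fetch` stands in for whatever request the real tool would make to the site, so no network access or real site API is involved.

```python
class CachedLookup:
    """Remember lookup results so each word is fetched at most once."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable doing the slow lookup (assumed)
        self._cache = {}
        self.misses = 0       # number of times the slow lookup actually ran

    def lookup(self, word):
        key = word.lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]

def fake_fetch(word):
    # Stand-in for an HTTP request to the dictionary site.
    return f"definition of {word}"

words = CachedLookup(fake_fetch)
words.lookup("kernel")
words.lookup("Kernel")   # served from the cache, no second fetch
print(words.misses)
```

Saving the cache dictionary to disk between runs would avoid repeating the lookups every time the tool starts.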

http://kmcguire.jouleos.galekus.com/dok ... orum_tools

Let's see if I can get the other part working.

Posted: Sat Apr 21, 2007 10:25 pm
by Kevin McGuire
A sprocket, two gears, and some strange gooey gel came out of my head. I think I was thinking too hard. This might be more than I asked for. I've got to get this kernel finished. :P