show related

dc0d32
Member
Posts: 69
Joined: Thu Jun 09, 2005 11:00 pm
Location: Right here

Post by dc0d32 »

Hey,

I remember stumbling upon the MT forum some six-odd years ago, when the number of topics (in the general sense, not the thread-title sense) that people were concerned about was relatively small, so one could easily look through almost all the important threads on a specific topic of interest (hence the "please search before you ask" sticky). I believe that today the forum is much wider as far as (sub)topics go. Besides, the idea of the wiki, as I understand it, is to consolidate the knowledge from the posts into an easy-to-find/read/pointTo resource, but I believe a good portion is still missing from the wiki. Then again, it is quite a challenge to define an exhaustive hierarchy and arrange threads/posts/wikis in it. (Please correct me if I am wrong - it has been eons since I last logged on to contribute something.)

Thus, at least for the newbies, it would help to have a little widget somewhere on the page that lists other relevant threads, wiki pages and maybe even relevant project pages outside the domain. It is a lot easier to research a topic by learning from the threads if one can quickly go through nearly all the important related ones.

The concept is not new - we see this on many content aggregators. The simplest way to do this is offline, using well-established topic-model clustering techniques. It could work beautifully, considering that our topic domain is limited and we can tune the clusters. Because this is an offline process (maybe run once every week/month), we can take a site dump (preferably in the db format) containing suitably linked information about the threads, the posts in them with poster and time, the list of users, and the wiki dump. We can then generate the top X related threads for each thread, the top Y wiki links, and maybe even the top Z related posts for each post. Of course, we'd have to take special care of the outliers (rants, bashings and the occasional spam). Given that we are mostly osdevs here, the rants might have some structure as well (I do not know if they do).
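
To make the offline step a bit more concrete, here is a rough sketch in Python. It is not a topic model: it swaps in plain TF-IDF cosine similarity as a simpler stand-in, just to show the shape of "compute the top X related threads for each thread in a batch job". The function names, the `threads` dictionary, and the sample thread ids and texts are all made up for illustration; a real run would read them from the board dump.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each thread id to a unit-length {term: weight} TF-IDF vector."""
    tokens = {tid: tokenize(text) for tid, text in docs.items()}
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))  # count each term once per thread
    n = len(docs)
    vectors = {}
    for tid, toks in tokens.items():
        term_freq = Counter(toks)
        # Terms that occur in every thread get weight log(1) = 0.
        vec = {t: c * math.log((1 + n) / (1 + doc_freq[t]))
               for t, c in term_freq.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[tid] = {t: w / norm for t, w in vec.items()}
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def related_threads(docs, top_x=3):
    """For each thread, list up to top_x other threads by similarity.

    Threads with zero similarity are dropped, so an outlier that shares
    no vocabulary with anything else ends up with an empty list.
    """
    vectors = tfidf_vectors(docs)
    related = {}
    for tid, vec in vectors.items():
        scored = [(cosine(vec, other), oid)
                  for oid, other in vectors.items() if oid != tid]
        scored.sort(key=lambda pair: (-pair[0], str(pair[1])))
        related[tid] = [oid for score, oid in scored[:top_x] if score > 0]
    return related


# Hypothetical sample data standing in for the board dump.
threads = {
    "paging-basics": "setting up paging and page tables on x86",
    "page-fault": "page fault handler crashes after enabling paging",
    "gdt-setup": "loading the gdt and segment registers in protected mode",
    "spam": "cheap watches buy now",
}

if __name__ == "__main__":
    for tid, others in related_threads(threads, top_x=2).items():
        print(tid, "->", others)
```

The resulting dictionary could be written out once per batch run and looked up by the widget at page-load time, so nothing expensive happens per request. Note that common words such as "and" still create weak links here; a real attempt would want a stop-word list and the outlier handling mentioned above.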

I understand that the notion of relevance for a user is query dependent, and handling that would make things a lot costlier at runtime (that is, if users frequently pose queries to find threads).

@owners (chase?): Would it be possible to share a snapshot or a dump of some sort, so that I can try this out in my free time?

There is a wealth of useful info on our board - let us make it more accessible.

What say?
_
prashant