
Reading the entire internet ?

Posted: Sat Jan 14, 2012 11:52 pm
by Sam111
For fun, I was trying to figure out approximately how long it would take to read the entire internet.

My equation is

let
w = average number of words on a webpage
p = approximate number of webpages on the internet
m = how many minutes you read per day
s = the speed at which you read (the average person reportedly reads 200-250 words per minute)

t = the amount of time, IN DAYS, it would take a person to read it all (assuming the person has eternal life and the internet is not going to change, i.e. they could live forever and freeze the internet in its current state)

Obviously the equation is
t = (p * w) / (s * m)
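As a quick sketch of the formula (the values plugged in below are placeholder assumptions, not measurements):

```python
def days_to_read(p, w, s, m):
    """t = (p * w) / (s * m): total words divided by words read per day.

    p: number of webpages
    w: average words per webpage
    s: reading speed in words per minute
    m: minutes spent reading per day
    """
    return (p * w) / (s * m)

# Assumed example values: 1 billion pages, 500 words/page,
# 250 words/minute, 8 hours (480 minutes) of reading per day.
days = days_to_read(1_000_000_000, 500, 250, 480)
print(f"{days:,.0f} days (~{days / 365:,.0f} years)")
```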

I can look up approximate figures for how many webpages there are on the entire internet.
I can vary the number of minutes a human could read per day based on comprehension, burnout, and sleep needs.
I can vary the speed, but the average of 200-250 words per minute sounds reasonable for an approximation.

But my problem is that I haven't found any good sources on how many words are on a typical webpage, or what value I should use for w.

Anybody have a good idea?

Note that not only can this be used to tell you how many days it would take a dedicated reader to read the internet; you could also use a similar strategy to estimate how long a book or other reading material will take to read. Of course, this doesn't imply you comprehend it all; that would depend on other factors like interest in the subject material, etc.

I am curious to hear your thoughts on a w value...
Maybe the w variable is not the way to go, and there is another variable I could use in its place, or another formula I could derive, that would be a better approximation to the real thing :)

Re: Reading the entire internet ?

Posted: Sun Jan 15, 2012 1:58 am
by xenos
A colleague once asked me to download the whole internet for him and burn it onto a CD. I told him I'd do it "as soon as I have enough time". A few years later I found the book "The Whole Internet" from 1994 in a pile of old books that had been removed from a library, so I gave it to him.

Re: Reading the entire internet ?

Posted: Sun Jan 15, 2012 11:43 am
by Sam111
berkus wrote:You might also add some factors into the equation. E.g. if you want to calculate the non-raw reading speed but include navigation on the website, you'd need to include a factor that specifies the extra time it takes to switch from page to page.
Good point; I can easily come up with a more general formula that incorporates average traversal time.
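One way such a generalization might look (the per-page navigation overhead n is an assumed new parameter, not part of the original formula):

```python
def days_to_read_with_nav(p, w, s, m, n):
    """Extends t = (p * w) / (s * m) with navigation overhead.

    Each page now costs w / s minutes of reading plus n minutes of
    clicking/loading, so t = p * (w / s + n) / m days.

    p: number of webpages
    w: average words per webpage
    s: reading speed in words per minute
    m: minutes spent reading per day
    n: average minutes of navigation overhead per page (assumed)
    """
    return p * (w / s + n) / m
```

With n = 0 this reduces exactly to the original formula.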


XenOS, maybe the whole internet could fit on a 700 MB CD in 1994 or so, but today it's not even close.

So I was kind of wondering if somebody could come up with, or find, a reasonable w value for my question so I can approximate things.
Quote of the day:
"Not in it for the medals, not for respect, not for honor, not even to make a profit; I am in it for the knowledge... for that is the most important thing to me, and I will try to obtain it by any means possible without breaking the laws I agree with."

by blackbox / motherbrain

Re: Reading the entire internet ?

Posted: Sun Jan 15, 2012 11:48 am
by Brendan
Hi,
berkus wrote:You might also add some factors into the equation. E.g. if you want to calculate the non-raw reading speed but include navigation on the website, you'd need to include a factor that specifies extra time it takes to switch from page to page.
I'd also suggest that the number of pages on the Internet is not a constant. There might be 100 million pages now, but by the time you finish reading them all another 50 million pages might have been created.

This also means that you can't just read pages in a fixed order (e.g. starting from "http://a.a/a.html" or something) from start to finish. You'd need to determine a suitable order, then read what you can from start to finish while generating an MD5 checksum of each page; then, when you reach the end, start again from the beginning while skipping pages whose MD5 checksum is still correct. You'd have to keep doing that until you have a correct MD5 checksum for every page. In this case, once you've read most of the Internet it may take a long time for your software to find an unread page, so you'd have to factor in the time spent finding unread pages.
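A rough sketch of that checksum-driven sweep (here `fetch` is an assumed stand-in for actually downloading and reading a page; a real crawler would obviously need far more):

```python
import hashlib

def read_until_stable(fetch, urls):
    """Repeatedly sweep the URL list, 'reading' (here: hashing) each page,
    and re-reading any page whose MD5 changed since the last sweep.
    Stops once a full sweep finds every checksum unchanged.

    fetch: assumed callable that returns a page's text for a URL.
    Returns a dict mapping each URL to the MD5 of its last-read version.
    """
    seen = {}
    changed = True
    while changed:
        changed = False
        for url in urls:
            digest = hashlib.md5(fetch(url).encode()).hexdigest()
            if seen.get(url) != digest:  # new page, or modified since last read
                seen[url] = digest       # "read" this version
                changed = True
    return seen
```

Note the cost Brendan points out: each extra sweep re-hashes every page just to discover the few that changed, so the tail end of the crawl is dominated by checking rather than reading.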

The next problem is going to be defining what a page is. Are pages HTML only? What about error pages? Does it include XHTML? How about plain text files? Is a PDF file a page on the internet that happens to require a browser plugin to read?

Then there's dynamic content. For example, for these forums is "http://forum.osdev.org/viewtopic.php" one page, or is it lots of different pages (with different data for the "&f=11&p=202780" part)? If it is lots of different pages, then is it an "almost infinite" number of pages? For example, is a page with "&hilit=foo" a different page to one with "&hilit=bar" (and how many variations are there from "&hilit=a" all the way to "&hilit=zzzzzzzzzzzzzzzzz")?


Cheers,

Brendan

Re: Reading the entire internet ?

Posted: Sun Jan 15, 2012 12:16 pm
by Sam111
Well,
if you read what I wrote above, I said "provided the person has eternal life and the ability to stop time" (i.e. freeze the internet so it stops changing).

I agree that if it kept changing you could never finish, because the rate at which the internet is growing is huge and still accelerating, going by charts of internet growth rates.

BUT ASSUMING YOU COULD STOP TIME, AND ASSUMING YOU HAVE ETERNAL LIFE!

Also, good point on Word, PDF, and other downloadable documents... (this would be another factor).
BUT FOR NOW, LET'S SAY ONLY PLAIN TEXT ON WEBPAGES!

Does anybody have any ideas about what a good w value might be for this equation?
That is, how many words are on a typical webpage?

I'm also curious which website/webpage has the most words on a single page (i.e. what is the largest word count currently on one webpage)?

Re: Reading the entire internet ?

Posted: Sun Jan 15, 2012 1:48 pm
by DavidCooper
I heard just the other day that someone had worked out it would take a hundred years to read the whole of Wikipedia. That's obviously ignoring anything added to it after the time you start reading it, and it doesn't allow you any extra time to understand what you're reading whenever it gets technical.

Re: Reading the entire internet ?

Posted: Mon Jan 16, 2012 12:23 am
by Sam111
First off, I don't fully believe he got his calculations right.
Second of all, who said anything about reading all of Wikipedia?
Third, I believe reading all of Wikipedia really means reading all the wiki pages/categories that are worth anything (there are a lot of categories and pages that are just common knowledge, or knowledge you could easily look up, like a phone number, etc.). The important stuff is the concepts, terminology, methods/thought processes, etc.
Plus you would only have to read your own language (mine is English), which totals close to 3 million pages.
But of those 3 million, probably only a few hundred thousand (between 100,000 and 500,000) would be worth anything, at least knowledge-wise...

If you read at 250 to 300 words per minute, then knowing the average number of words on a page would give you an approximation. I am betting it is a couple of years if you are doing nothing but eating, sleeping, etc. and reading.
Even with the rate at which wiki pages are increasing, only a small portion are worthwhile reads by my criteria,
and the rate at which this worthwhile portion grows is significantly less than what I could read in a day if I were to do that :)

Anyway, my main question is the average number of words on an internet page, but I'm now also curious about restricting it to just Wikipedia pages.
Difficult, but not impossible.

Re: Reading the entire internet ?

Posted: Mon Jan 16, 2012 8:16 am
by Coty
I just added more text to the internet :lol:

Good luck getting every page from 4chan; those boards can go through 1000 words a second (this is why there are only 15 pages on each board before auto-delete). I'd almost bet your bot would get stuck there :D

Re: Reading the entire internet ?

Posted: Mon Jan 16, 2012 1:06 pm
by Sam111
That's great.

But my main question still remains:
What is a good approximation of the average number of words on a page on the internet?
And what is a good approximation of the average number of words on an English Wikipedia page?

Re: Reading the entire internet ?

Posted: Mon Jan 16, 2012 2:06 pm
by gerryg400
Sam111 wrote:that's great

But my main question still remains
what is a good approx. to the average number of words on a page on the internet.
Why are you asking here? If you really want to know, then google it. I'm sure there are many reliable studies.
Sam111 wrote:And what is a good approx. to the average number of words on a wikipedia page in english.
Why do you want an approximation? Just get the exact figures from Wikipedia; I'm sure they are available.

Re: Reading the entire internet ?

Posted: Mon Jan 16, 2012 9:47 pm
by Sam111
Well, for Wikipedia and the internet I agree I can easily find approximate information on how many pages there are, both for the internet and for Wikipedia in the different languages.
BUT there is no number I can find that gives me an average number of words on a Wikipedia page or an internet page. ONLY THE APPROXIMATE PAGE COUNTS... NOT THE AVERAGE NUMBER OF WORDS ON A PAGE.

Re: Reading the entire internet ?

Posted: Mon Jan 16, 2012 10:31 pm
by gerryg400
If you are certain that the figures don't exist, give up now.

Re: Reading the entire internet ?

Posted: Tue Jan 17, 2012 2:25 am
by Sam111
OK, then how the heck did somebody calculate that it will take approx. 100 years to read all of Wikipedia?

I would think he would need to know the total number of words across all the Wikipedia entries, or an average number of words per wiki page??? Curious. He was probably just blowing smoke up my @$$.

And for the internet, you can get an approximation of the number of pages or websites, or even a growth rate, but not an average or a word count... at least I couldn't find a valid one to use.

But please, I am all ears if anybody can find these values, or an equivalent process I could use in their place.

My last resort would be writing a program to do it and then wget-ing the whole damn thing, but this would take too long and I really don't want to go down that route.

Re: Reading the entire internet ?

Posted: Tue Jan 17, 2012 2:36 am
by Sam111
Never mind; on a Wikipedia page I found the average to be around 590 words per article,
with about 3.6 million articles.

The math is now simple, but I would still like to figure out how many of the 3.6 million articles are scientific/technical, or what people would call challenging articles to comprehend.

I wish there were a way to get an average words-per-article figure restricted to that subcategory...
Anybody on this?

Well, anyway, I can now do the simple math to calculate how long it would take... based on reading all the articles, the average word count, etc.

:P

I'm curious whether the internet has an equivalent average word number (something like Avogadro's number, but for the universe of words on the internet, if you want to make the analogy).

Re: Reading the entire internet ?

Posted: Tue Jan 17, 2012 10:24 am
by Sam111
Do you mind elaborating on how this was done?

Going by Wikipedia's statistics:

Encyclopedias by size
Encyclopedia | Edition | Articles (thousands) | Words (millions) | Est. characters (millions) | Average words per article
Wikipedia    | English | 3,590+               | 2,100+           | 13,900+                    | 590

My calculation comes out to approximately 16 years of reading straight, with no sleep, to complete all the Wikipedia entries.
Factoring in sleep: at 9 hours a night you are asleep about 137 days of the year, leaving 15 waking hours a day, which stretches the 16 years by a factor of 24/15, i.e. roughly 10 more years.
So the final figure would be around 26 years (I went with 9 hours since, if you sleep only 7, that leaves you 2 hours to buy groceries, eat, etc.; if you need more time, factor that in).

So conceivably you could read it all in somewhere between 26 and 30 years. Where the heck did the 100 years come from?
Assuming, of course, we are not counting the articles that get created in the process...
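For what it's worth, the arithmetic can be checked directly from the table's figures (the 250 words-per-minute reading speed is my assumption); note the sleep-adjusted figure comes out closer to 26 years than 22:

```python
TOTAL_WORDS = 2_100_000_000  # English Wikipedia word count from the table above
WPM = 250                    # assumed reading speed (words per minute)

minutes = TOTAL_WORDS / WPM
nonstop_years = minutes / 60 / 24 / 365  # reading 24 hours a day
awake_years = minutes / 60 / 15 / 365    # reading 15 hours a day (9 hours of sleep)

print(round(nonstop_years, 1))  # ~16 years with no sleep
print(round(awake_years, 1))    # ~26 years sleeping 9 hours a night
```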


But even at 30 years, you could probably read all the genuinely useful information in, say, 5 or 10 years by skipping stuff. Even a few years would be impressive, and would probably get you to the top of every subject if you were into that.

Anyway, as for the internet: I'm not even going to try it; too many dynamically generated things throw off the calculations... and it's growing too rapidly...

THAT IS, OF COURSE, ASSUMING THE GOVERNMENT DROPS THIS SOPA / PIPA NONSENSE.
WHAT A BUNCH OF IDIOTS.