How does webarchive get content

PeterX · Post by **PeterX** » Mon Feb 08, 2021 11:03 am

How and where does the web archive get the contents of old and lost web pages?

The page(s) must obviously be stored by someone _before_ they disappear.

I mean, does some person or software randomly store webpages and check whether they disappear? Or does someone save webpages on their own and later contribute them to the archive?

And how do they know in advance which pages will disappear?

Greetings
Peter

nullplan · Post by **nullplan** » Mon Feb 08, 2021 11:43 am

The web archive employs a program (I believe these are called "spiders") to download pages and their embedded assets. They then have a ton of storage somewhere to keep all of it. Obviously they don't know which sites are going to disappear; they just sample pages according to some algorithm. If they happened to sample the page you want before it got memory-holed, you are in luck. There is also some way to request that a specific page be added.
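To make the "spider" idea concrete, here is a minimal sketch of a breadth-first crawler in Python. It is only an illustration of the general technique, not the archive's actual crawler: the names `LinkExtractor`, `crawl`, `fetch` and `max_pages` are made up for this example, and the demo runs against a fake in-memory site (`example.test`) instead of the network. It also treats every asset as text, which a real crawler would not do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect links to other pages (href) and embedded assets (src)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links and drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch(url) is a caller-supplied (hypothetical) function that returns
    the page text, or None if the page cannot be retrieved.
    Returns a dict mapping each stored URL to its content.
    """
    archive = {}
    queue = deque([start_url])
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        if url in archive:
            continue  # already stored
        content = fetch(url)
        if content is None:
            continue  # page is gone or unreachable
        archive[url] = content
        parser = LinkExtractor(url)
        parser.feed(content)
        queue.extend(link for link in parser.links if link not in archive)
    return archive


if __name__ == "__main__":
    # Fake site used instead of real HTTP requests.
    site = {
        "http://example.test/": '<a href="/a.html">A</a><img src="logo.png">',
        "http://example.test/a.html": '<a href="/">home</a>',
        "http://example.test/logo.png": "",
    }
    snapshot = crawl("http://example.test/", site.get)
    print(sorted(snapshot))
```

The `max_pages` limit stands in for the "sample some pages by some algorithm" part: a crawler cannot fetch everything, so it has to stop somewhere, and whatever it reached before a site vanished is what ends up in the archive.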

xenos · Post by **xenos** » Mon Feb 08, 2021 2:27 pm

One can also save a page manually by visiting web.archive.org/save/URL, where URL is the address of the page to save.
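As a small sketch of that address pattern, the helper below builds the save address for a given page. The function name `save_url` and the example page are made up for illustration, and the `https://` scheme on the archive address is an assumption. The code only constructs the string and makes no request; opening the result in a browser is what asks the archive to take the snapshot.

```python
from urllib.parse import urlsplit

# Prefix taken from the web.archive.org/save/URL pattern; https is assumed.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url: str) -> str:
    """Return the web.archive.org/save/URL address for page_url."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    return SAVE_PREFIX + page_url


if __name__ == "__main__":
    print(save_url("https://example.com/some/page.html"))
```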

OSDev.org
