Downloading The Internet

Post #1, Jan 18 2010, 01:34 AM
Super Member | Group: [HOSTED] | Posts: 898 | Joined: 12-July 06 | From: Ontario, Canada | Member No.: 14,464 | myCENTs: 7.83
I'm wondering if it is possible to save a copy of everything on the Internet. Ignoring ISP data transfer limitations (max GB per month), I have a download speed of approximately 4 Mbps.
The Internet isn't limited to web pages, though; it includes everything that is publicly accessible (not password-protected), which includes all music, videos, pictures, software, etc. Furthermore, I am not limiting it to HTTP servers: torrents, files on FTP servers and anything on peer-to-peer networks (Gnutella/LimeWire) count as well. If I save everything in its current state (ignoring changes to the live version after it is saved), how long will this take? What if I upgrade my Internet connection, or theoretically use all the bandwidth of (for example) educational institutions (universities), ISPs (Shaw, Comcast, etc.) and large corporations (Microsoft, Google, etc.)? I am not talking about indexing content; I mean saving the actual files. Every web page would be considered one file, and pictures, JavaScript, CSS, etc. would be their own files.

Post #2, Jan 18 2010, 01:30 AM
Premium Member | Group: [HOSTED] | Posts: 226 | Joined: 1-October 07 | From: United States | Member No.: 25,237 | myCENTs: 63.28
I don't know, but some institutions save on bandwidth costs by deploying a cache server or cache box. This device stores copies of Internet requests (URLs, DNS lookups, images, code, etc.) until they are purged.
The cache helps when people, staff, or computer labs visit the same websites all the time. It makes pages appear to load faster than the Internet connection alone would allow, and it saves bandwidth. Companies that build caching solutions, along with major backbone ISPs, would have a better chance of answering, or at least making an educated guess at, the question you are asking.
Levimage
P.S. I myself have no clue. 10 Gb/s Ethernet is coming out soon; probably good news for DVD torrent hippies.
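To make the caching idea concrete, here is a minimal Python sketch of a time-to-live (TTL) cache: a response is served locally until it is older than the TTL, and only misses or expired entries cost upstream bandwidth. The class `TTLCache`, the `fetch` callback and the 60-second TTL are hypothetical illustrations; a real cache box (Squid, for example) also honours HTTP caching headers and size limits, which this sketch ignores.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy proxy cache (hypothetical): keeps each fetched response for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so the demo can fake the passage of time
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (stored_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return the cached copy if still fresh; otherwise fetch upstream and store it."""
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1  # served locally: no upstream bandwidth used
            return entry[1]
        self.misses += 1  # missing or expired: this request costs upstream bandwidth
        body = fetch(url)
        self._store[url] = (now, body)
        return body

    def purge_expired(self) -> int:
        """Drop every entry older than the TTL and return how many were removed."""
        now = self.clock()
        stale = [u for u, (stored_at, _) in self._store.items() if now - stored_at >= self.ttl]
        for u in stale:
            del self._store[u]
        return len(stale)


if __name__ == "__main__":
    fake_now = [0.0]

    def fake_clock() -> float:
        return fake_now[0]

    def fetch(url: str) -> bytes:
        return f"<html>{url}</html>".encode()  # stands in for a real HTTP request

    cache = TTLCache(ttl=60, clock=fake_clock)
    cache.get("http://example.com/", fetch)  # miss: fetched upstream
    cache.get("http://example.com/", fetch)  # hit: served from the cache
    fake_now[0] = 120
    cache.get("http://example.com/", fetch)  # expired: fetched again
    print(cache.hits, cache.misses)
```

The clock is passed in rather than read directly so the expiry behaviour can be demonstrated without actually waiting.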

Post #3, Jan 18 2010, 02:10 PM
Way Out Of Control - You need a life :) | Group: [MODERATOR] | Posts: 3,334 | Joined: 16-August 05 | Member No.: 7,896 | myCENTs: 80.50
I also think that would be a big problem: almost every computer is on the Internet, so to back up everything reachable through the Internet, you would need as much disk space as all the computers around the world combined. As you mention, Google's disk space would be peanuts by comparison, because you don't just want to index, you want the whole contents.

Post #4, Jan 18 2010, 03:48 PM
Member - Active Contributor | Group: Members | Posts: 89 | Joined: 17-January 10 | From: Macedonia | Member No.: 45,766 | myCENTs: 7.84
First of all, you can't find enough free disk space to do that, or enough bandwidth to download it. Second, why would you need a backup of the Internet?

Post #5, Jan 19 2010, 05:48 AM
Super Member | Group: [HOSTED] | Posts: 898 | Joined: 12-July 06 | From: Ontario, Canada | Member No.: 14,464 | myCENTs: 7.83
The disk space available to me is truly unlimited; the only questions here are processing and bandwidth, as well as the CPU and disk speed needed to save and access everything.
A "backup" of the Internet is for experimental purposes only, and believe me, even if I manage to back up the entire Internet, including private network content, I would be using almost no disk space (an analogy would be, say, 1 electron out of the entire universe).

Post #6, Jan 19 2010, 04:39 PM
the Q | Group: [HOSTED] | Posts: 1,432 | Joined: 13-July 05 | From: Lithuania, Vilnius | Member No.: 7,059 | myCENTs: 52.08
Well, there are a lot of data centers holding millions of terabytes of data, which consume a lot of energy, so I doubt you could compete with them.
For example, a lot of content is dynamic, so it would be really hard to tell different files apart, and you could end up in an infinite loop unless you detected the duplicates and ignored some of the content on the Internet. Moreover, as far as I know, Google doesn't offer its Google cache version anymore? Or does it? Maybe that is also due to resources? http://www.archive.org/ offers the Wayback Machine, which is quite cool, but it's usually slow and doesn't offer all the content: it doesn't cache everything, and it rarely caches images. Well, it's a non-profit project.
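The infinite-loop problem with dynamic content is the classic "crawler trap". Below is a minimal Python sketch, assuming a toy in-memory site instead of real HTTP: a breadth-first crawl that never revisits a URL and does not save or follow a page whose content hash it has already seen. The names `crawl` and `toy_web` and the session-id URLs are hypothetical. Hashing only catches byte-identical pages; pages that differ in small details (timestamps, ads) slip past it, which is why the sketch also keeps a `max_pages` cap as a safety net.

```python
import hashlib
from collections import deque
from typing import Callable, List, Set, Tuple

# A "page" is (body, outgoing_links); fetch() stands in for an HTTP GET.
Page = Tuple[str, List[str]]


def crawl(start: str, fetch: Callable[[str], Page], max_pages: int = 1000) -> List[str]:
    """Breadth-first crawl returning the URLs whose content was saved."""
    seen_urls: Set[str] = {start}
    seen_hashes: Set[str] = set()
    saved: List[str] = []
    queue = deque([start])
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        body, links = fetch(url)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # same content under a new URL: don't save it or follow its links
        seen_hashes.add(digest)
        saved.append(url)
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return saved


def toy_web(url: str) -> Page:
    """Hypothetical site: every session-id URL serves identical content
    but links to yet another session id, forever."""
    if url == "/a":
        return "home", ["/b", "/page?sid=1"]
    if url == "/b":
        return "about", ["/a"]
    sid = int(url.split("=")[1])
    return "same dynamic page", [f"/page?sid={sid + 1}"]


if __name__ == "__main__":
    print(crawl("/a", toy_web))  # prints ['/a', '/b', '/page?sid=1']
```

Without the hash check, the session-id chain would keep producing "new" URLs until `max_pages` was reached.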

Post #7, Jan 20 2010, 07:55 PM
Super Member | Group: [HOSTED] | Posts: 711 | Joined: 25-April 05 | Member No.: 4,374 | myCENTs: 0.44
Interesting question. I am actually surprised that you, FireFoxRules, asked it, as it sounds like a crazy idea that I would expect from a newb. At any rate, it did get me thinking, so I will propose an answer.
Assumptions
• You have an insane Internet backbone connection with guaranteed reliability and speed. I will assume that you have a 100 Mb/sec connection, which is usually only available to ISP-level organizations.

Gotchas
• Connection speeds are measured in BITS, not BYTES. There are 8 bits to a byte, so you need to divide your connection speed by 8 right off the top. This makes our 100 Mbit/sec connection a 12.5 Mbyte/sec connection. With typical network delays, this would become 6.25 Mbyte/sec; the calculations below use the full 12.5 Mbyte/sec, so they are a best case.

Now let's do some calculations (whips out trusty TI-89 calculator):
12.5 MB/sec * 60 seconds = 750 MB/min
750 MB/min * 60 min = 45 GB/hour
45 GB/hour * 24 hours = 1080 GB/day, or ~1 TB/day (1.08e12 bytes)

With the YouTube figure of 7.7 petabytes (7.7e15 bytes):
7.7e15 bytes / 1.08e12 bytes/day = 7129.63 days
7129.63 days / 365 days/year = 19.53 years

Just downloading the YouTube database with an insane Internet connection would take you almost 20 years, and almost 1 million dollars just in hard drive storage. Hope this answers your question.
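The same back-of-the-envelope arithmetic can be written as a small Python sketch, so the link speed or data size can be swapped in. It uses decimal units (1 Mb = 1,000,000 bits, 1 PB = 1e15 bytes) as the post does; the `efficiency` factor is an assumed stand-in for the "typical network delays" (0.5 reproduces the 6.25 Mbyte/sec case), and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(link_mbps: float, efficiency: float = 1.0) -> float:
    """Bytes a link of link_mbps megabits/s moves in one day.
    efficiency < 1.0 is an assumed fudge factor for overhead and delays."""
    bytes_per_sec = link_mbps * 1_000_000 / 8 * efficiency  # 8 bits per byte
    return bytes_per_sec * SECONDS_PER_DAY


def download_days(size_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to download size_bytes at a constant rate."""
    return size_bytes / bytes_per_day(link_mbps, efficiency)


YOUTUBE_BYTES = 7.7e15  # 7.7 PB, the figure quoted in this thread

if __name__ == "__main__":
    for mbps in (4, 100):  # the original poster's link, and the "insane" backbone link
        days = download_days(YOUTUBE_BYTES, mbps)
        print(f"{mbps:>4} Mbps: {days:,.0f} days (~{days / 365:.1f} years)")
```

At the original poster's 4 Mbps, the link moves about 43.2 GB per day, and the same 7.7 PB would take roughly 488 years instead of about 19.5.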

Post #8, Jan 21 2010, 10:35 PM
Way Out Of Control - You need a life :) | Group: [HOSTED] | Posts: 1,338 | Joined: 2-August 05 | From: Kapellen (Antwerp, Belgium) | Member No.: 7,585 | myCENTs: 34.72
With your 4 Mbps download speed you'll never be able to keep up with all the data that is put on the Internet daily, especially on sites like YouTube. (Do note: YouTube uses 7.7 PB to store all its data, but for every video it keeps the original plus 360p, 480p, 720p and 1080p versions where possible. The most efficient approach would be to store either the highest-resolution version or the original video.)
The next problem is power consumption. A single disk doesn't use a lot of power, especially compared to a modern CPU. But 1,000 disks easily use a few kilowatts, generating a lot of heat that you have to cool away. You'll also need a huge room for all the racks, with extra free air space for cooling purposes (as a backup). IMHO, it's impossible to do.
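A quick Python sketch of the power argument, under loudly stated assumptions: about 8 W per spinning drive (a rough figure for a 3.5" hard disk, not a measured value), 2 TB drives for the 7.7 PB case, and no RAID, spares, servers or cooling overhead counted. The conversion of 1 W of load to about 3.412 BTU/h of heat is standard; the function names are hypothetical.

```python
import math

WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of electrical load ends up as ~3.412 BTU/h of heat


def disks_needed(total_bytes: float, disk_bytes: float) -> int:
    """Drives needed to hold total_bytes with no redundancy (no RAID, no spares)."""
    return math.ceil(total_bytes / disk_bytes)


def array_power_kw(disks: int, watts_per_disk: float) -> float:
    """Electrical draw of the drives alone, in kilowatts."""
    return disks * watts_per_disk / 1000.0


def heat_btu_per_hour(kw: float) -> float:
    """Nearly all power drawn becomes heat that the cooling has to remove."""
    return kw * 1000.0 * WATTS_TO_BTU_PER_HOUR


if __name__ == "__main__":
    WATTS_PER_DISK = 8.0  # assumption: rough draw of one spinning 3.5" drive
    for disks in (1_000, disks_needed(7.7e15, 2e12)):  # assumption: 2 TB drives
        kw = array_power_kw(disks, WATTS_PER_DISK)
        print(f"{disks:>5} disks: {kw:.1f} kW, ~{heat_btu_per_hour(kw):,.0f} BTU/h of heat")
```

Under these assumptions, 1,000 disks come to about 8 kW for the drives alone, which is in line with the "few kilowatts" estimate above.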

Post #9, Jan 22 2010, 03:47 AM
Premium Member | Group: [HOSTED] | Posts: 226 | Joined: 1-October 07 | From: United States | Member No.: 25,237 | myCENTs: 63.28
Five years from now it would not be impossible, but then again there will probably be 25 times as much data out there. Interesting, huh?
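The "25 times in five years" figure is the poster's guess, not a measurement. This short Python sketch only shows what constant yearly growth rate that guess implies: about 90% per year, i.e. the data would nearly double every year. The function name is hypothetical.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Constant yearly growth rate that turns 1 unit into `multiple` units over `years`."""
    return multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = implied_annual_growth(25, 5)
    print(f"25x in 5 years = about {rate:.0%} growth per year")
```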

Post #10, Jan 22 2010, 06:16 AM
Premium Member | Group: Members | Posts: 363 | Joined: 21-September 09 | From: Land of Shadows | Member No.: 42,995 | myCENTs: 87.17
I came across this blog post while surfing; I'm not sure whether this is the one company that is up to downloading and archiving the Internet. Setting aside the video and audio content on the web, text-based sites are easy to crawl and archive, I guess, if these companies are doing it.
Check this blog post. It's an interesting read, though I'm not sure whether that site still offers such a service. But I guess archiving Wikipedia is possible to some extent, and that is not a bad idea; I think that would be enough for some people.

© 2010 AstaHost: Free Web Hosting & Technical Discussion. A member of Xisto.