How Search Engines Operate - answering your questions
Discussion by GuardianGA with 6 Replies.
Last Update: June 16, 2008, 11:17 am
You can put a file called robots.txt in the root of your site to keep spiders away from it or to limit what they are allowed to do there.
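For anyone wondering how a spider actually uses that file (the conventional name is robots.txt, from the Robots Exclusion Protocol), here is a rough sketch using Python's standard `urllib.robotparser` module. The rules, the bot names `ExampleBot` and `BadBot`, and the example.com URLs are all made up for illustration. Keep in mind the file is only a request: well-behaved spiders honour it, but nothing forces a spider to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. On a real site this text would be
# served from the site root, e.g. http://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set a spider can query."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# A polite spider asks before every download.
print(rules.can_fetch("ExampleBot", "http://example.com/index.html"))
print(rules.can_fetch("ExampleBot", "http://example.com/private/data.html"))
print(rules.can_fetch("BadBot", "http://example.com/index.html"))
```

The first block of rules limits every spider to pages outside /private/, and the second asks one specific spider to stay away from the whole site.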
I am thrilled to learn about spiders. It has excited me a great deal, and I want to know more about the spiders you talk about. How are these spiders built? Which languages are used? I have heard that they are related to the HTML meta tag. Is that true? Please answer my queries. I am waiting eagerly for your answer. Bye
There are 6 parts that make up a search engine.
Spider - This program downloads web pages just like a web browser. The spider does not take images (or similar media) into account. It only downloads the HTML version.
Crawler - This program finds all links on each page. The crawler follows these links and tries to find documents not already known to the search engine.
Indexer - This component parses each page and analyzes the various elements, such as text, headers, structural or stylistic features, special HTML tags, etc.
Database - This is the storage area for the data that the search engine downloads and analyzes. Sometimes it is called the index of the search engine.
Results Engine - The results engine ranks pages. It determines which pages best match a user's query and in what order the pages should be listed.
Web server - The search engine web server usually serves an HTML page with an input field where the user can enter the search query he or she is interested in. The web server is also responsible for displaying search results to the user in the form of an HTML page.
That is how a search engine works. So for SEO, what matters is the HTML of the page rather than what we see in the browser. A great book is SEO MINDSET. Find it and read it.
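To make the parts listed above a bit more concrete, here is a toy sketch in Python of how they might fit together. It is not how any real search engine is written. The `PAGES` dictionary stands in for the web (a real spider would download pages over HTTP), and the names `PageParser`, `crawl`, `build_index` and `search` are invented for this example. Ranking here is just a count of matching words, which is far cruder than a real results engine.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source.
PAGES = {
    "/home": '<html><title>Home</title><body>Search engines crawl '
             'the web. <a href="/seo">SEO</a></body></html>',
    "/seo": '<html><title>SEO tips</title><body>Search ranking tips for '
            'search engines. <a href="/home">Home</a></body></html>',
}


class PageParser(HTMLParser):
    """Collects link targets (for the crawler) and text (for the indexer)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl(start):
    """Spider + crawler: fetch pages and follow links, breadth-first."""
    seen, queue, docs = {start}, [start], {}
    while queue:
        url = queue.pop(0)
        html = PAGES.get(url)  # the "download" step
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        docs[url] = re.findall(r"[a-z]+", " ".join(parser.text).lower())
        for link in parser.links:
            if link not in seen:  # only visit documents not already known
                seen.add(link)
                queue.append(link)
    return docs


def build_index(docs):
    """Indexer + database: inverted index mapping word -> {url: count}."""
    index = defaultdict(Counter)
    for url, words in docs.items():
        for word in words:
            index[word][url] += 1
    return index


def search(index, query):
    """Results engine: rank pages by how often the query words occur."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]


index = build_index(crawl("/home"))
print(search(index, "search"))  # ['/seo', '/home']
```

Notice that everything is driven by the HTML source: the links and words come straight out of the markup, and nothing about how the page looks in a browser is used.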
QUOTE (LegallyHigh)
I wonder how large the Google database is by this point? I'm guessing it's definitely in terabytes by now, since it stores caches of pages, along with images, videos, and every other form of content. And also, what kind of languages are the various search engines programmed in?
From Wikipedia:
QUOTE
Servers are commodity-class x86 PCs running customized versions of Linux. Indeed, the goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. Estimates of the power required for over 450,000 servers range upwards of 20 megawatts, which could cost on the order of US$2 million per month in electricity charges.
Specifications:
* Upwards of 15,000 servers[4] ranging from a 533 MHz Intel Celeron to a dual 1.4 GHz Intel Pentium III (as of 2003); a 2005 estimate by Paul Strassmann has 200,000 servers,[6] while unspecified sources claimed this number to be upwards of 450,000 in 2006.[1]
* One or more 80GB hard disks per server (2003)
* 2–4 GB of memory per machine (2004)
The exact size and whereabouts of the data centers Google uses are unknown, and official figures remain intentionally vague. In a 2000 estimate, Google's server farm consisted of 6000 processors, 12,000 common IDE disks (2 per machine, and one processor per machine), at four sites: two in Silicon Valley, California and two in Virginia.[7] Each site had an OC-48 (2488 Mbit/s) internet connection and an OC-12 (622 Mbit/s) connection to other Google sites. The connections are eventually routed down to 4 x 1 Gbit/s lines connecting up to 64 racks, each rack holding 80 machines and two ethernet switches. The servers run custom server software called Google Web Server.
15,000 * 80 GB = 1.2 PB (petabytes) as a minimum amount of hard drive capacity. Sure, a fair chunk of it probably isn't used, just as yet more is used for the OS and other such necessary software, but damn me if Google would buy too many more servers than they needed to. Still, my estimate is hardly particularly thought out or detailed. I took the liberty of doing a little more research into some people who have made more accurate estimates.
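For anyone who wants to check the arithmetic or plug in other figures, here is the same back-of-the-envelope calculation as a small Python sketch. It assumes decimal units (1 PB = 1,000,000 GB) and one 80 GB disk per server, which is the minimum the quoted specification allows. The function name `min_capacity_pb` is just something made up for this post.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB


def min_capacity_pb(servers, disk_gb, disks_per_server=1):
    """Raw disk capacity in petabytes, ignoring OS and replication overhead."""
    return servers * disks_per_server * disk_gb / GB_PER_PB


# 2003 figures from the quote: 15,000 servers, at least one 80 GB disk each.
print(min_capacity_pb(15_000, 80))  # 1.2
```

This is raw capacity only, so the amount of actual indexed data would be lower once the OS, software and any duplicate copies are accounted for.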
In short: the Google machine is big, scary, and makes most supercomputers cry themselves to sleep.