> How Search Engines Operate, answering your questions
GuardianGA
post Sep 18 2005, 01:01 AM
Post #1


Newbie [ Level 2 ]

Group: Banned
Posts: 13
Joined: 17-September 05
Member No.: 8,531



When I took a computer class, they said that search engines use "spiders" to register your site so that it shows up when someone types a keyword. I'm pretty sure this is true, coming from a professor with a degree in Computer Science, but I'm wondering: what exactly are these spiders? What are they hosted on, and how do they work technically? If you know anything about how search engines like Google actually function, this is the thread for you. So please, share what you know. Your honest input is greatly appreciated and will be listened to.
 
szupie
post Sep 18 2005, 02:41 AM
Post #2


S.P.A.M.S.W.A.T.

Group: Members
Posts: 814
Joined: 22-January 05
From: San Antonio, Texas (No, I'm not dumb. I just moved here...)
Member No.: 2,284



Spiders run on the search engine's own servers. A spider goes to a website, checks out all the links it contains, and indexes all the valid ones. It's like looking through a family tree and gathering information about each of the children. If no website links to you, the spiders will not find you, because they do not know of your existence.
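
To make the link-following idea concrete, here is a minimal sketch in Python. It assumes a tiny made-up "web" held in memory (the `PAGES` dict and its URLs are hypothetical) so that it runs without a network, and it uses a simple breadth-first queue. It is an illustration of the idea, not how any real search engine is built.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical miniature "web": URL -> HTML source. These are not real sites.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://b.example/">B</a>',
    "http://orphan.example/": "<p>Nobody links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed):
    """Breadth-first crawl from the seed URL; returns URLs in discovery order."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:  # dead link: nothing to download
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # only queue pages not already known
                seen.add(link)
                queue.append(link)
    return order


found = crawl("http://a.example/")
print(found)
```

Starting from `http://a.example/`, the crawl reaches the B and C pages but never `http://orphan.example/`, because no page links to it.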

You can put a file called robots.txt in the root of your site to keep spiders away from your site or limit what they crawl. Well-behaved spiders check it before crawling.
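
For reference, the file that well-behaved crawlers look for is named `robots.txt`, and following it is voluntary on the crawler's side. Below is a minimal sketch of how a crawler might check it, using Python's standard `urllib.robotparser`. The rules, the site `www.example.com`, and the bot names `ExampleSpider` and `BadBot` are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real one is served from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules from a list of lines

# Ask whether a given (hypothetical) user agent may fetch a given URL.
allowed_public = parser.can_fetch("ExampleSpider", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("ExampleSpider", "http://www.example.com/private/notes.html")
allowed_badbot = parser.can_fetch("BadBot", "http://www.example.com/index.html")

print(allowed_public, allowed_private, allowed_badbot)
```

Here every spider is kept out of `/private/`, and the one named `BadBot` is asked to stay away from the whole site.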
 
vicky99
post Jul 24 2006, 09:24 AM
Post #3


Member [ Level 2 ]

Group: Members
Posts: 54
Joined: 28-May 06
Member No.: 13,691



Dear szupie,
I am thrilled to learn about the spiders. It has excited me a great deal, and I want to know more about them. How are these spiders built? Which languages are used? I have heard that they are related to the META tag of HTML. Is that true? Please answer my queries. I am eagerly waiting for your answer. Bye
 
lonebyrd
post Jul 25 2006, 11:34 AM
Post #4


Premium Member

Group: Members
Posts: 302
Joined: 23-February 06
From: Northeastern Connecticut USA
Member No.: 11,487



I thought they were called 'crawlers'? The bots that look through your <META> tags to rank your pages. Maybe there is something called spiders too, if a professor talked about it. Sounds interesting. I'm new to learning how to get noticed by search engines, and 'crawlers' is what I have read about.
 
BHerath
post Jun 11 2008, 04:01 AM
Post #5


Newbie [ Level 2 ]

Group: Members
Posts: 22
Joined: 8-May 08
Member No.: 30,211



Here is how a search engine works.
There are 6 parts that make up a search engine.

Spider - This program downloads web pages just like a web browser. The spider does not take images (or similar media) into account. It only downloads the HTML version.

Crawler - This program finds all links on each page. The crawler follows these links and tries to find documents not already known to the search engine.

Indexer - This component parses each page and analyzes the various elements, such as text, headers, structural or stylistic features, special HTML tags, etc.

Database - This is the storage area for the data that the search engine downloads and analyzes. Sometimes it is called the index of the search engine.

Results Engine - The results engine ranks pages. It determines which pages best match a user's query and in what order the pages should be listed.

Web server - The search engine web server usually contains an HTML page with an input field where the user can specify the search query he or she is interested in. The web server is also responsible for displaying search results to the user in the form of an HTML page.
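
As a rough illustration of the indexer, database, and results engine parts listed above, here is a minimal Python sketch. The documents in `DOCS` are made up, the "database" is just an in-memory inverted index, and the ranking rule (count how often the query words occur) is an assumption chosen for simplicity. Real search engines use far more signals than this.

```python
import re
from collections import Counter, defaultdict

# Hypothetical documents standing in for pages the spider has downloaded.
DOCS = {
    "page1": "Spiders crawl the web and follow links",
    "page2": "Search engines rank pages for a query",
    "page3": "Spiders feed pages to the search index and the index answers a query",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexer + database: map each word to {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, query):
    """Results engine: score = total occurrences of the query words."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by doc_id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCS)
results = search(index, "search index")
print(results)
```

For the query "search index", `page3` comes first because it contains the query words three times, `page2` follows with one match, and `page1` is not returned at all.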

That is how a search engine works. So in terms of SEO, what matters is the HTML source of the page rather than what we see in the browser. A great book is SEO MINDSET. Find it and read it.
 
LegallyHigh
post Jun 16 2008, 05:22 AM
Post #6


Member - Active Contributor

Group: [HOSTED]
Posts: 80
Joined: 12-April 08
Member No.: 29,760



I wonder how large the Google database is by this point? I'm guessing it's definitely in terabytes by now, since it stores caches of pages, along with images, videos, and every other form of content. And also, what kind of languages are the various search engines programmed in?
 
Mordent
post Jun 16 2008, 06:17 PM
Post #7


Advanced Member

Group: [HOSTED]
Posts: 194
Joined: 30-June 07
Member No.: 23,045



QUOTE(LegallyHigh @ Jun 16 2008, 06:22 AM)
I wonder how large the Google database is by this point? I'm guessing it's definitely in terabytes by now, since it stores caches of pages, along with images, videos, and every other form of content. And also, what kind of languages are the various search engines programmed in?

From Wikipedia:

QUOTE
Servers are commodity-class x86 PCs running customized versions of Linux. Indeed, the goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. Estimates of the power required for over 450,000 servers range upwards of 20 megawatts, which could cost on the order of US$2 million per month in electricity charges.

Specifications:

* Upwards of 15,000 servers[4] ranging from a 533 MHz Intel Celeron to a dual 1.4 GHz Intel Pentium III (as of 2003); a 2005 estimate by Paul Strassmann has 200,000 servers,[6] while unspecified sources claimed this number to be upwards of 450,000 in 2006.[1]
* One or more 80GB hard disks per server (2003)
* 2–4 GB of memory per machine (2004)

The exact size and whereabouts of the data centers Google uses are unknown, and official figures remain intentionally vague. In a 2000 estimate, Google's server farm consisted of 6000 processors, 12,000 common IDE disks (2 per machine, and one processor per machine), at four sites: two in Silicon Valley, California and two in Virginia.[7] Each site had an OC-48 (2488 Mbit/s) internet connection and an OC-12 (622 Mbit/s) connection to other Google sites. The connections are eventually routed down to 4 x 1 Gbit/s lines connecting up to 64 racks, each rack holding 80 machines and two ethernet switches. The servers run custom server software called Google Web Server.

15,000 * 80 GB = 1.2 PB (petabytes) as a minimum amount of hard drive capacity. Sure, a fair chunk of it probably isn't used, just as yet more is used for the OS and other such necessary software, but damn me if Google would buy too many more servers than they needed to. Still, my estimate is hardly particularly thought out or detailed. I took the liberty of doing a little more research into some people who have made more accurate estimates.
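
Spelling out that arithmetic as a quick Python sketch, assuming decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and exactly one 80 GB disk per server, which is the stated minimum:

```python
servers = 15_000         # lower-bound server count from the 2003 figure quoted above
disk_gb_per_server = 80  # one 80 GB disk per server (the stated minimum)

total_gb = servers * disk_gb_per_server
total_tb = total_gb / 1_000  # decimal units: 1 TB = 1,000 GB
total_pb = total_tb / 1_000  # 1 PB = 1,000 TB

print(f"{total_gb:,} GB = {total_tb:,.0f} TB = {total_pb:.1f} PB")
# prints: 1,200,000 GB = 1,200 TB = 1.2 PB
```

This is raw disk capacity only. It says nothing about how much of it actually holds index data.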

In short: the Google machine is big, scary, and makes most supercomputers cry themselves to sleep. ;)
Time is now: 9th July 2008 - 05:00 AM