
Web Robots / Crawlers / Spiders Etc. - All you need to know about them...

 
 Discussion by finaldesign with 12 Replies.
 Last Update: January 18, 2006, 7:01 am
 

Hello!

While browsing the net today I found a great resource on web robots, search engines, web crawlers, spiders, and the like. The page is: The web robots

You can find a lot of useful stuff there if you are a webmaster, or if you are interested in making your own spider/indexer or something similar. You can even look at their database of Web Robots. Some of them come with source code, so you can compile one for yourself... :huh:







   Mon Dec 12, 2005    Reply         

Where would you put your robot once you have it? I can't find that information on the site. Would you put it on a server (like astahost) or on your own computer? And does indexing take a lot of space and bandwidth?

   Mon Dec 12, 2005    Reply         

I think it takes a lot of space... and consumes bandwidth and a lot of processor time...

   Tue Dec 13, 2005    Reply         


Oh. Well, then, I'd better just have Google index my site for me, instead of wearing out Astahost's servers. Nice link, though.

   Tue Dec 13, 2005    Reply         

Anyway, many of those web robots are capable of catching email addresses and harvesting them into a database... spammers use that very often to get a targeted audience. I figure if we research some of these methods, maybe we could better protect ourselves from spam and junk email... Anyway, I'll post what I discover here later.
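To make the harvesting idea concrete, here is a minimal sketch of what a naive harvester might do with a page's HTML. The regex, the function name `harvest`, and the example addresses are assumptions made up for illustration; real harvesters are likely more thorough, so the obfuscation shown only defeats the simplest ones.

```python
import re

# Deliberately simple pattern (an assumption, not any real spammer's code).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(html):
    """Return the addresses a naive regex-based harvester would collect."""
    return EMAIL_RE.findall(html)

# A plain mailto link exposes the address twice: in the href and in the text.
plain = '<a href="mailto:webmaster@example.com">webmaster@example.com</a>'

# A simple written-out form contains no "@", so this pattern finds nothing.
obfuscated = "Contact: webmaster [at] example [dot] com"

found_plain = harvest(plain)
found_obfuscated = harvest(obfuscated)
print(found_plain)
print(found_obfuscated)
```

Writing the address out this way is a trade-off: it also makes life harder for human visitors, who have to retype it.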

   Thu Dec 15, 2005    Reply         

Google's robots are the best robots, and it has something like hundreds of them. Running one uses a lot of bandwidth, so it isn't good for little sites; they are good for sites like Google, MSN, etc.

   Mon Dec 26, 2005    Reply         


QUOTE (sagaxx)

Google's robots are the best robots, and it has something like hundreds of them. Running one uses a lot of bandwidth, so it isn't good for little sites; they are good for sites like Google, MSN, etc.


Well, the idea of this is testing and learning. Anyway, if you know how many of these spiders work, you will be able to make your web pages better and that way increase your page rank on search engines - and that's what we all want :huh:







   Tue Dec 27, 2005    Reply         

The link is good, but not of much use to me right now. I will try to see how to make use of it in the future.
For the time being I will let Google do the work for me ;-)

   Tue Dec 27, 2005    Reply         

I think that, with the current state of search engine optimization, we should let the SEs do the indexing themselves. Unless one knows exactly how every single SE works to index and rank their site, one will most likely hurt his ranking at one engine or another.

I remember one time when Google used H1 and TITLE tags as primary criteria for their ranking, while MSN had them at 4th and 6th. And right now, Google mainly uses a system called Vector Analysis, where they analyse the overall theme of your website and adjust your ranking accordingly. It's still a work in progress, but Google partly uses it, while not many others do.

So my point is, until SEs can reach a certain level of standardization, we should let each SE do what it likes most. Plus, it doesn't really take that much bandwidth, not more than any hungry visitor to your site would take.

   Tue Jan 3, 2006    Reply         

QUOTE (Khymnon)

I remember one time when Google used H1 and TITLE tags as primary criteria for their ranking, while MSN had them at 4th and 6th. And right now, Google mainly uses a system called Vector Analysis, where they analyse the overall theme of your website and adjust your ranking accordingly. It's still a work in progress, but Google partly uses it, while not many others do.



When you say "theme", do you mean the content theme or the visual theme of the site? And how can the robot adjust the ranking with the H1, title or theme? I don't get how a computer can tell what is good and what is bad.

   Tue Jan 3, 2006    Reply         

hello, szupie

I meant the content theme, of course. You see, SEs start out by indexing your website, which is basically what the "robots" do. The next step is that your site is run through a complex set of algorithms that analyze it for keywords. Now, if an SE finds frequent use of words like "hawaii," "luxury hotels," "spa," and "beverages," for example, their relational databases will assume your site has a "vacations and hotels in hawaii" theme.
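As a toy illustration of the keyword idea described above, the sketch below counts words on a page and picks the theme whose vocabulary appears most often. The `THEMES` table, the function name `guess_theme`, and the sample page are all made up for this example; real search engines use far more elaborate methods that this does not try to reproduce.

```python
import re
from collections import Counter

# Hypothetical theme vocabularies, invented for this example only.
THEMES = {
    "vacations and hotels in hawaii": {
        "hawaii", "hotel", "hotels", "spa", "luxury", "beverages",
    },
    "programming": {"java", "code", "compiler", "software"},
}

def guess_theme(text):
    """Return the theme whose keywords occur most often, or None if none occur."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each theme by the total number of keyword occurrences on the page.
    scores = {
        theme: sum(words[w] for w in vocab) for theme, vocab in THEMES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

page = (
    "Luxury hotels in Hawaii with a spa and free beverages. "
    "Book your Hawaii hotel today."
)
theme = guess_theme(page)
print(theme)
```

A page with none of the listed keywords gets no theme at all, which hints at why choosing the right words for your intended theme matters.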

Of course, these algorithms are not always 100% correct, but they're close enough. Your job as a website designer is to make sure you're choosing the words that are searched for the most for your intended theme. Naturally, this means you should carefully decide what your site's theme is.

And the fact is, SEs build these databases mostly out of users' searches. When a searcher searches for "java" and is presented with sites that deal with both coffee and the programming language, the SE will take note of the user's choice, use it to decide which is the most popular choice, and present it at a higher rank later.

If you want to know anything else, or if I haven't made it clear enough, please let me know. :-)
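The "java" example can be sketched as a simple click log: each time a user picks a result for a query, the count for that result goes up, and later results for the same query are ordered by those counts. The class name `ClickLog`, its methods, and the example URLs are hypothetical; this is only one rough way such feedback could work, not how any particular SE does it.

```python
from collections import Counter, defaultdict

class ClickLog:
    """Toy record of which result users picked for each query."""

    def __init__(self):
        # Maps a lowercased query to a Counter of clicked URLs.
        self._clicks = defaultdict(Counter)

    def record(self, query, url):
        """Note that a user searching for `query` chose `url`."""
        self._clicks[query.lower()][url] += 1

    def rank(self, query, candidates):
        """Order candidates by past clicks; ties keep their original order."""
        counts = self._clicks[query.lower()]
        return sorted(candidates, key=lambda url: -counts[url])

log = ClickLog()
log.record("java", "java-lang.example")
log.record("java", "java-lang.example")
log.record("Java", "coffee.example")

ranked = log.rank("java", ["coffee.example", "java-lang.example"])
print(ranked)
```

Here the programming-language site was chosen twice and the coffee site once, so it moves to the top for later "java" searches.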

   Wed Jan 4, 2006    Reply         

QUOTE (szupie)

When you say "theme", do you mean the content theme or the visual theme of the site? And how can the robot adjust the ranking with the H1, title or theme? I don't get how a computer can tell what is good and what is bad.


Hmm... he probably means the content theme, because I don't see how software could understand graphics... :D

   Tue Jan 10, 2006    Reply         

One little trick I've found recently... If you don't want a search engine to keep a cached copy of your page, you can use this tweak:

QUOTE

Q. Can I prevent Teoma/Ask Jeeves search engine from showing a cached copy of my page?

A: Yes. We obey the "noarchive" meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

If you would like to specify this restriction just for Teoma/Ask Jeeves, you may use "teoma" in place of "robots".

Quoted from this page

Keep in mind that NOARCHIVE only hides the cached copy; the page can still be indexed. So if you are building a site and don't want it indexed on Google yet, for example because you plan to change the domain name or server later, use "NOINDEX" in the CONTENT attribute instead. When you finally remove the tag, search engines like Google will pick up the final version of your site. :D
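For anyone writing their own spider, here is a minimal sketch of how a crawler might read the robots meta tag quoted above before deciding whether to keep a cached copy. It uses Python's standard `html.parser`; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample pages are assumptions for illustration, and the sketch expects the tag written without a space after the opening `<`.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> or an agent-specific tag."""

    def __init__(self, agent="robots"):
        super().__init__()
        self.agent = agent.lower()   # e.g. "teoma" for an engine-specific tag
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", self.agent):
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def robots_directives(html, agent="robots"):
    """Return the set of robots directives that apply to `agent`."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    return parser.directives

page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE, NOFOLLOW"></head></html>'
teoma_page = '<html><head><meta name="teoma" content="noarchive"></head></html>'

general = robots_directives(page)
for_teoma = robots_directives(teoma_page, agent="teoma")
for_others = robots_directives(teoma_page)
print(general, for_teoma, for_others)
```

A well-behaved crawler would skip storing a cached copy whenever "noarchive" is in the returned set, and skip indexing altogether when "noindex" is present.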

   Wed Jan 18, 2006    Reply         

