Web Robots / Crawlers / Spiders Etc. - All you need to know about them...


finaldesign
Hello!

While browsing the net today I found this great resource related to web robots, search engines, web crawlers, spiders and such stuff. The page is: The web robots

You can find a lot of useful stuff there if you are a webmaster, or if you are interested in making your own spider/indexer or something similar. You can even look through their database of Web Robots. Some of them come with source code, so you can compile one for yourself...
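For anyone curious what the core of a spider looks like, here is a minimal sketch in Python, using only the standard library. It pulls the links out of one page and keeps only those that a robots.txt file allows. The bot name `ExampleBot`, the sample page and the sample robots.txt are all made up for illustration, and a real crawler would also need fetching, queuing and rate limiting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def allowed_links(html, base_url, robots_txt, user_agent="ExampleBot"):
    """Return the links on a page that robots.txt lets user_agent fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    return [url for url in extractor.links if rules.can_fetch(user_agent, url)]


# Hypothetical sample data, so no network access is needed.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_PAGE = (
    '<a href="/index.html">Home</a> '
    '<a href="/private/notes.html">Notes</a>'
)

print(allowed_links(SAMPLE_PAGE, "http://www.example.com/", SAMPLE_ROBOTS))
```

Checking robots.txt before following a link is what separates a well-behaved robot from the rude ones.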

Reply

szupie
Where would you put your robot once you have it? I can't find that information on the site. Would you put it on a server (like astahost) or on your own computer? And does indexing take a lot of space and bandwidth?

Reply

finaldesign
I think it takes a lot of space... and it consumes bandwidth and a lot of processor time...

Reply

szupie
Oh. Well, then, I'd better just have Google index my site for me, instead of wearing out Astahost's servers. Nice link, though.

Reply

finaldesign
Anyway, many of those web robots can pick up email addresses and harvest them into a database... spammers do that very often to build a targeted audience. I figure that if we research some of these methods, maybe we could better protect ourselves from spam and junk email... Anyway, I'll post what I discover here later.
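As a rough illustration of the idea, here is a small Python sketch. It assumes a naive harvester that just runs a regular expression over the raw HTML (the pattern below is a simplified guess, not taken from any real harvester), and shows that writing the address as HTML character entities hides it from that pattern while a browser still displays it normally. The address is made up. A harvester that decodes entities first will still find it, so this only raises the bar a little.

```python
import html
import re

# A deliberately simple pattern of the kind a naive harvester might use.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def to_html_entities(text):
    """Encode every character as a decimal HTML entity, e.g. 'a' -> '&#97;'."""
    return "".join(f"&#{ord(ch)};" for ch in text)


address = "webmaster@example.com"  # hypothetical address

# The address written in the clear, twice, as in a typical mailto link.
plain_html = f'<a href="mailto:{address}">{address}</a>'

# The same address written only as character entities.
encoded = to_html_entities(address)
protected_html = f"<p>Contact: {encoded}</p>"

print(EMAIL_PATTERN.findall(plain_html))      # the naive pattern finds it
print(EMAIL_PATTERN.findall(protected_html))  # nothing matches the raw text
print(html.unescape(encoded))                 # what a browser would render
```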

Reply

sagaxx
Google's robots are the best robots, and it has hundreds of them. Running one uses a lot of bandwidth, so it isn't good for little sites; robots are good for sites like Google, MSN, etc.

Reply

finaldesign
QUOTE(sagaxx @ Dec 26 2005, 06:05 PM)
Google's robots are the best robots, and it has hundreds of them. Running one uses a lot of bandwidth, so it isn't good for little sites; robots are good for sites like Google, MSN, etc.

Well, the idea here is testing and learning. Anyway, if you know how spiders work, you will be able to make your web pages better and that way increase your page rank on search engines, and that's what we all want.

Reply

YudzzY
The link is good, but not much use to me right now. I will try to see how to put it to use in the future.
For the time being I will let Google do the work for me ;-)

Reply

Khymnon
I think that, with the current state of search engine optimization, we should let the SEs do the indexing themselves. Unless one knows exactly how every single SE works to index and rank their site, one will most likely hurt his ranking at one engine or another.

I remember one time when Google used H1 and TITLE tags as primary criteria for their ranking, while MSN had them at 4th and 6th. And right now, Google mainly uses a system called Vector Analysis, where they analyse the overall theme of your website and adjust your ranking accordingly. It's still a work in progress, but Google partly uses it, while not many others do.

So my point is, until SEs can reach a certain level of standardization, we should let each SE do what it likes most. Plus, it doesn't really take that much bandwidth, not more than any hungry visitor to your site would take.

Reply

szupie
QUOTE(Khymnon @ Jan 3 2006, 01:33 PM)
I remember one time when Google used H1 and TITLE tags as primary criteria for their ranking, while MSN had them at 4th and 6th. And right now, Google mainly uses a system called Vector Analysis, where they analyse the overall theme of your website and adjust your ranking accordingly. It's still a work in progress, but Google partly uses it, while not many others do.

When you say "theme", do you mean the content theme or the visual theme of the site? And how can the robot adjust the ranking with the H1, title or theme? I don't get how a computer can tell what is good and what is bad.

Reply


finaldesign
One little trick I've found recently: if you don't want search engines to show a cached copy of your page, you can use this tweak:

QUOTE
Q. Can I prevent Teoma/Ask Jeeves search engine from showing a cached copy of my page?

A: Yes. We obey the "noarchive" meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

If you would like to specify this restriction just for Teoma/Ask Jeeves, you may use "teoma" in place of "robots".

Quoted from this page

So this way, if you are still building a site and plan to change the domain name or server later, search engines won't keep showing an old cached copy of it. Note that "noarchive" only hides the cached copy; the page can still be indexed. If you don't want the page indexed at all until the final version is ready, use "noindex" as the CONTENT value instead, and remove the tag when you want to be indexed.
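To show how a robot might read that tag, here is a minimal Python sketch using the standard `html.parser` module. The class and function names are made up for illustration, and this is only a guess at the general approach, not how Teoma or any other engine actually implements it. It treats a directive addressed to "robots" as applying to every crawler, and a directive addressed to a specific name (such as "teoma") as applying only to that crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the CONTENT values of <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noarchive"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        name = attr.get("name", "").strip().lower()
        if not name:
            return
        values = {
            part.strip().lower()
            for part in attr.get("content", "").split(",")
            if part.strip()
        }
        self.directives.setdefault(name, set()).update(values)


def may_show_cached_copy(page_html, crawler="robots"):
    """Return False if the page asks this crawler not to archive it."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    found = parser.directives.get("robots", set()) | parser.directives.get(
        crawler.lower(), set()
    )
    return "noarchive" not in found


page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
print(may_show_cached_copy(page))
```

The same function can be asked about one crawler only, for example `may_show_cached_copy(page, crawler="teoma")` for a page that uses "teoma" in place of "robots".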

Reply

finaldesign
QUOTE(szupie @ Jan 4 2006, 12:18 AM)
When you say "theme", do you mean the content theme or the visual theme of the site? And how can the robot adjust the ranking with the H1, title or theme? I don't get how a computer can tell what is good and what is bad.

Hmm... he probably means the content theme, because I don't see how software could understand graphics...

Reply

