> Test Your Robots.txt File With Google
sid.calcutta
post Feb 14 2006, 02:54 AM
Post #1


Advanced Member
******

Group: Validating
Posts: 111
Joined: 28-January 06
Member No.: 10,917



As we all know, robots are programs that traverse the Web automatically. Some people call them crawlers or spiders.

Quite often, you need to restrict a robot (like Googlebot) from crawling a specific portion of your website.
You can do it in two different ways.
First, you can place a specially formatted file named robots.txt in the root of your site, at http://www.yourdomain.com/robots.txt.

Second, the Robots META tag (a special HTML META tag) can be used to indicate whether a page may be indexed, or analysed for links, by a crawler.
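
To make the META tag idea concrete, here is a rough Python sketch of how a crawler might read it. The page markup and the class name RobotsMetaParser are made up for illustration; real crawlers may handle more directives and edge cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name attribute is matched case-insensitively ("ROBOTS", "robots", ...).
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        self.directives.update(
            d.strip().lower() for d in content.split(",") if d.strip()
        )


# Hypothetical page that asks robots not to index it or follow its links.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Members only</body></html>"
)

meta_parser = RobotsMetaParser()
meta_parser.feed(PAGE)

may_index = "noindex" not in meta_parser.directives
may_follow = "nofollow" not in meta_parser.directives
print(may_index, may_follow)
```

With the sample page above, both flags come out False, meaning a well-behaved robot would neither index the page nor follow its links.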

Usually, a combination of the Robots META tag and the robots.txt file is used to get the best result.

In a nutshell, when a robot (like Googlebot or msnbot) visits a website, say http://www.yourdomain.com/, it first checks for http://www.yourdomain.com/robots.txt. If it finds this document, it analyses its contents for records.
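
The lookup step described above can be sketched in a few lines of Python. The helper name robots_txt_url and the sample page URL are my own, used only for illustration; the point is that the file is always looked up at the root of the site, whichever page the robot came for.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yourdomain.com/forum/index.php?topic=1"))
```

For the sample URL this prints http://www.yourdomain.com/robots.txt.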
An example of a simple robots.txt file is shown below:

QUOTE
User-agent: *
Disallow: /login.php

The above record directs all robots (Google, MSN, Yahoo) not to crawl the login.php file located in the root directory of your website.
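
If you want to check a rule like this yourself before uploading it, Python's standard urllib.robotparser module can evaluate it locally. This is only a sketch: the index.php URL is an assumed second page for comparison, and how each search engine's own parser behaves may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The same two-line record as in the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.php
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

login_allowed = parser.can_fetch("Googlebot", "http://www.yourdomain.com/login.php")
index_allowed = parser.can_fetch("Googlebot", "http://www.yourdomain.com/index.php")
print(login_allowed, index_allowed)
```

This prints False True: login.php is off limits to every robot, while the rest of the site stays crawlable.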

A detailed discussion of the robots.txt file is available at Robotstxt.org.

Google has added a new feature to check for URLs excluded by your robots.txt file. That is, you can check within a few seconds whether Googlebot is complying with the instructions in the robots.txt file. You need a Google Account to check the robots.txt file of your website.

The only catch is that it is still in beta (like Google Sitemaps), but it may turn out to be pretty effective in the near future.
hatim
post Feb 14 2006, 07:31 AM
Post #2


Advanced Member

Group: Members
Posts: 196
Joined: 17-June 05
From: Topi,Swabi,NWFP,Pakistan
Member No.: 6,301



I wonder why it's a txt and not an XML file. XML would be a far better choice.
twitch
post Feb 14 2006, 12:59 PM
Post #3


Veteran Nut

Group: Members
Posts: 527
Joined: 4-October 05
From: UK
Member No.: 8,895



It doesn't matter what it is. All it does is contain simple information that any basic language can understand.

Jeigh
post Feb 15 2006, 05:28 PM
Post #4


Whitest Black Mage

Group: [MODERATOR]
Posts: 1,316
Joined: 20-May 05
From: NB, Canada
Member No.: 5,281



Awesome. I've never really thought about it, but I was always curious how those bots worked and whether you could influence them directly in any way. I'll have to read up a bit on that site you posted when I have some more free time. Thanks for the link!
Quatrux
post Feb 15 2006, 06:26 PM
Post #5


the Q

Group: [HOSTED]
Posts: 1,013
Joined: 13-July 05
From: Lithuania, Vilnius
Member No.: 7,059



It is not an XML file because txt is much simpler to use, and besides, robots.txt files have been around for a really long time and XML is not that old. Not everyone knows how to use XML, so some might not use it. This is one of the simplest things to do. I agree that an addition could be made, but most robots would not support it, only the ones which are updated.
jake658879
post Feb 20 2006, 03:00 AM
Post #6


Member [ Level 1 ]

Group: Members
Posts: 30
Joined: 20-February 06
Member No.: 11,416



I agree. Besides, a lot of people don't know XML (like me), and it would be hard to make, and we might screw it up (like I did with Google Sitemaps, lol). You can also create a .htaccess file, which will force bots to comply with your commands, but most free hosts do not allow it.
austiniskoge
post Sep 2 2006, 06:04 PM
Post #7


Premium Member

Group: Members
Posts: 216
Joined: 7-March 05
From: Carrollton, TX
Member No.: 2,953



Hmm... sounds interesting.

Maybe I'll try that. It would be fun to mess around with the famous Google bots, haha.

But I'll have to look it over again some other time.
