> Meta Tags, To Stop Spiders
jcguy
post Sep 6 2004, 07:33 AM
Post #1


Premium Member

Group: Members
Posts: 382
Joined: 5-September 04
Member No.: 255



Sometimes you don't want search engine spiders to crawl your site. For example, you may not want your pages to show up in a Google search. Here is what you can do: use meta tags in your web pages. They let you control how spiders index and crawl your site.

By default, search engine spiders will index every page of your site that they can reach. To change this default behavior, just use meta tags!

How? Your meta tags must be placed in the HTML code of your pages, in the header between the <head> and </head> tags.

So if you want spiders to index your page and follow every link on it, insert:
<meta name="robots" content="index, follow">

For no indexing but following of links, use:
<meta name="robots" content="noindex, follow">

More combinations are possible, for example:
<meta name="robots" content="index, nofollow">

and:
<meta name="robots" content="noindex, nofollow">

Let's say you don't want your page to be indexed but do want spiders to follow the links on it. Your HTML should start like this:

<html>
<head>
<title>Page title</title>
<meta name="robots" content="noindex, follow">
</head>
<body>body contents</body>
</html>

That's it! Well-behaved spiders check for these meta tags before deciding what to do with your pages.

Hope this helps :)
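
To see how a spider might read the tag above, here is a minimal Python sketch. It is only an illustration, not how any particular search engine works. The names RobotsMetaParser and robots_policy are made up for this example, and it assumes a page with no robots meta tag may be both indexed and followed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                item = item.strip().lower()
                if item:
                    self.directives.add(item)


def robots_policy(html):
    """Return (may_index, may_follow); both default to True (assumption)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives
    # "none" is shorthand for "noindex, nofollow".
    may_index = "noindex" not in found and "none" not in found
    may_follow = "nofollow" not in found and "none" not in found
    return may_index, may_follow


PAGE = """<html>
<head>
<title>Page title</title>
<meta name="robots" content="noindex, follow">
</head>
<body>body contents</body>
</html>"""

print(robots_policy(PAGE))  # prints (False, True)
```

The same function also accepts the XHTML form with a closing slash, since the parser treats a self-closed tag as a normal start tag.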
overture
post Sep 7 2004, 02:48 PM
Post #2


Premium Member

Group: Members
Posts: 208
Joined: 6-September 04
From: England
Member No.: 315



That is very interesting to know, jcguy. There are tons of things you can do with meta tags that I have heard about before, and they can be very useful for people in certain situations.

This topic seems very familiar to me. Were you at Inuration Technologys at all? Both this and that UNI.CC tutorial look familiar to me.
zarjay
post Sep 10 2004, 01:02 PM
Post #3


Member [ Level 1 ]

Group: Members
Posts: 48
Joined: 6-September 04
Member No.: 318



If you're using XHTML, be sure to close the meta tags with a / at the end of each tag, like this example:
CODE
<meta name="robots" content="noindex, nofollow" />

If having standards-compliant HTML or XHTML is important to you, also make sure you declare a DOCTYPE at the very top of the page, before the <html> tag.
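
A quick way to see why the closing slash matters in XHTML is to run both forms through an XML parser. This is just a rough Python sketch using the standard library. The helper name is_well_formed is made up for the example, and it only checks XML well-formedness, not full XHTML validity against a DOCTYPE.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# XHTML style: the meta tag closes itself with " />".
closed = '<head><meta name="robots" content="noindex, nofollow" /></head>'
# HTML style: the meta tag is never closed, which XML does not allow.
unclosed = '<head><meta name="robots" content="noindex, nofollow"></head>'

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```

The first fragment parses and the second raises a parse error, because in XML every element must be closed.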
overture
post Sep 10 2004, 01:53 PM
Post #4


Premium Member

Group: Members
Posts: 208
Joined: 6-September 04
From: England
Member No.: 315



Does anyone know if there is a way to stop leeches from attaching themselves to files and stealing the bandwidth? Could it be done with a specific meta tag, or is something else required?
dissipate
post Sep 12 2004, 02:38 PM
Post #5


Advanced Member

Group: [HOSTED]
Posts: 120
Joined: 2-September 04
Member No.: 100



another way to block spiders from visiting and indexing your website if you have your own domain is to create a file called "robots.txt" in the root directory of your website. put this in the text file -

User-agent: *
Disallow: /

- this means that all spiders that are reading this file should not visit or index anything at all.

with this file, you can also actually specify which directories and files you do not want the spiders to index, e.g.

User-agent: *
Disallow: /index.html
Disallow: /animals/
Disallow: /objects/toaster.html

- each Disallow value is a path prefix, so /animals/ covers everything inside that directory.
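
If you want to check what a set of rules like this actually blocks, Python's standard urllib.robotparser module can read them. The following is a small sketch only. The helper name allowed and the agent name "ExampleBot" are made up for the example, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, one directive per line.
RULES = """\
User-agent: *
Disallow: /index.html
Disallow: /animals/
Disallow: /objects/toaster.html
"""

# Rules that shut every spider out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(path, rules=RULES, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


for path in ["/index.html", "/animals/cat.html", "/objects/toaster.html", "/about.html"]:
    print(path, allowed(path))

print("/about.html under BLOCK_ALL:", allowed("/about.html", rules=BLOCK_ALL))
```

With these rules only /about.html comes back as allowed, and nothing is allowed under BLOCK_ALL. Keep in mind robots.txt is a request to spiders, not access control: a spider that ignores the file can still fetch the pages.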
currahee
post Sep 12 2004, 03:48 PM
Post #6


Member - Active Contributor

Group: [HOSTED]
Posts: 82
Joined: 9-September 04
From: At my computer desk
Member No.: 434



Ohhh wow, that's really interesting! =) overture, there is cPanel provided by asta host... I was at inuration so I've used cPanel. I think it's called Web Protect (htaccess editor)? I'm not too sure...
dissipate
post Sep 13 2004, 09:53 AM
Post #7


Advanced Member

Group: [HOSTED]
Posts: 120
Joined: 2-September 04
Member No.: 100



QUOTE(overture @ Sep 10 2004, 09:53 PM)
Does anyone know if there is a way to stop leeches from attaching themselves to files and stealing the bandwidth? could it be done with a specific meta tag or is something else required?



i assume you're talking about images. here's what you can do -

1. create a separate folder and put all your images (or the images you wish to protect) in it.

2. create a text file called ".htaccess" in the above folder.

3. the text file should contain these lines -

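# mark requests whose Referer is one of your own pages (dots escaped so they match literally)
SetEnvIfNoCase Referer "^http://www\.blah\.com/" locally_linked=1
SetEnvIfNoCase Referer "^http://www\.blah\.com$" locally_linked=1
SetEnvIfNoCase Referer "^http://blah\.com/" locally_linked=1
SetEnvIfNoCase Referer "^http://blah\.com$" locally_linked=1
# also allow requests with an empty Referer (for example direct visits)
SetEnvIfNoCase Referer "^$" locally_linked=1
# only these file types are protected; everything else is served as usual
<FilesMatch "\.(gif|jpe?g)$">
Order Allow,Deny
Allow from env=locally_linked
</FilesMatch>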

- and replace blah.com with your domain name.

4. now when someone tries to steal bandwidth by linking to your image from their site, the image will not be displayed.


if there are other files that you want to protect e.g. zip files, just put them in that special folder and add the zip extension to the FilesMatch line -

<FilesMatch "\.(gif|jpe?g|zip)$">

- that's it!

hope this helps :)
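
To make the logic of the .htaccess rules above easier to follow, here is the same decision written as a small Python sketch. It only models the rules and is not a replacement for the Apache configuration. The function name is_allowed is made up for the example, and blah.com is the placeholder domain from the post.

```python
import re

DOMAIN = "blah.com"  # placeholder from the post; use your own domain

# Same idea as the four SetEnvIfNoCase lines: http://blah.com or
# http://www.blah.com, followed by "/" or the end of the string.
# Like the original rules, this only covers http:// referers.
LOCAL_REFERER = re.compile(
    r"^http://(www\.)?" + re.escape(DOMAIN) + r"(/|$)", re.IGNORECASE
)

# Same pattern as the FilesMatch line (case-sensitive here).
PROTECTED_FILE = re.compile(r"\.(gif|jpe?g)$")


def is_allowed(path, referer):
    """Return True if a request for `path` with this Referer would be served."""
    if not PROTECTED_FILE.search(path):
        return True  # file type is not protected at all
    if referer == "":
        return True  # mirrors the "^$" rule: empty Referer is let through
    return bool(LOCAL_REFERER.search(referer))


print(is_allowed("/img/cat.gif", "http://www.blah.com/gallery.html"))
print(is_allowed("/img/cat.gif", "http://some-other-site.example/page.html"))
print(is_allowed("/img/cat.gif", ""))
```

The first and third requests are served and the second is refused. Note that the Referer header is sent by the browser and can be left out or faked, so this stops casual hotlinking rather than a determined leech.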
zarjay
post Sep 13 2004, 02:31 PM
Post #8


Member [ Level 1 ]

Group: Members
Posts: 48
Joined: 6-September 04
Member No.: 318



The best way to stop search engine spiders from snooping on your website is to use a robots.txt file, but if you want to protect a webpage that you have limited access to (e.g., a simple blog like LiveJournal), then you can use the meta tag alternative.
overture
post Sep 13 2004, 04:20 PM
Post #9


Premium Member

Group: Members
Posts: 208
Joined: 6-September 04
From: England
Member No.: 315



dissipate, I was talking about leeches which attach themselves to downloadable files and use up your bandwidth every time the files are clicked. I know someone who had 20 gigs of bandwidth used in a few days due to them. I was wondering how to stop them. Would the way you suggest work for downloadable files, like .zip/.rar/.exe...

edit:

Would I have to change the extension types, like you have put gif|jpe?g?
r3d
post Sep 13 2004, 05:15 PM
Post #10


death

Group: Members
Posts: 268
Joined: 8-September 04
Member No.: 384



Most download sites use server-side scripts to verify the referer. If the referer is not one of their own pages, nothing gets downloaded :)
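
As a rough idea of what such a server-side script could look like, here is a minimal Python WSGI sketch. It is an assumption about the general approach, not the code any particular download site uses. The names ALLOWED_HOSTS, referer_is_local and download_app are made up for the example, and blah.com stands in for your own domain.

```python
from urllib.parse import urlsplit

# Hosts whose pages are allowed to link to downloads (placeholder values).
ALLOWED_HOSTS = {"blah.com", "www.blah.com"}


def referer_is_local(referer):
    """Return True if the Referer URL points at one of our own hosts."""
    host = urlsplit(referer).hostname  # None for empty or malformed values
    return host is not None and host in ALLOWED_HOSTS


def download_app(environ, start_response):
    """WSGI app that refuses downloads not started from our own pages."""
    referer = environ.get("HTTP_REFERER", "")
    if not referer_is_local(referer):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Downloads must be started from our own pages.\n"]
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    # A real script would stream the requested file here.
    return [b"(file bytes would go here)"]


if __name__ == "__main__":
    # Try it locally with the standard library's reference server.
    from wsgiref.simple_server import make_server

    make_server("127.0.0.1", 8000, download_app).serve_forever()
```

Unlike the .htaccess approach, this sketch also refuses requests with no referer at all. That is stricter against leeches but will turn away visitors whose browser or proxy strips the header, so which way to go is a judgment call.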

- Lo-Fi Version Time is now: 11th October 2008 - 06:00 PM