Hide webserver contents from search engines / Robots.txt

Source: http://www.searchtools.com/robots/robots-txt.html

My Notes: If you have a website and would like to keep some content private, i.e. not crawled and indexed by search engine robots, create a plain text file called robots.txt in the root directory of your site, in which you can disallow crawling of certain files and folders. Check this example of a robots.txt file: http://www.whitehouse.gov/robots.txt

Writeup from the source:
Search engine robots will check a special file in the root of each server called robots.txt, which is, as you may guess, a plain text file (not HTML). Robots.txt implements the Robots Exclusion Protocol, which allows the web site administrator to define what parts of the site are off-limits to specific robot user agent names. Web administrators can disallow access to cgi, private and temporary directories, for example, because they do not want pages in those areas indexed.

Source: http://www.searchtools.com/robots/robots-txt.html
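A minimal sketch of how a well-behaved crawler applies these rules, using Python's standard-library `urllib.robotparser`. The robots.txt content, the `example.com` URLs and the `BadBot` agent name are made up for illustration. A real crawler would fetch the file from `/robots.txt` on the target host (for example with `set_url()` and `read()`) instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a made-up "BadBot" is asked to stay out of the
# whole site; every other crawler is asked to skip the CGI, private and
# temporary directories.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "http://www.example.com/index.html"),
        ("Googlebot", "http://www.example.com/private/report.html"),
        ("BadBot/1.0", "http://www.example.com/index.html"),
    ]
    for agent, url in checks:
        # can_fetch() only answers the question; obeying it is up to the crawler.
        print(f"{agent:12} {url:45} allowed={rp.can_fetch(agent, url)}")
```

With this input, `Googlebot` may fetch `/index.html` but not `/private/report.html`, while `BadBot/1.0` is refused everywhere. Nothing enforces these answers: they only matter to crawlers that choose to ask.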

Re: Hide webserver contents from search engines / Robots.txt

I spent some time learning all about search optimization a few years back (keywords/description meta tags, robots.txt, etc.). All of my sites show up well in Google, with the description text displayed the way I want it. Their ranking is not at its best right now, but it has been in the past. I have always used robots.txt to control what gets crawled.

Re: Hide webserver contents from search engines / Robots.txt

Well, robots.txt is the best tool for giving instructions to bots from Google, Yahoo!, MSN and other big search engines, but spam bots generally ignore this file. I don't think there is any better way to hide content from anyone than .htaccess files. I am not sure whether something similar is available for Windows servers.
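One common way .htaccess files are used to hide a directory from everyone, crawlers included, is HTTP Basic authentication. The Apache directives are not reproduced here; below is a minimal Python sketch of the same idea written as WSGI middleware. The `/private/` prefix and the `admin`/`s3cret` credentials are hypothetical, and a real setup would store hashed passwords and serve the site over HTTPS.

```python
import base64
import hmac

# Hypothetical values for illustration only; never hard-code real credentials.
PROTECTED_PREFIX = "/private/"
USERNAME = "admin"
PASSWORD = "s3cret"


def require_basic_auth(app, prefix=PROTECTED_PREFIX, realm="Private area"):
    """Wrap a WSGI app so that paths under `prefix` need HTTP Basic credentials."""
    expected = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()

    def wrapper(environ, start_response):
        if environ.get("PATH_INFO", "").startswith(prefix):
            # The header looks like "Basic <base64 of user:password>".
            scheme, _, token = environ.get("HTTP_AUTHORIZATION", "").partition(" ")
            # WSGI header values are latin-1 strings; compare as bytes in constant time.
            if scheme.lower() != "basic" or not hmac.compare_digest(
                token.strip().encode("latin-1"), expected.encode("ascii")
            ):
                start_response(
                    "401 Unauthorized",
                    [
                        ("Content-Type", "text/plain"),
                        ("WWW-Authenticate", f'Basic realm="{realm}"'),
                    ],
                )
                return [b"Authentication required\n"]
        return app(environ, start_response)

    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def call(app, path, authorization=None):
    """Send a fake GET request through `app` and return its status line."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    if authorization is not None:
        environ["HTTP_AUTHORIZATION"] = authorization
    statuses = []
    app(environ, lambda status, headers, exc_info=None: statuses.append(status))
    return statuses[0]


if __name__ == "__main__":
    app = require_basic_auth(hello_app)
    good = "Basic " + base64.b64encode(b"admin:s3cret").decode()
    print(call(app, "/index.html"))                 # 200 OK (public page)
    print(call(app, "/private/report.html"))        # 401 Unauthorized
    print(call(app, "/private/report.html", good))  # 200 OK
```

Unlike robots.txt, this does not rely on the client's cooperation: a bot without the password simply gets a 401 response.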

Re: Hide webserver contents from search engines / Robots.txt

robots.txt probably does the job, then…

There is a Syngress book on how to use Google to find the hidden content of web servers. Check this link ---- I'm sure many of you have already read about this book… It demonstrates penetration tests using Google's advanced search features.

Re: Hide webserver contents from search engines / Robots.txt

No, I mean: if there is a bot that refuses to follow robots.txt, is there a way to stop it from indexing the content of the XYZ part of the site?

Re: Hide webserver contents from search engines / Robots.txt

^ You can use the .htaccess file to block requests based on their User-Agent header, but this is not foolproof, since a bot can send any User-Agent string it likes.
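A minimal sketch of User-Agent blocking, written as Python WSGI middleware rather than the Apache .htaccess rules the post refers to (those are not reproduced here). The blocked names and the `/private/` prefix standing in for the "XYZ part of the site" are hypothetical. The last request shows why this is not foolproof: the header is chosen by the client, so a bot that claims to be a browser gets through.

```python
# Hypothetical User-Agent substrings to refuse; matched case-insensitively.
BLOCKED_AGENTS = ("badbot", "evilscraper")
PROTECTED_PREFIX = "/private/"


def block_user_agents(app, blocked=BLOCKED_AGENTS, prefix=PROTECTED_PREFIX):
    """Return 403 for requests under `prefix` whose User-Agent contains a blocked name."""

    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefix) and any(name in agent for name in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def call(app, path, user_agent):
    """Send a fake GET request through `app` and return its status line."""
    environ = {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": path,
        "HTTP_USER_AGENT": user_agent,
    }
    statuses = []
    app(environ, lambda status, headers, exc_info=None: statuses.append(status))
    return statuses[0]


if __name__ == "__main__":
    app = block_user_agents(hello_app)
    print(call(app, "/private/page.html", "BadBot/1.0"))   # 403 Forbidden
    print(call(app, "/index.html", "BadBot/1.0"))          # 200 OK (outside the prefix)
    # The same bot lying about its identity is not stopped:
    print(call(app, "/private/page.html", "Mozilla/5.0"))  # 200 OK
```

For content that must stay hidden even from dishonest bots, authentication is the sturdier option, because it does not depend on what the client says about itself.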

Re: Hide webserver contents from search engines / Robots.txt

There's no shortage of donkeys, Ghalib; look for one and you'll find a thousand.

How in the world are you gonna use .htaccess on a Windows server without running Apache?

Re: Hide webserver contents from search engines / Robots.txt

^ You are asking the question the wrong way.

Can you use .htaccess on a Windows server? Yes, with Apache running on Windows.
Can you use it with IIS? No, but you can secure directories in IIS under the Directory Security tab.

Re: Hide webserver contents from search engines / Robots.txt

Old thread.