Hi Liam Quinn,
I understand what you're saying, and I would completely agree with you had I
not read something different at w3.org, and had Yahoo! not indexed the example
site below. (Please note the subject: "Possible" problem with RobotRules?)
According to this document:
http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html#h-B.4.1.1
B.4.1 Search robots
The robots.txt file
It states:
Some tips: URI's are case-sensitive, and "/robots.txt" string must be all
lower-case. Blank lines are not permitted.
"Blank lines are not permitted." is stated here, and I wouldn't have asked
this question if the W3C were not the one stating it. I personally believe
the W3C is in error, but there are a lot of people who believe the W3C is
God here.
So who do we believe and who is correct? Isn't the W3C the authority on this
stuff? This is why I posted this question as I feel we need some
clarification.
Thanks!
On Sat, 18 Dec 2004, J and T wrote:
> I recently came across something that didn't seem right to me. I'm using
> "WWW::RobotRules::AnyDBM_File", but the below sample script will return
> the same thing.
>
> The URL I tested is:
> http://www.midwestoffroad.com/
>
> The robots.txt reads:
>
> User-agent: *
> Disallow: admin.php
> Disallow: error.php
> Disallow: /admin/
> Disallow: /images/
> Disallow: /includes/
> Disallow: /themes/
> Disallow: /blocks/
> Disallow: /modules/
> Disallow: /language/
> User-agent: Baidu
> Disallow: /
>
> RobotRules returns that the URL is denied by robots.txt which should not
> be the case.
That's debatable. The robots.txt file is invalid according to
<http://www.robotstxt.org/wc/norobots.html>:
    The file consists of one or more records separated by one or more
    blank lines
    [...]
    The record starts with one or more User-agent lines, followed by one
    or more Disallow lines
So "Disallow: /" is part of the record begun with "User-agent: *". It's
reasonable to ignore the misplaced "User-agent: Baidu" or to treat it as
though it were placed at the start of the record.
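
To make that reading concrete, here is a minimal Python sketch of the second interpretation (the misplaced User-agent line is folded into the record it appears in). It is not WWW::RobotRules' actual code; the names parse_records and is_allowed, and the agent name "MyCrawler", are made up for illustration. Records are split on blank lines only, as the quoted text describes, and matching is a simple prefix test.

```python
ROBOTS_TXT = """\
User-agent: *
Disallow: admin.php
Disallow: error.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/
User-agent: Baidu
Disallow: /
"""


def parse_records(text):
    """Split robots.txt into (agents, rules) records on blank lines only."""
    records, agents, rules = [], [], []
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last record
        if not raw.strip():
            # A blank line is the only thing that ends a record here.
            if agents or rules:
                records.append((agents, rules))
                agents, rules = [], []
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # comment-only line: not a record separator
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value.lower())
        elif field == "disallow" and value:
            rules.append(value)
    return records


def is_allowed(records, agent, path):
    """Prefer a record naming the agent, fall back to '*', else allow."""
    agent = agent.lower()
    specific = [r for r in records
                if any(a != "*" and a and a in agent for a in r[0])]
    generic = [r for r in records if "*" in r[0]]
    matches = specific + generic
    if not matches:
        return True
    rules = matches[0][1]
    return not any(path.startswith(rule) for rule in rules)


records = parse_records(ROBOTS_TXT)
print(len(records), is_allowed(records, "MyCrawler", "/"))

# Same file with a blank line inserted before the Baidu record.
fixed = parse_records(ROBOTS_TXT.replace("User-agent: Baidu",
                                         "\nUser-agent: Baidu"))
print(len(fixed), is_allowed(fixed, "MyCrawler", "/"))
```

Under these assumptions the file as published yields a single record that contains "Disallow: /", so "/" is denied for every agent. With a blank line before "User-agent: Baidu" there are two records and only Baidu is shut out.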
--
Liam Quinn