
Re: Possible problem with RobotRules?: msg#00068

Subject: Re: Possible problem with RobotRules?
Hi Liam Quinn,

I understand what you're saying, and I would completely agree with you had I not read something different on the W3C site and seen that Yahoo! indexed the example site below. (Please note the word "Possible" in the subject: "Possible problem with RobotRules?")

According to this document:

http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html#h-B.4.1.1

B.4.1 Search robots
The robots.txt file

It states:

Some tips: URI's are case-sensitive, and "/robots.txt" string must be all lower-case. Blank lines are not permitted.

"Blank lines are not permitted." is stated here, and I wouldn't have asked this question if the W3C were not the one stating it. I personally believe the W3C is in error, but there are a lot of people who believe the W3C is God here.

So who do we believe and who is correct? Isn't the W3C the authority on this stuff? This is why I posted this question as I feel we need some clarification.

Thanks!

On Sat, 18 Dec 2004, J and T wrote:

> I recently came across something that didn't seem right to me. I'm using
> "WWW::RobotRules::AnyDBM_File", but the below sample script will return the
> same thing.
>
> The URL I tested is:
> http://www.midwestoffroad.com/
>
> The robots.txt reads:
>
> User-agent: *
> Disallow: admin.php
> Disallow: error.php
> Disallow: /admin/
> Disallow: /images/
> Disallow: /includes/
> Disallow: /themes/
> Disallow: /blocks/
> Disallow: /modules/
> Disallow: /language/
> User-agent: Baidu
> Disallow: /
>
> RobotRules returns that the URL is denied by robots.txt which should not be
> the case.

That's debatable.  The robots.txt file is invalid according to
<http://www.robotstxt.org/wc/norobots.html>:

    The file consists of one or more records separated by one or more
    blank lines
    [...]
    The record starts with one or more User-agent lines, followed by one
    or more Disallow lines

So "Disallow: /" is part of the record begun with "User-agent: *".  It's
reasonable to ignore the misplaced "User-agent: Baidu" or to treat it as
though it were placed at the start of the record.

--
Liam Quinn
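
The reading described above can be sketched in a few lines of Python. This is only an illustration of the robotstxt.org record rule (records are separated by blank lines, and a record starts with its User-agent lines), using the "ignore the misplaced User-agent line" option. It is not how WWW::RobotRules is implemented, and the names parse_records and is_allowed are hypothetical helpers made up for this example.

```python
# The robots.txt from the thread, with no blank line before "User-agent: Baidu".
ROBOTS_TXT = """\
User-agent: *
Disallow: admin.php
Disallow: error.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/
User-agent: Baidu
Disallow: /
"""


def parse_records(text):
    """Split robots.txt text into (agents, rules) records on blank lines."""
    records = []
    agents, rules = [], []
    # The extra "" acts as a final blank line so the last record is flushed.
    for raw in text.splitlines() + [""]:
        stripped = raw.strip()
        if stripped.startswith("#"):
            continue  # comment-only lines do not end a record
        line = stripped.split("#", 1)[0].strip()
        if not line:
            if agents:
                records.append((agents, rules))
            agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Assumption: a User-agent line that appears after Disallow
            # lines in the same record is misplaced and is ignored.
            if not rules:
                agents.append(value.lower())
        elif field == "disallow":
            rules.append(value)
    return records


def is_allowed(records, agent, path):
    """Return True if `agent` may fetch `path` under simple prefix matching."""
    agent = agent.lower()
    default, chosen = None, None
    for agents, rules in records:
        if chosen is None and any(a != "*" and a in agent for a in agents):
            chosen = rules
        if default is None and "*" in agents:
            default = rules
    rules = chosen if chosen is not None else (default or [])
    # An empty Disallow value means "nothing is disallowed".
    return not any(rule and path.startswith(rule) for rule in rules)


as_published = parse_records(ROBOTS_TXT)
with_blank_line = parse_records(
    ROBOTS_TXT.replace("User-agent: Baidu", "\nUser-agent: Baidu")
)

print(is_allowed(as_published, "MyBot", "/"))     # one record: "/" is denied
print(is_allowed(with_blank_line, "MyBot", "/"))  # two records: "/" is allowed
```

Under this reading the file as published is a single record for "*", so "Disallow: /" applies to every robot. Adding one blank line before "User-agent: Baidu" splits it into two records, and only Baidu is shut out entirely.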






