Hi,
I've installed HTML::Parser on an AIX 5.1 system running perl 5.8.6 along
with HTML::Tagset and all tests passed except for one relating to POD that
was skipped. However, one of our developers found that it didn't properly
parse titles. Here's a sample program that demonstrates the problem. When
run with the perl 5.8.6 that I installed, the output is
Help Title is
(blank line)
but when run with a copy of perl 5.8.0 that someone else installed, we
get:
Help Title is Installation Help
which I assume is correct.
Here's the program:
-------------------------------------------------------------------------
eval 'exec dbtperl5.8.6 -S $0 ${1+"$@"}'
if 0;
use strict;
use warnings;
use HTML::Parser;
my $title='';
my $p = HTML::Parser->new(api_version => 3,);
$p->handler(start=> \&title_handler, 'tagname, self');
$p->parse_file("db2wi.htm");
print "\nHelp Title is $title\n";
exit 0;
########################################
# Subroutines
########################################
sub title_handler {
    return if shift ne 'title';
    my $self = shift;
    $self->handler(text => sub { $title = shift }, 'dtext');
    $self->handler(end => sub { shift->eof if shift eq 'title' }, 'tagname,self');
}
---------------------------------------------------------------------------
and here's the db2wi.htm input file
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"
xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"
/>
<meta name="dc.language" scheme="rfc1766" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<meta name="dc.date" scheme="iso8601" content="2006-01-20" />
<meta name="copyright" content="(C) Copyright IBM Corporation 2006" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow"/>
<meta http-equiv="PICS-Label" content='(PICS-1.1
"http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz
1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0)
"http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<title>Installation Help</title>
<link rel="stylesheet" type="text/css" href="ibmidwb.css" />
</head>
<body>
<a id="Top_Of_Page" name="Top_Of_Page"></a>
<h1>Installation Help</h1>
<br />
<a id="Bot_Of_Page" name="Bot_Of_Page"></a>
</body>
</html>
We've already worked around this issue but I thought I should report it in
case it's a bug that someone wants to fix. Unless there's a coding error
here, I would think that there should at least have been a test case
failure to indicate that something is wrong.
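One possible explanation, not confirmed here, is that the text handler is called more than once for the title and each call overwrites $title, since HTML::Parser documents that text may be reported in several chunks unless unbroken_text is enabled. The sketch below is a variant of the program above that appends text instead of assigning it. The helper name extract_title and the $in_title and $done flags are made up for illustration, and whether this changes the result on the 5.8.6 install is an assumption that would need testing.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the text of the first <title> element in $html.
sub extract_title {
    my ($html) = @_;
    my $title    = '';
    my $in_title = 0;   # true while between <title> and </title>
    my $done     = 0;   # true once the first </title> has been seen

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $in_title = 1 if !$done && $_[0] eq 'title' }, 'tagname' ],
        # Append rather than assign, so a title delivered in several
        # text events is not overwritten by the last one.
        text_h  => [ sub { $title .= $_[0] if $in_title }, 'dtext' ],
        end_h   => [ sub { if ($in_title && $_[0] eq 'title') { $in_title = 0; $done = 1 } }, 'tagname' ],
    );
    $p->unbroken_text(1);   # ask the parser not to split runs of text
    $p->parse($html);
    $p->eof;
    return $title;
}

print "Help Title is ",
    extract_title('<html><head><title>Installation Help</title></head></html>'),
    "\n";
```

To run this against the file rather than a string, the parse and eof calls could be replaced with $p->parse_file("db2wi.htm").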
Thanks,
Jack Goldstein