HTML::Parser modifies unicode characters: msg#00017

lang.perl.modules.lwp

Hi,

It appears that HTML::Parser modifies some Unicode characters while
parsing. The following program reproduces the problem:

#########

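#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the source file itself is UTF-8
use HTML::Parser;

# Write every text event to word.txt, encoded as UTF-8.
open my $out, '>:utf8', 'word.txt' or die "Cannot open word.txt: $!";
my $p = HTML::Parser->new(text_h => [sub { print {$out} shift }, 'text']);
$p->parse("zespołów\n");
$p->eof;    # flush any text the parser is still buffering
close $out;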

#########

After running it, 'word.txt' contains "zespołów" with the ł and the ó
that follows it transformed into something else. What am I doing
wrong?

I'm running Perl 5.8.5 and HTML::Parser version 3.36 on Linux.
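
To pin down exactly which characters change, it may help to look at code points rather than at the rendered file. The following is only a minimal sketch using core Perl; `codepoints` is a hypothetical helper name, not part of HTML::Parser, and the script assumes it is saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals below are UTF-8 in the source

# Hypothetical helper (not part of HTML::Parser): render each character
# of a string as a U+XXXX code point so that changes are easy to spot.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

# Code points of the word as written in the source, before any parsing.
print codepoints("zespołów"), "\n";
# prints: U+007A U+0065 U+0073 U+0070 U+006F U+0142 U+00F3 U+0077
```

Calling the same helper on the string passed to the text handler and comparing the two lines would show whether the parser hands back the original eight characters or something different, for example a longer sequence of byte-sized values.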

Thanks,
Moshe
