HTML::Parser modifies unicode characters: msg#00017 lang.perl.modules.lwp
Hi,

It appears that HTML::Parser modifies some Unicode characters while parsing. The following program demonstrates the problem:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;          # the source contains a non-ASCII string literal
use HTML::Parser;
open my $out, '>:utf8', 'word.txt' or die "Cannot open word.txt: $!";
# Print each chunk of text the parser reports to word.txt
my $p = HTML::Parser->new(text_h => [sub { print {$out} shift }, 'text']);
$p->parse("zespołów\n");
$p->eof;           # flush any text the parser is still buffering
close $out or die "Cannot close word.txt: $!";
```

After running it, 'word.txt' contains "zespołów" with the "ł" and the "ó" that follows it transformed into something else. What am I doing wrong?

I'm running perl 5.8.5 and HTML::Parser 3.36 on Linux.

Thanks,
Moshe
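One plausible reading of the symptom is double encoding. This rests on an assumption not confirmed in the post: that HTML::Parser 3.36 hands the text back as the string's internal UTF-8 bytes, with each byte treated as a separate Latin-1 character, and the `:utf8` output layer then encodes those bytes a second time. That would also explain why the "ó" is damaged even though it fits in Latin-1: once the string contains "ł", Perl stores the whole string as UTF-8. The sketch below simulates that assumption using only the core Encode module, with no HTML::Parser involved; the variable names are made up for illustration.

```perl
#!/usr/bin/perl
# Hypothetical illustration of the suspected double encoding.
# Uses only the core Encode module; HTML::Parser is not involved.
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

my $original = "zespołów";    # character string: ł is U+0142, ó is U+00F3

# Assumption: the parser returns the UTF-8 bytes of the string as if
# each byte were its own Latin-1 character.
my $mangled = decode('ISO-8859-1', encode('UTF-8', $original));

# Show the code points of both strings in hex.
printf "%vX\n", $original;    # prints 7A.65.73.70.6F.142.F3.77
printf "%vX\n", $mangled;     # prints 7A.65.73.70.6F.C5.82.C3.B3.77

# Writing $mangled through a :utf8 layer would encode each of those
# characters again, producing the double-encoded output.

# Undo the damage: map the Latin-1 characters back to bytes, then
# decode those bytes as UTF-8.
my $repaired = decode('UTF-8', encode('ISO-8859-1', $mangled));
my $status   = $repaired eq $original ? 'repaired' : 'still broken';
print "$status\n";            # prints "repaired"
```

If the file from a real run matches the mangled form, the same round trip could be applied to the text inside the handler, again only under that assumption and provided each multi-byte sequence arrives in a single handler call.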