How can I become universal utf/unicode: msg#00028

lang.perl.modules.lwp

Subject: How can I become universal utf/unicode

I don't know where else to post this question.

I'm already using LWP::UserAgent and HTML::Parser to fetch and parse documents without problems. However, I would like my code to be universal, handling pages in any encoding. I'm using Perl 5.8.3 with the latest HTML::Parser as of today.

Sometimes when fetching a document you know its encoding, and sometimes you have no idea what it is. What I want to know is: how do I convert an incoming Web page to UTF-8 regardless of its original encoding, and also encode entities to something like Aacute (for keyword matching)?

Maybe I'm stupid, but I've tried everything I can think of, including following some examples I've found, and no matter what I do, it just doesn't work.

Any help would be appreciated.

Thanks,
John
