HTML-Parser-3.41 is available from CPAN. The major news is that
HTML::Parser should now do the right thing with Unicode strings and
that the compile time option to enable Unicode entities is gone.
There is a new 'utf8_mode' that allows saner parsing of raw undecoded
UTF-8. The Unicode support is only available if you use perl-5.8 or
better.
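
A minimal sketch of how 'utf8_mode' might be used, assuming HTML-Parser 3.41 or later on perl-5.8 or better. The sample markup and the @text collector are illustrative only and do not come from the distribution. The input is a byte string holding raw UTF-8, and the handler collects the decoded text ('dtext') it is given.

```perl
use strict;
use warnings;
use HTML::Parser ();

my @text;   # illustrative collector for text chunks

my $p = HTML::Parser->new(
    api_version => 3,
    # Ask for 'dtext': text with entities decoded.
    text_h => [ sub { push @text, shift }, "dtext" ],
);

# Tell the parser that the input is raw, undecoded UTF-8 bytes.
$p->utf8_mode(1);

# "caf\xC3\xA9" is "café" as UTF-8 bytes (sample input, an assumption).
$p->parse("<p>caf\xC3\xA9 &eacute;</p>");
$p->eof;

# Text may arrive in several chunks, so join before using it.
my $joined = join "", @text;
print length($joined), " bytes of text collected\n";
```

If the input has already been decoded into a Perl Unicode string, 'utf8_mode' should be left off and the string passed to parse() as is.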
Other noteworthy recent changes:
- <title> content parsed in literal mode
- <script> and <style> skip quoted strings when looking for
matching end tag
- if no matching end tag is found for <script>, <style>, <xmp>,
<title> or <textarea>, one is generated where the next tag
starts.
- unterminated entities are now decoded in 'dtext', i.e. "foo&nbspbar"
becomes "foo bar".
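
Two of the changes above can be observed with a short script. This is only a sketch, assuming HTML-Parser 3.41 or later; the sample markup and the @starts and @text collectors are made up for illustration. It feeds the parser a <title> containing something that looks like a tag, plus an entity with no trailing ";" (here "&amp" followed by a space), and records which start tags and what decoded text get reported.

```perl
use strict;
use warnings;
use HTML::Parser ();

my (@starts, @text);   # illustrative collectors

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { push @starts, shift }, "tagname" ],
    text_h  => [ sub { push @text,   shift }, "dtext"   ],
);

# <title> content is parsed in literal mode, so the <b> inside it
# should be reported as text rather than as a start tag.
# "&amp " has no terminating ";" but should still be decoded.
$p->parse("<title>Tom &amp Jerry <b>bold</b></title>");
$p->eof;

my $title = join "", @text;
print "start tags: @starts\n";
print "title text: $title\n";
```

With older releases the same input would be expected to report a start event for <b> as well, which makes this a quick check of which behaviour is installed.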
Enjoy!