[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
Getting Unicode decode error using lxml.iterparse
dieter schrieb am 23.05.2018 um 08:25:
> If the encoding is not specified, "lxml" will try to determine it
> and finally defaults to "utf-8" (which seems to be the correct encoding
> for your case).
Being an XML parser, it does not do that. XML parsers are designed to
reject non-wellformed content, and that includes anything that cannot be
decoded.
In short, if no encoding is specified, then it's UTF-8, but if there is an
XML declaration that specifies that encoding, then it uses that encoding.
Here, the encoding is specifed as UTF-8, so that's what the parser uses.
Note, however, that the library that the OP uses is not lxml but xml.etree,
i.e. the ElementTree XML support in the standard library.
Stefan