osdir.com


lxml namespace as an attribute


Skip Montanaro wrote on 15.08.2018 at 23:25:
> Much of XML makes no sense to me. Namespaces are one thing. If I'm
> parsing a document where namespaces are defined at the top level, then
> adding namespaces=root.nsmap works when calling the xpath method. I
> more-or-less get that.
> 
> What I don't understand is how I'm supposed to search for a tag when
> the namespace appears to be defined as an attribute of the tag itself.
> I have some SOAP XML I'm trying to parse. It looks roughly like this:
> 
> <s:Envelope xmlns:a="..." xmlns:s="...">
>   <s:Header>
>      ...
>   </s:Header>
>   <s:Body>
>     <Tag xmlns="http://some/new/path">
>     ...
>     </Tag>
>   </s:Body>
> 
> If the document is "doc", I can find the body like so:
> 
> body = doc.xpath(".//s:Body", namespaces=doc.nsmap)
> 
> I don't understand how to find Tag, however. When I iterate over the
> body's children, printing them out, I see that Tag's name is actually:
> 
>     {http://some/new/path}Tag
> 
> yet that namespace is unknown to me until I find Tag. It seems I'm
> stuck in a chicken-and-egg situation. Without knowing that
> http://some/new/path namespace, is there a way to cleanly find all
> instances of Tag?

In addition to what dieter said, let me mention that you do not have to
follow XPath's requirement to use namespace prefixes. lxml provides two ways
of expressing searches with qualified tag names (i.e. "{namespace}tag",
a.k.a. Clark notation).

1) You can use the .find*() methods, which implement a subset of what XPath
can express (the same subset that the xml.etree.ElementTree library
supports, improvements welcome), but are simpler to use and faster than the
XPath engine. If you need only the first occurrence of a tag, you can say

    doc.find(".//{http://some/namespace}Body")

and there is also an .iterfind() method for incremental searches and
.findall() to return all matches as a list.
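
To make the .find*() variants concrete, here is a minimal sketch run against
a cut-down version of the SOAP document from the question. The envelope
namespace URI ("http://example.com/soap-envelope") and the text content of
the Tag elements are made up for illustration; only "http://some/new/path"
comes from the original post. The fallback import is there only because the
standard library's ElementTree accepts the same calls.

```python
try:
    from lxml import etree
except ImportError:
    # The stdlib ElementTree supports the same find*() calls used below.
    import xml.etree.ElementTree as etree

# Hypothetical sample document: the "s" namespace URI is a placeholder.
xml_text = """
<s:Envelope xmlns:s="http://example.com/soap-envelope">
  <s:Header/>
  <s:Body>
    <Tag xmlns="http://some/new/path">first</Tag>
    <Tag xmlns="http://some/new/path">second</Tag>
  </s:Body>
</s:Envelope>
"""
doc = etree.fromstring(xml_text)

# .find() returns the first match (or None), using Clark notation.
body = doc.find(".//{http://example.com/soap-envelope}Body")
first_tag = doc.find(".//{http://some/new/path}Tag")

# .findall() returns every match as a list.
all_tags = doc.findall(".//{http://some/new/path}Tag")

# .iterfind() yields matches lazily, one at a time.
tag_texts = [el.text for el in doc.iterfind(".//{http://some/new/path}Tag")]
```

Note that no prefix mapping is passed anywhere: the namespace URI is written
directly into the path, so it does not matter where in the document the
namespace happens to be declared.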

2) You can use the XPath subclass "ETXPath", which internally translates
qualified tag names to a prefix mapping for you and passes the result on to
the normal XPath engine. So this gives you the expressiveness of XPath
without having to care about prefixes.
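
A short sketch of how ETXPath might be used on the same kind of document.
As before, the envelope namespace URI is a placeholder, and the "id"
attributes and text content are invented so that there is something for an
XPath predicate to select on; none of that appears in the original question.

```python
from lxml import etree

# Hypothetical sample document: the "s" namespace URI, the id attributes
# and the element text are placeholders for illustration.
xml_text = """
<s:Envelope xmlns:s="http://example.com/soap-envelope">
  <s:Header/>
  <s:Body>
    <Tag xmlns="http://some/new/path" id="a">first</Tag>
    <Tag xmlns="http://some/new/path" id="b">second</Tag>
  </s:Body>
</s:Envelope>
"""
doc = etree.fromstring(xml_text)

# Compile an expression that uses Clark notation instead of prefixes.
find_tags = etree.ETXPath("//{http://some/new/path}Tag")
tags = find_tags(doc)

# Full XPath syntax such as predicates remains available.
find_tag_b = etree.ETXPath(
    "//{http://example.com/soap-envelope}Body"
    "/{http://some/new/path}Tag[@id='b']"
)
tag_b = find_tag_b(doc)
```

The compiled ETXPath object is callable and can be reused across documents,
which is convenient when the same search runs over many SOAP responses.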

Stefan