Ok, third try.
This is an updated patch that contains fixes for bugs in the previous
versions. The previous implementation could seg-fault if the namespace of a
new element was not set.
I also changed the names of the registration functions to
(un)register_namespace_classes.
I know that is even longer than before, but the names make it clear what they
are for, and in common use cases they will only be called once per
application, so readability beats laziness here.
I didn't run extensive tests on the patched installation, but the test suite
seems to be broken anyway (at least, test_etree.py doesn't set up the etree
attribute of the test class). I do, however, think that the biggest bugs
should be gone by now. I'd like to see the patch applied for the next release.
Hoping for approval,
Stefan
Stefan Behnel wrote:
> I like the idea behind lxml a lot: using the most complete XML library
> available and making it a pythonic tool to use.
>
> I do, however, also know the tool XIST, which is (in my eyes) the most pythonic
> XML API I can imagine. See here for examples of how to use it:
> http://www.livinglogic.de/Python/xist/Examples.html
>
> While a number of its features would conflict with the ElementTree approach,
> there are a lot of ideas in XIST that would complement lxml very well,
> especially given the large number of XML standards lxml supports (which XIST
> lacks). With this proposal, I'm referring to the implementation of namespaces,
> as shown in "Defining new elements and converting XML trees" at
> http://www.livinglogic.de/Python/xist/Howto.html
> and implemented in modules described at
> http://www.livinglogic.de/Python/xist/ns/index.html
>
> It has a boundless number of use cases. It simply allows you to implement an
> arbitrarily complex, domain-specific API on top of the XML API, to use custom
> data bindings, to implement configurable GUIs in XML (go, implement the XUL
> namespace), and a lot of other things you could (or cannot yet) imagine.
>
> What has to be done? I imagine the following implementation in lxml.
>
> * Make _Element available at the API level as ElementImpl (or ElementProxy?).
>
> * Add a module-level dictionary for registering namespaces.
>
> * Add two new module-level methods
>
>     def register_namespace(namespace_uri, class_dict):
>         """Register a new namespace in the global implementation dictionary and
>         assign a dictionary to it that maps element local names to classes
>         implementing them. Classes /must/ inherit from ElementImpl."""
>
>     def unregister_namespace(namespace_uri):
>         "Unregister the namespace implementation."
>
> * Modify either "lxml.etree._elementFactory" or "lxml.etree.getProxy" (sorry,
> I didn't get so deep into the code) to look up the namespace of the newly
> created element in the global dictionary. If that fails, use the original
> behaviour of creating a default proxy. If it is found, however, then use the
> dictionary assigned to this namespace to look up the class corresponding to the
> local name of the element. If it is found, instantiate it, otherwise look up
> "None" to find a default implementation for the elements in this namespace. If
> that isn't found either, fall back to the default proxy.
>
> I think this is a great extension to the ElementTree API. It should not affect
> the behaviour (or performance) if the feature is /not/ used, but it allows a
> great amount of flexibility at a very low overhead /if/ it is used. Note that
> the object is created at each access through the XML API, so it cannot hold
> any state. All state *must* be stored in the underlying XML data. Element
> subclasses are purely representational, but they can provide any API you could
> imagine.
>
> The proposed user API is extremely simple. You can implement a module of
> element classes and then simply run "register_namespace(uri, vars())" at the
> end. Importing that module will then register the namespace for you and make
> the classes turn up at the API level. For ease of use, "register_namespace"
> should be forgiving and create a new class dictionary (which it has to do
> anyway, for safety) by extracting only the values that actually subclass
> ElementImpl, throwing everything else away. That should simplify (or allow)
> the usage of "vars()". You could achieve the same with a 1-line generator in
> the call to register_namespace - but why bug the user with it? I think it's a
> common enough usage.
>
> So, there is still some room for discussion. I personally believe that it
> should be enough to register Element implementations. ElementTree can be
> subclassed directly and other parts of the XML API are less interesting for
> this case. If (in the future) new use cases turn up that make other XML
> classes interesting for subclassing, register_namespace can simply extend the
> class selection to more than ElementImpl subclasses. But I currently do not
> see a need for that.
>
> I hope you like the idea as much as I do. Since it does not look like a lot of
> modifications to me, it would be great to see this implemented in an upcoming
> release.
Index: src/lxml/etree.pyx
===================================================================
--- src/lxml/etree.pyx (Revision 18516)
+++ src/lxml/etree.pyx (Arbeitskopie)
@@ -155,6 +155,7 @@
return c_ns
 cdef class _ElementTree(_DocumentBase):
+    cdef object _namespace_classes
     def parse(self, source, parser=None):
         """Updates self with the content of source and returns its root
@@ -339,12 +340,22 @@
file = open(file, 'wb')
file.write(data)
tree.xmlFree(data)
-
+
+    def register_namespace_classes(self, namespace, class_dict):
+        ns_utf = namespace.encode('UTF-8')
+        self._namespace_classes[ns_utf] = _build_namespace_impl_dict(class_dict)
+
+    def unregister_namespace_classes(self, namespace):
+        ns_utf = namespace.encode('UTF-8')
+        del self._namespace_classes[ns_utf]
+
+
 cdef _ElementTree _elementTreeFactory(xmlDoc* c_doc):
     cdef _ElementTree result
     result = _ElementTree()
     result._ns_counter = 0
     result._c_doc = c_doc
+    result._namespace_classes = {}
     return result
cdef class _Element(_NodeBase):
@@ -673,13 +684,19 @@
 cdef _Element _elementFactory(_ElementTree etree, xmlNode* c_node):
     cdef _Element result
+    cdef char* c_ns_href
     result = getProxy(c_node, PROXY_ELEMENT)
     if result is not None:
         return result
     if c_node is NULL:
         return None
     if c_node.type == tree.XML_ELEMENT_NODE:
-        result = _Element()
+        if c_node.ns == NULL:
+            c_ns_href = NULL
+        else:
+            c_ns_href = c_node.ns.href
+        element_class = _find_element_class(etree, c_ns_href, c_node.name)
+        result = element_class()
     elif c_node.type == tree.XML_COMMENT_NODE:
         result = _Comment()
     else:
@@ -959,6 +976,21 @@
c_node = tree.xmlNewDocComment(c_doc, text)
return c_node
+
+# module-level API for namespace implementations
+
+class ElementImpl(_Element):
+    pass
+
+def register_namespace_classes(namespace, class_dict):
+    ns_utf = namespace.encode('UTF-8')
+    __NAMESPACE_CLASSES[ns_utf] = _build_namespace_impl_dict(class_dict)
+
+def unregister_namespace_classes(namespace):
+    ns_utf = namespace.encode('UTF-8')
+    del __NAMESPACE_CLASSES[ns_utf]
+
+
# module-level API for ElementTree
def Element(tag, attrib=None, nsmap=None, **extra):
@@ -1644,6 +1676,38 @@
theParser = Parser()
# Private helper functions
+__NAMESPACE_CLASSES = {}
+
+cdef object _find_element_class(_ElementTree etree, char* c_namespace_utf, char* c_element_name_utf):
+    element_name_utf = c_element_name_utf
+    if c_namespace_utf == NULL:
+        namespace_utf = None
+    else:
+        namespace_utf = c_namespace_utf
+
+    for namespace_dict in (etree._namespace_classes, __NAMESPACE_CLASSES):
+        try:
+            class_dict = namespace_dict[namespace_utf]
+        except KeyError:
+            continue
+        try:
+            return class_dict[element_name_utf]
+        except KeyError:
+            pass
+        try:
+            return class_dict[None]
+        except KeyError:
+            break # do not try the other dict, we might mix different implementations!
+    return _Element
+
+cdef object _build_namespace_impl_dict(class_dict):
+    d = {}
+    for name, cls in class_dict.iteritems():
+        if issubclass(cls, _Element):
+            name_utf = name.encode('UTF-8')
+            d[name_utf] = cls
+    return d
+
cdef _dumpToFile(f, xmlDoc* c_doc, xmlNode* c_node):
cdef tree.PyObject* o
cdef tree.xmlOutputBuffer* c_buffer
import sys
sys.path.insert(0, 'build/lib.linux-i686-2.4')

from StringIO import StringIO
from lxml.etree import (parse, ElementImpl, register_namespace_classes,
                        unregister_namespace_classes)

class test1(ElementImpl):
    def tryme(self):
        return 1

class test2(ElementImpl):
    def tryme(self):
        return 2

NS1 = u"testNS"
NS2 = u"huhu"

register_namespace_classes(NS1, {u'test-me': test1})  # NS1, test1
register_namespace_classes(NS2, {u'test-me': test2})  # NS2, test2

xml = ("<bla xmlns='%s'><test-me/><a:test-me xmlns:a='%s'/><test-me/></bla>"
       % (NS1, NS2))

# Parse from a fresh StringIO each time: a shared file object would already
# be exhausted when the second parse() reads it.
doc1 = parse(StringIO(xml))
doc2 = parse(StringIO(xml))

# note: NS2, test1 -> 1 1 1 instead of 1 1 2
doc2.register_namespace_classes(NS2, {u'test-me': test1})

for doc in (doc1, doc2):
    for ns in (NS1, NS2):
        el = doc.xpath('//t:test-me', {'t': ns})
        for child in el:
            print child.tryme(),
    print

# removes namespace only from module (i.e. not from doc2!)
unregister_namespace_classes(NS2)

for doc in (doc1, doc2):
    for ns in (NS1, NS2):
        el = doc.xpath('//t:test-me', {'t': ns})
        for child in el:
            try:
                print child.tryme(),
            except AttributeError:
                print
                print "Element '%s' has no 'tryme' attribute." % child.tag
    print
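
For anyone who wants to check the expected output without building the patched lxml, the precedence rules can be modelled in plain Python. This is only a sketch that mirrors the control flow of _find_element_class from the patch: the registries are ordinary dicts, DefaultElement stands in for _Element, and results() is a helper invented for this illustration that replays the order in which the script's XPath loops visit the elements.

```python
class DefaultElement(object):
    """Stand-in for the default _Element proxy."""


def find_element_class(doc_registry, module_registry, namespace, name):
    # The per-document registry is consulted before the module-level one.
    for registry in (doc_registry, module_registry):
        class_dict = registry.get(namespace)
        if class_dict is None:
            continue                     # namespace unknown here, try the next one
        if name in class_dict:
            return class_dict[name]
        if None in class_dict:
            return class_dict[None]      # per-namespace default class
        break                            # namespace is claimed here: do not mix implementations
    return DefaultElement


class Test1(DefaultElement):
    def tryme(self):
        return 1


class Test2(DefaultElement):
    def tryme(self):
        return 2


NS1, NS2 = "testNS", "huhu"
module_registry = {NS1: {"test-me": Test1}, NS2: {"test-me": Test2}}
doc1_registry = {}                        # doc1 has no overrides
doc2_registry = {NS2: {"test-me": Test1}} # doc2 overrides NS2 with Test1


def results(doc_registry):
    # Two test-me elements live in NS1 and one in NS2, visited in that order.
    hits = [(NS1, "test-me"), (NS1, "test-me"), (NS2, "test-me")]
    return [find_element_class(doc_registry, module_registry, ns, name)().tryme()
            for ns, name in hits]
```

In this model results(doc1_registry) gives [1, 1, 2] and results(doc2_registry) gives [1, 1, 1]. Deleting NS2 from module_registry afterwards leaves doc2 untouched, while doc1 falls back to DefaultElement for the NS2 element, which is the case the AttributeError branch of the script is there to catch.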