[PATH-V3] XIST-like namespace implementations for lxml: msg#00008

Subject: [PATH-V3] XIST-like namespace implementations for lxml
Ok, third try.

This is an updated patch that contains fixes for bugs in the previous
versions. The previous implementation could seg-fault if the namespace of a
new element was not set.

I also changed the names of the registration functions to
(un)register_namespace_classes.
I know that is even longer than before, but the names make clear what the
functions are for. In common use cases they will only be called once per
application, so readability trumps laziness here.

I didn't run extensive tests on the patched installation, but the test suite
seems to be broken anyway (at least, test_etree.py doesn't set up the etree
attribute of the test class). I do, however, think that the biggest bugs
should be gone by now. I'd like to see the patch applied for the next release.

Hoping for approval,
Stefan


Stefan Behnel wrote:
> I like the idea behind lxml a lot: using the most complete XML library
> available and making it a pythonic tool to use.
> 
> I do, however, also know the tool XIST, that is (in my eyes) the most pythonic
> XML API I can imagine. See here for examples how to use it:
> http://www.livinglogic.de/Python/xist/Examples.html
> 
> While a number of features should conflict with the ElementTree approach,
> there are a lot of ideas in XIST that should accompany lxml just great,
> especially with the large number of supported XML standards (which XIST
> lacks). With this proposal, I'm referring to the implementation of namespaces,
> as shown in "Defining new elements and converting XML trees" at
> http://www.livinglogic.de/Python/xist/Howto.html
> and implemented in modules described at
> http://www.livinglogic.de/Python/xist/ns/index.html
> 
> It has a boundless number of use cases. It simply allows you to implement an
> arbitrarily complex, domain-specific API on top of the XML API, to use custom
> data bindings, to implement configurable GUIs in XML (go, implement the XUL
> namespace), and a lot of other things you could (or cannot yet) imagine.
> 
> What has to be done? I imagine the following implementation in lxml.
> 
> * Make _Element available at the API level as ElementImpl (or ElementProxy?).
> 
> * Add a module-level dictionary for registering namespaces.
> 
> * Add two new module-level methods
> 
> def register_namespace(namespace_uri, class_dict):
>     """Register a new namespace in the global implementation dictionary and
>     assign a dictionary to it that maps element local names to classes
>     implementing them. Classes /must/ inherit from ElementImpl."""
> 
> def unregister_namespace(namespace_uri):
>     "Unregister the namespace implementation."
> 
> * Modify either "lxml.etree._elementFactory" or "lxml.etree.getProxy" (sorry,
> I didn't get so deep into the code) to lookup the namespace of the newly
> created element in the global dictionary. If that fails, use the original
> behaviour of creating a default proxy. If it is found, however, then use the
> dictionary assigned to this namespace to lookup the class corresponding to the
> local name of the element. If it is found, instantiate it, otherwise lookup
> "None" to find a default implementation for the elements in this namespace. If
> that isn't found either, fall back to the default proxy.
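
The lookup order described in the last point can be sketched in plain Python, independent of lxml. This is only an illustration under assumptions: `NAMESPACE_CLASSES`, `find_element_class`, the placeholder classes and the `urn:example:xul` URI are hypothetical names chosen for the example, not part of lxml.

```python
# Hypothetical stand-ins for the real proxy classes (not lxml classes).
class DefaultProxy:
    """Plays the role of the default element proxy."""


class XulButton(DefaultProxy):
    """Custom class registered for one specific local name."""


class XulFallback(DefaultProxy):
    """Namespace-wide default, registered under the key None."""


# Global registry: namespace URI -> {local name (or None) -> class}
NAMESPACE_CLASSES = {
    "urn:example:xul": {"button": XulButton, None: XulFallback},
}


def find_element_class(namespace_uri, local_name):
    """Return the class to instantiate for an element.

    Order: exact local name, then the namespace default (key None),
    then the default proxy if the namespace is unknown or has no default.
    """
    class_dict = NAMESPACE_CLASSES.get(namespace_uri)
    if class_dict is None:
        return DefaultProxy
    if local_name in class_dict:
        return class_dict[local_name]
    return class_dict.get(None, DefaultProxy)
```

An element whose namespace was never registered costs one failed dictionary lookup before falling back to the default proxy.
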
> 
> I think this is a great extension to the ElementTree API. It should not affect
> the behaviour (or performance) if the feature is /not/ used, but it allows a
> great amount of flexibility at a very low overhead /if/ it is used. Note that
> the object is created at each access through the XML API, so it cannot hold
> any state. All state *must* be submitted to the underlying XML data. Element
> subclasses are purely representational, but they can provide any API you could
> imagine.
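
To illustrate the "no state in the proxy" rule, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` as the underlying XML data. `CounterProxy` is a hypothetical wrapper written for this example. It is not how lxml creates its proxies, but it shows the same constraint: every wrapper is disposable, so anything worth keeping is written to the node.

```python
import xml.etree.ElementTree as ET


class CounterProxy:
    """Throwaway wrapper: keeps a reference to the XML node and nothing else."""

    def __init__(self, node):
        self._node = node

    @property
    def count(self):
        # State is read from the XML attribute, not from the Python object.
        return int(self._node.get("count", "0"))

    def increment(self):
        # State is written back to the XML attribute.
        self._node.set("count", str(self.count + 1))


root = ET.fromstring("<counter/>")

# A fresh proxy is created for every access, as in the proposal.
CounterProxy(root).increment()
CounterProxy(root).increment()
print(CounterProxy(root).count)  # 2
```
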
> 
> The proposed user API is extremely simple. You can implement a module of
> element classes and then simply run "register_namespace(uri, vars())" at the
> end. Importing that module will then register the namespace for you and make
> the classes turn up at the API level. For ease of use, "register_namespace"
> should be forgiving and create a new class dictionary (which it has to do
> anyway, for safety) by extracting only the values that actually subclass
> ElementImpl, throwing everything else away. That should simplify (or allow)
> the usage of "vars()". You could achieve the same with a 1-line generator in
> the call to register_namespace - but why bug the user with it? I think it's a
> common enough usage.
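
A "forgiving" filter of this kind might look like the sketch below. It is written in plain Python with a hypothetical `ElementImpl` stand-in and a hypothetical helper name, `build_class_dict`. The `isinstance(value, type)` guard is an addition made for this example: the values returned by `vars()` include modules, functions and strings, and `issubclass()` raises `TypeError` for anything that is not a class.

```python
import math


class ElementImpl:
    """Hypothetical stand-in for the proposed base class."""


class Button(ElementImpl):
    """An element implementation that should be registered."""


def build_class_dict(candidates):
    """Copy only the entries whose value is a class derived from ElementImpl."""
    return {
        name: value
        for name, value in candidates.items()
        if isinstance(value, type) and issubclass(value, ElementImpl)
    }


# Roughly what vars() of an implementation module could contain.
module_namespace = {
    "Button": Button,
    "ElementImpl": ElementImpl,  # the imported base class itself
    "math": math,                # an imported module
    "helper": len,               # a function
    "VERSION": "1.0",            # a plain value
}

class_dict = build_class_dict(module_namespace)
print(sorted(class_dict))  # ['Button', 'ElementImpl']
```

The base class passes the filter as well, because `issubclass(ElementImpl, ElementImpl)` is true. That only matters if a document contains an element actually named `ElementImpl`.
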
> 
> So, there is still some room for discussion. I personally believe that it
> should be enough to register Element implementations. ElementTree can be
> subclassed directly and other parts of the XML API are less interesting for
> this case. If (in the future) new use cases turn up that make other XML
> classes interesting for subclassing, register_namespace can simply extend the
> class selection to more than ElementImpl subclasses. But I currently do not
> see a need for that.
> 
> I hope you like the idea as much as I do. Since it does not look like a lot of
> modifications to me, it would be great to see this implemented in an upcoming
> release.
Index: src/lxml/etree.pyx
===================================================================
--- src/lxml/etree.pyx  (Revision 18516)
+++ src/lxml/etree.pyx  (Arbeitskopie)
@@ -155,6 +155,7 @@
         return c_ns
 
 cdef class _ElementTree(_DocumentBase):
+    cdef object _namespace_classes
 
     def parse(self, source, parser=None):
         """Updates self with the content of source and returns its root
@@ -339,12 +340,22 @@
             file = open(file, 'wb')
         file.write(data)
         tree.xmlFree(data)
-    
+
+    def register_namespace_classes(self, namespace, class_dict):
+        ns_utf = namespace.encode('UTF-8')
+        self._namespace_classes[ns_utf] = _build_namespace_impl_dict(class_dict)
+
+    def unregister_namespace_classes(self, namespace):
+        ns_utf = namespace.encode('UTF-8')
+        del self._namespace_classes[ns_utf]
+
+
 cdef _ElementTree _elementTreeFactory(xmlDoc* c_doc):
     cdef _ElementTree result
     result = _ElementTree()
     result._ns_counter = 0
     result._c_doc = c_doc
+    result._namespace_classes = {}
     return result
 
 cdef class _Element(_NodeBase):
@@ -673,13 +684,19 @@
 
 cdef _Element _elementFactory(_ElementTree etree, xmlNode* c_node):
     cdef _Element result
+    cdef char* c_ns_href
     result = getProxy(c_node, PROXY_ELEMENT)
     if result is not None:
         return result
     if c_node is NULL:
         return None
     if c_node.type == tree.XML_ELEMENT_NODE:
-        result = _Element()
+        if c_node.ns == NULL:
+            c_ns_href = NULL
+        else:
+            c_ns_href = c_node.ns.href
+        element_class = _find_element_class(etree, c_ns_href, c_node.name)
+        result = element_class()
     elif c_node.type == tree.XML_COMMENT_NODE:
         result = _Comment()
     else:
@@ -959,6 +976,21 @@
     c_node = tree.xmlNewDocComment(c_doc, text)
     return c_node
 
+
+# module-level API for namespace implementations
+
+class ElementImpl(_Element):
+    pass
+
+def register_namespace_classes(namespace, class_dict):
+    ns_utf = namespace.encode('UTF-8')
+    __NAMESPACE_CLASSES[ns_utf] = _build_namespace_impl_dict(class_dict)
+
+def unregister_namespace_classes(namespace):
+    ns_utf = namespace.encode('UTF-8')
+    del __NAMESPACE_CLASSES[ns_utf]
+
+
 # module-level API for ElementTree
 
 def Element(tag, attrib=None, nsmap=None, **extra):
@@ -1644,6 +1676,38 @@
 theParser = Parser()
 
 # Private helper functions
+__NAMESPACE_CLASSES = {}
+
+cdef object _find_element_class(_ElementTree etree, char* c_namespace_utf, char* c_element_name_utf):
+    element_name_utf = c_element_name_utf
+    if c_namespace_utf == NULL:
+        namespace_utf = None
+    else:
+        namespace_utf = c_namespace_utf
+
+    for namespace_dict in (etree._namespace_classes, __NAMESPACE_CLASSES):
+        try:
+            class_dict = namespace_dict[namespace_utf]
+        except KeyError:
+            continue
+        try:
+            return class_dict[element_name_utf]
+        except KeyError:
+            pass
+        try:
+            return class_dict[None]
+        except KeyError:
+            break # do not try the other dict, we might mix different implementations!
+    return _Element
+
+cdef object _build_namespace_impl_dict(class_dict):
+    d = {}
+    for name, cls in class_dict.iteritems():
+        if issubclass(cls, _Element):
+            name_utf = name.encode('UTF-8')
+            d[name_utf] = cls
+    return d
+
 cdef _dumpToFile(f, xmlDoc* c_doc, xmlNode* c_node):
     cdef tree.PyObject* o
     cdef tree.xmlOutputBuffer* c_buffer
import sys
sys.path.insert(0, 'build/lib.linux-i686-2.4')

from lxml.etree import parse, ElementImpl, register_namespace_classes, unregister_namespace_classes

class test1(ElementImpl):
    def tryme(self):
        return 1

class test2(ElementImpl):
    def tryme(self):
        return 2

NS1=u"testNS"
NS2=u"huhu"

register_namespace_classes(NS1, {u'test-me':test1}) # NS1, test1
register_namespace_classes(NS2, {u'test-me':test2}) # NS2, test2


from StringIO import StringIO
f = StringIO("<bla xmlns='%s'><test-me/><a:test-me xmlns:a='%s'/><test-me/></bla>" % (NS1,NS2))

doc1 = parse(f)
f.seek(0) # rewind, otherwise the second parse reads from an exhausted buffer
doc2 = parse(f)

doc2.register_namespace_classes(NS2, {u'test-me':test1}) # note: NS2, test1 -> 1 1 1 instead of 1 1 2

for doc in (doc1, doc2):
    for ns in (NS1, NS2):
        el = doc.xpath('//t:test-me', {'t':ns})

        for child in el:
            print child.tryme(),
    print

unregister_namespace_classes(NS2) # removes namespace only from module (i.e. not from doc2!)

for doc in (doc1, doc2):
    for ns in (NS1, NS2):
        el = doc.xpath('//t:test-me', {'t':ns})

        for child in el:
            try:
                print child.tryme(),
            except AttributeError:
                print
                print "Element '%s' has no 'tryme' attribute." % child.tag
    print
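
For readers without a patched build, the two-level lookup that this script exercises can be simulated in plain Python. This is a sketch under assumptions, not the patch itself: `Element`, `Test1`, `Test2`, `MODULE_CLASSES`, `find_element_class` and `results` are stand-ins written for this example, and the sample document is reduced to its three `test-me` elements (two in NS1, one in NS2).

```python
class Element:
    """Stand-in for the default proxy, which has no tryme() method."""


class Test1(Element):
    def tryme(self):
        return 1


class Test2(Element):
    def tryme(self):
        return 2


MODULE_CLASSES = {}  # module-level registry


def find_element_class(doc_classes, namespace, name):
    """Consult the per-document registry first, then the module-level one."""
    for registry in (doc_classes, MODULE_CLASSES):
        if namespace not in registry:
            continue  # this registry does not know the namespace
        class_dict = registry[namespace]
        if name in class_dict:
            return class_dict[name]
        if None in class_dict:
            return class_dict[None]
        break  # namespace is known here: do not mix in the other registry
    return Element


NS1, NS2 = "testNS", "huhu"
MODULE_CLASSES[NS1] = {"test-me": Test1}
MODULE_CLASSES[NS2] = {"test-me": Test2}

doc1_classes = {}                         # no per-document overrides
doc2_classes = {NS2: {"test-me": Test1}}  # doc2 overrides NS2


def results(doc_classes):
    """Call tryme() on each element class; None means tryme() is missing."""
    out = []
    for ns in (NS1, NS1, NS2):
        cls = find_element_class(doc_classes, ns, "test-me")
        out.append(cls().tryme() if hasattr(cls, "tryme") else None)
    return out


print(results(doc1_classes))  # [1, 1, 2]
print(results(doc2_classes))  # [1, 1, 1]

del MODULE_CLASSES[NS2]       # unregister NS2 at module level only

print(results(doc1_classes))  # [1, 1, None]
print(results(doc2_classes))  # [1, 1, 1]
```

The `break` means that a document which registers a namespace takes over that namespace completely: names missing from its dictionary fall back to the default proxy rather than to the module-level classes.
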
_______________________________________________
lxml-dev mailing list
lxml-dev@xxxxxxxxxxxxx
http://codespeak.net/mailman/listinfo/lxml-dev