

Subject: Request for comments: Removing lxml.etree's default support for namespace class support
Hi all,

I know that breaking compatibility is a serious matter, so I'm putting this
up for open discussion. The change would only affect code that uses the
namespace class lookup to supply custom element classes to lxml.etree; all
other code would continue to work unchanged.

Currently, lxml.etree does namespace lookup for custom element classes by
default. This has been the case in the 0.9 and 1.0 series.

Starting with lxml 1.1, etree will support not only custom classes but also
custom lookup schemes for these classes. The new API includes a generic
fallback mechanism that delegates from one lookup scheme to another if the
first one fails. This makes the default support for namespace class lookup
redundant, as the same functionality is provided by a public class that
implements the namespace lookup scheme. Also, the current default does not
support any fallback other than the default element class, so code that wants
to use the namespace lookup with a different fallback has to set up both
schemes explicitly anyway.
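
To make the fallback chain concrete, here is a minimal sketch. It assumes the API of current lxml releases, which use snake_case names such as `set_element_class_lookup()` and `get_namespace()` (the 1.x series used camelCase names like `setElementClassLookup()`). The namespace URI and the `HonkElement`/`FallbackElement` classes are hypothetical. The sketch builds a namespace lookup whose fallback is a default-class lookup, so elements the namespace registry does not match get a custom class instead of the plain one, and it attaches the chain to a single parser rather than changing the global setup.

```python
from lxml import etree

# Hypothetical namespace URI and element classes, for illustration only.
NS = "http://example.com/honk"


class HonkElement(etree.ElementBase):
    """Class for {http://example.com/honk}honk elements."""

    @property
    def honking(self):
        return self.get("honking") == "true"


class FallbackElement(etree.ElementBase):
    """Class for every element the namespace lookup does not match."""


# Fallback scheme: a default lookup that hands out FallbackElement
# instead of the normal element class.
fallback = etree.ElementDefaultClassLookup(element=FallbackElement)

# Namespace scheme that delegates to the fallback when no class matches.
lookup = etree.ElementNamespaceClassLookup(fallback)
lookup.get_namespace(NS)["honk"] = HonkElement

# Attach the chain to one parser only (current lxml spelling).
parser = etree.XMLParser()
parser.set_element_class_lookup(lookup)

root = etree.fromstring(
    '<root xmlns:h="%s"><h:honk honking="true"/><other/></root>' % NS,
    parser,
)
honk, other = root

print(type(honk).__name__, honk.honking)  # HonkElement True
print(type(other).__name__)               # FallbackElement
print(type(root).__name__)                # FallbackElement
```

Only the `honk` element matches the namespace registry; both un-namespaced elements fall through to the fallback scheme.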

To remove this redundancy, to speed up the default setup if namespace classes
are /not/ used and (above all) to make the lookup API more accessible, I would
like to remove the default for namespace lookup and replace it with the simplest
possible mechanism that always returns the normal element classes. If
namespace lookup support is needed, something like the following code would be
required at setup time:

    from lxml import etree
    try:
        lookup = etree.ElementNamespaceClassLookup()
    except AttributeError:
        # lxml >= 0.9 and < 1.1 supports this by default
        pass
    else:
        # lxml >= 1.1 requires an explicit setup
        etree.setElementClassLookup(lookup)

This code block is backwards compatible with lxml 0.9 and lxml 1.0, so new
code that requires namespace class lookup could continue to support lxml from
version 0.9 on, while older code that uses namespace classes would have to be
updated with the above code block to support lxml 1.1 and later. Making this
switch *now* keeps the above code short; a later change would require version
checking and the like.

One of the main reasons for this change is that I would like to make the
lookup mechanism explicit and visible. It is a global property that affects
the entire library. Users who do not need to install their own custom classes
should not be bothered with it, i.e. they should be able to ignore the lookup
API, the Namespace class registry, etc. For those who need a different
mechanism, I believe the current default does not make it clear enough that
(for example) the "Namespace" class registry is disabled once you select a
different class lookup mechanism.

So the new custom class support would work like this:

  * if no custom classes are used, no configuration is needed
  * any support for custom classes requires setting up a lookup scheme
  * changing the default class is done by creating and setting a default
    lookup scheme based on the new default classes
  * using the namespace lookup requires setting the ns lookup scheme, which
    then enables lookups based on the global Namespace registry
  * setting a per-parser lookup scheme enables delegation to the specific
    lookup registered with a parser, which in turn can deploy any of the
    available schemes and defaults to using the normal classes
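
The first, second and last points of the list above can be sketched as follows. This again assumes the API of current lxml releases, where the global default already delegates to per-parser lookups, so `XMLParser.set_element_class_lookup()` works without any global setup step. `MyElement` and its `shout()` method are hypothetical. An unconfigured parser returns the normal element class, while a parser configured with an `ElementDefaultClassLookup` returns the new default class.

```python
from lxml import etree


class MyElement(etree.ElementBase):
    """Hypothetical replacement for the normal element class."""

    def shout(self):
        return (self.text or "").upper()


xml = "<greeting>hello</greeting>"

# No custom classes: no configuration needed, the normal class is used.
plain = etree.fromstring(xml)
print(type(plain) is etree._Element)  # True

# Changing the default class: build a default lookup around the new
# class and register it with one parser only.
parser = etree.XMLParser()
parser.set_element_class_lookup(etree.ElementDefaultClassLookup(element=MyElement))
custom = etree.fromstring(xml, parser)
print(type(custom).__name__, custom.shout())  # MyElement HELLO

# Parsing without that parser is unaffected by the per-parser setting.
print(type(etree.fromstring(xml)) is etree._Element)  # True
```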

I'm also considering replicating the Namespace registry locally in the
ElementNamespaceClassLookup class. This would allow, for example, a per-parser
namespace registry. I think removing the default would also help keep this
design clean.
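
A per-parser namespace registry of this kind can be sketched with the lookup API of current lxml releases, where each `ElementNamespaceClassLookup` instance carries its own registry, reachable through `get_namespace()`. The namespace URI, the `ParaV1`/`ParaV2` classes and the `make_parser()` helper are hypothetical. Two parsers map the same element name to different classes without touching any global state.

```python
from lxml import etree

NS = "http://example.com/doc"  # hypothetical namespace URI


class ParaV1(etree.ElementBase):
    version = 1  # class attribute, not per-element state


class ParaV2(etree.ElementBase):
    version = 2


def make_parser(para_class):
    """Hypothetical helper: a parser with its own namespace registry."""
    lookup = etree.ElementNamespaceClassLookup()
    # This registry belongs to this lookup instance only.
    lookup.get_namespace(NS)["para"] = para_class
    parser = etree.XMLParser()
    parser.set_element_class_lookup(lookup)
    return parser


xml = '<para xmlns="%s">text</para>' % NS
p1 = etree.fromstring(xml, make_parser(ParaV1))
p2 = etree.fromstring(xml, make_parser(ParaV2))
print(p1.version, p2.version)  # 1 2
```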

I'm really interested in hearing opinions on this. I think the above
compatibility code makes the switch trivial to do, but I would like to hear if
there are other impacts of this change that I might not have thought of.

Stefan

