|
|
Re: POLL: Fwd: Re: XPath filter 2.0: msg#00068
ietf.xmldsig
|
Subject: |
Re: POLL: Fwd: Re: XPath filter 2.0 |
Hi,
1: I would like to use the way Merlin described in his last mail (the XPath
expressions select nodes in the input DOCUMENT, but we can only "unionize"
nodes which have already been in the input NODESET). If this transform type
comes as a first transform, not many things will change. I think it's a
good way that a transform cannot re-include nodes which have been omitted
by previous ones. So I vote: yes (option 2)
2: I think it's an absolute must-have that multiple
union/subtract/intersect 'steps' can be combined in a single ds:Transform.
If we force that only one single step can be placed into a single
ds:Transform, it would make very ugly tweaks necessary in my software if I
want to optimize multiple steps in a single transformation. This grouping
should be possible regardless of how the working group descides on the
above issue. So I vote: a strong yes
3: I did not really understood the algorithm Merlin described, but if he
sees a way how implementors can make some "tree-labeling" and only a single
traversal over the tree, this seems very cool.
Regards,
Christian
--On Montag, 10. Juni 2002 17:16 -0400 Joseph Reagle <reagle@xxxxxx> wrote:
I'd like to move forward to a Last Call for this specification ASAP. Last
week Merlin outlined some options [1], one of which being a new proposal
that he gave some more details about [2] in response to Christian.
So to conclude the thread I'd like to know what people prefer:
1. Move forward with what we have presently [3].
2. Specify Merlin's proposal (The timing of this will be
dependent on properly representing the proposal (e.g., Merlin's time
<smile/>) and then reviewing and iterating on it a few times to make sure
we have it clear.)
Please respond before Wednesday. I'm favoring option #1 unless there's a
couple of folks advocating/willing-to-write #2 without a strong dissent.
[1] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0288
[2] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2002AprJun/0291
[3] http://www.w3.org/TR/2002/WD-xmldsig-filter2-20020425/
---------- Forwarded Message ----------
Subject: Re: XPath filter 2.0
Date: Thu, 06 Jun 2002 12:22:47 +0100
From: merlin <merlin@xxxxxxxxxxxx>
To: Christian Geuer-Pollmann <geuer-pollmann@xxxxxxxxxxxxxxxxxxxxxxxx>
Cc: w3c-ietf-xmldsig@xxxxxx
r/geuer-pollmann@xxxxxxxxxxxxxxxxxxxxxxxx/2002.06.06/10:29:55
* Implementations SHOULD note that an efficient realization
of this transform will not compute a node set at each
point in this transform, but instead make a list of
the operations and the XPath-selected nodes, and then
iterate through the input document once, in document
order, constructing a filtering node set N based upon
the operations and selected nodes.
.......
. Efficiency is as with the current spec. Basically this
fixes union.
Does this mean that based on the operations, you make some tree-labeling
;-)) and then you make one tree-traversal to output the selected nodes?
Yes, but this is the exact algorithm that implementations
of the _current_ XPath Filter 2.0 transform _should_ use
if a sequence of the current XPath Filter 2.0 transforms
precedes c14n. This formulation of the filter simply makes
it easier to express in terms of SHOULD language.
Sounds cool, in that case, the efficiency would be much better then the
current results of v2.0. What kind of algorithm do you use for
- make a list of operations and selected nodes,
- decide based on this data which nodes are the result.
If you're asking for a normative algorithm for the general
case, then I would have to think for a while. Restricting
myself somewhat:
First, let's characterize a sequence of filters:
Filter ::= (INTERSECT | SUBTRACT | UNION)+
You will observe that adjacent SUBTRACT and INTERSECT
operations can be idempotently reordered, and that
adjacent operations of the same type can be computationally
merged. However, I would expect that type of thing to be
done in the XPath expressions so I will ignore this.
Restricting myself to the following production, which captures
all _common_ (in my opinion) use cases:
SimpleFilter ::= A:UNION* (B:INTERSECT | C:SUBTRACT)* D:UNION*
Iterate over the document.
(define (include-node) ;; whether or not to include a node
(cond ;; returns the first match
((encountered-any D) t)
;; if you've encountered a node in a trailing union
((encountered-any C) nil)
;; not if you've encountered a subtraction node
((null B) t)
;; if there are no intersect operations
((encountered-all B) t)
;; if you've encountered a node in each intersect node set
(t nil))) ;; not otherwise
Note that encountered-any returns nil if its parameter is an
empty list of node sets, but encountered-all is true in this
case. We can therefore express this concisely:
(define (include-node)
(or (encountered-any D)
(and (not (encountered-any C)) (encountered-all B))))
You can implement encountered-foo and therefore include-node
strictly in terms of node labeling, a stack and iteration.
Obviously you must also consider the input node set.
Going from this to a fully general solution is fairly
straightforward. Observe that UNIONs are subject to ALL
subsequent INTERSECT and SUBTRACT operations, but no preceding
ones, and that the entire filter is equivalent to:
UNION/ Filter
I can then say that a node is included if I have encountered
a node in ANY ( UNION operation AND NOT ANY SUBSEQUENT SUBTRACT
operation AND ALL SUBSEQUENT INTERSECT operations ).
Work done to compute this is proportional to the number of
filters but only done at a labeled node.
Merlin
Christian
--On Donnerstag, 6. Juni 2002 01:09 +0100 merlin <merlin@xxxxxxxxxxxx>
wrote:
Hi,
Quick summary of options:
1. Current Spec
. This is intuitive (in my opinion) because it is based on a
linear sequence of set operations.
. Typical (IMHO) use cases require 2 XPath evaluations.
However, increasingly complex filtering requirements incur
increasing cost; an arbitrarily complex expression requires
an arbitrarily large number of simple XPath expressions.
However, the standard XPath filter may be more useful for
these anyway.
. Operation can, in most cases, be commingled with c14n for
efficiency, but:
. The union operator is really ugly and unintuitive.
2. Christian's Spec
. *I* do not believe this is as intuitive; it involves labeling
nodes and then traversing the document, proceeding based
on node labels (e.g., omit-but-traverse).
. Typical use cases require 2 XPath evaluations. Increasingly
complex filtering requirements can be solved in a fixed
number (2/3) of increasingly complex XPath expressions.
. Operation can be commingled with c14n for effiency.
3. Or, we can take a variant of the current spec. I won't
detail it horrendously, but basically:
. The XPath Filter 2.1 takes, as a parameter, a sequence
of operations, each of which is characterized as a
set operation (intersect, subtract, union) and an
XPath expression.
. Operation over an input node set is as follows:
* Construct a node set N consisting of all the
nodes in the input document.
* Iterate through each of the operations.
# Evaluate the XPath expression; the result is X.
# Expand all identified nodes to include their
subtrees; the result is Y.
# Assign N = N op Y
* Use the resulting node set N as a filter to select
which nodes from the input node set will remain in the
output node set, just as the XPath 1.0 filter. This is
tantamount to intersection with the input node set.
* Implementations SHOULD note that an efficient realization
of this transform will not compute a node set at each
point in this transform, but instead make a list of
the operations and the XPath-selected nodes, and then
iterate through the input document once, in document
order, constructing a filtering node set N based upon
the operations and selected nodes.
* Implementations SHOULD note that iterating through the
document and constructing a filtering node set N can
be efficiently commingled with the canonicalization
transform if canonicalization is performed immediately
after this transform.
. With this formulation, intersection and subtraction
are IDENTICAL to the existing spec, with the only
change being that you can put them in one transform
or many.
. Union is, however, much improved (in my opinion). You
can only use it to include nodes that would be
removed by a previous operation in the same transform.
As a result, the output node set will only include
nodes from the input node set.
. Efficiency is as with the current spec. Basically this
fixes union.
I write this a while ago; thought I'd send it rather
than delete it. It's probably wasteful to propose yet
another option.
Merlin
-------------------------------------------------------
--
Joseph Reagle Jr. http://www.w3.org/People/Reagle/
W3C Policy Analyst mailto:reagle@xxxxxx
IETF/W3C XML-Signature Co-Chair http://www.w3.org/Signature/
W3C XML Encryption Chair http://www.w3.org/Encryption/2001/
Only community members can participate in forum threads. You must Register or log in to contribute.
|
|