Hi again,
first of all sorry for flooding the list today. This post will be the
worst (hopefully not too half-baked) brain-dump (guess my enthusiasm
struck me) and I promise to shut up for a while afterwards ;-).
OK, let's go. What else can we do to squeeze the performance tube?
An interesting property of a parse algorithm to look at could be,
whether it ever fails. Optional, Kleene and Epsilon never fail!
Algorithms can use that knowledge to save some code on a per-parser
basis and it seems quite easy to implement, too.
Another interesting idea is to apply "loop unrolling" to Kleene if many
matches are expected: Instead of
*a
we could write
*(a >> a >> a >> a >> a >> a >> a >> a)
>> !(a >> a >> a >> a)
>> !(a >> a)
>> ! a
and if "a" is a parser that doesn't have to save the iterator position
per se, we'd reduce iterator copying down to asymptotically 1/8th in
this case.
We could either use a fixed unrolling factor or even have automatic
recalibration at runtime, as described for Alternative in my previous post.
Binary operator*(kick_off_type,parser_type) could be used for syntactic
sugar, e.g:
dynamic_unroll(8) *a
Maybe there are some new ideas among it. Comments welcome, of course.
Regards,
Tobias
P.S.: Current version of algo pseudo code (repost), for further
meditation on the topic
V--- below ---V ...
------------------------------------------------------------------------
=============================================================================
Category "require iterator restore"
There is no benefit from non-saving component parsers. Automatic wrapping
can be used.
=============================================================================
kleene:
while (parser.parse(i,last)) ;
return true
- requires component parsers to restore iterator position on no match
- attaching a wrapper to enforce this requirement makes sense
=============================================================================
Category "propagate component restore"
A single algorithm propagates the behaviour of component parsers up the
descent.
=============================================================================
difference
if (rhs_parser.parse(copy(i),last))
return false
return (lhs_parser.parse(i,last))
- does not require component parsers to restore the iterator position on
no match
- restores iterator on no match if lhs_parser does
----- ---- --- -- -- - - -
intersection:
start = i
end = last
foreach (parser)
if (! parser.parse(i,last))
return false
end = min(end,i)
if (is_last(parser)) break
i = start
i = end
return true
- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match, if component parsers do
- iterator arithmetic has to be coded in terms of 'difference' (this
abstract code won't work because of possible singularity of 'last'
in particular)
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no
overhead)
sidenote: the text of the last note is crap, since it's a compile time
loop, anyway.
----- ---- --- -- -- - - -
xor:
i_ = i
if (!a.parse(i_,last))
return b.parse(i,last)
else
result = !b.parse(i,last)
i = i_
return result
- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match if b does (commuatativity could
be exploited)
=============================================================================
Category "specialized versions"
A specialized algorithm, chosen based on the context, is needed for
maximum efficiency.
=============================================================================
sequence:
i_ = i
foreach (parser)
if (! parser.parse(i,last))
i = i_
return false
return true
- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match
--- -- -- - - -
sequence:
foreach (parser)
if (! parser.parse(i,last))
return false
return true
- does not require component parsers to restore the iterator position on
no match
- does not restore iterator position on no match
----- ---- --- -- -- - - -
alternative:
foreach (parser)
if (parser.parse(i,last))
return true
return false
- requires component parsers to restore iterator position on no match
- restores iterator position on no match
--- -- -- - - -
alternative:
i_ = i
foreach (parser)
if parser.parse(i,last)
return true
if (is_last(parser)) break
i = i_
return false
- does not require coponent parsers to restore iterator position on
no match
- does not restore iterator position on no match
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no
overhead)
--- -- -- - - -
alternative:
i_ = i
foreach (parser)
if parser.parse(i,last)
return true
i = i_
return false
- does not require coponent parsers to restore iterator position on
no match
- restores iterator position on no match
----- ---- --- -- -- - - -
sequential_or:
start = i
end = i
foreach (parser)
if (parser.parse(i,last))
end = max(end,i)
i = start
i = end
return end != start
- requires component parsers to restore iterator position on no match
- restores iterator position on no match
- iterator arithmetic has to coded in terms of 'difference'
--- -- -- - - -
sequential_or:
start = i
end = i
foreach (parser)
if (parser.parse(i,last))
end = max(end,i)
if (is_last(parser)) break
i = start
i = end
return end != start
- does not require component parsers to restore the ierator position on
no match
- restores iterator position on no match
- iterator arithmetic has to be coded in terms of 'difference'
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no
overhead)
-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642