logo       

further thoughts on adaptive parse algorithms: msg#00026

parsers.spirit.devel

Subject: further thoughts on adaptive parse algorithms

Hi again,

first of all sorry for flooding the list today. This post will be the worst (hopefully not too half-baked) brain-dump (guess my enthusiasm struck me) and I promise to shut up for a while afterwards ;-).

OK, let's go. What else can we do to squeeze the performance tube?

An interesting property of a parse algorithm to look at could be, whether it ever fails. Optional, Kleene and Epsilon never fail! Algorithms can use that knowledge to save some code on a per-parser basis and it seems quite easy to implement, too.

Another interesting idea is to apply "loop unrolling" to Kleene if many matches are expected: Instead of

*a

we could write

*(a >> a >> a >> a >> a >> a >> a >> a)
>> !(a >> a >> a >> a)
>> !(a >> a)
>> ! a

and if "a" is a parser that doesn't have to save the iterator position per se, we'd reduce iterator copying down to asymptotically 1/8th in this case.

We could either use a fixed unrolling factor or even have automatic recalibration at runtime, as described for Alternative in my previous post.

Binary operator*(kick_off_type,parser_type) could be used for syntactic sugar, e.g:

dynamic_unroll(8) *a

Maybe there are some new ideas among it. Comments welcome, of course.


Regards,

Tobias


P.S.: Current version of algo pseudo code (repost), for further meditation on the topic
V--- below ---V ...

------------------------------------------------------------------------


=============================================================================
Category "require iterator restore"

There is no benefit from non-saving component parsers. Automatic wrapping
can be used.
=============================================================================

kleene:
while (parser.parse(i,last)) ;
return true

- requires component parsers to restore iterator position on no match
- attaching a wrapper to enforce this requirement makes sense

=============================================================================
Category "propagate component restore"

A single algorithm propagates the behaviour of component parsers up the descent. =============================================================================

difference
if (rhs_parser.parse(copy(i),last))
return false
return (lhs_parser.parse(i,last))

- does not require component parsers to restore the iterator position on
no match
- restores iterator on no match if lhs_parser does

----- ---- --- -- -- - - -

intersection:
start = i
end = last
foreach (parser)
if (! parser.parse(i,last))
return false
end = min(end,i)
if (is_last(parser)) break i = start i = end
return true


- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match, if component parsers do
- iterator arithmetic has to be coded in terms of 'difference' (this abstract code won't work because of possible singularity of 'last' in particular)
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no overhead)

sidenote: the text of the last note is crap, since it's a compile time loop, anyway.


----- ---- --- -- -- - - -

xor:
i_ = i
if (!a.parse(i_,last))
return b.parse(i,last)
else
result = !b.parse(i,last)
i = i_
return result
- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match if b does (commuatativity could
be exploited)

=============================================================================
Category "specialized versions"

A specialized algorithm, chosen based on the context, is needed for
maximum efficiency.
=============================================================================

sequence:
i_ = i foreach (parser)
if (! parser.parse(i,last))
i = i_ return false
return true

- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match

--- -- -- - - -

sequence:
foreach (parser)
if (! parser.parse(i,last))
return false
return true

- does not require component parsers to restore the iterator position on
no match
- does not restore iterator position on no match
----- ---- --- -- -- - - -

alternative:
foreach (parser)
if (parser.parse(i,last)) return true
return false

- requires component parsers to restore iterator position on no match
- restores iterator position on no match

--- -- -- - - -

alternative:
i_ = i foreach (parser)
if parser.parse(i,last) return true
if (is_last(parser)) break i = i_ return false

- does not require coponent parsers to restore iterator position on
no match
- does not restore iterator position on no match - the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no overhead)

--- -- -- - - -

alternative:
i_ = i foreach (parser)
if parser.parse(i,last) return true
i = i_ return false

- does not require coponent parsers to restore iterator position on
no match
- restores iterator position on no match

----- ---- --- -- -- - - -

sequential_or:
start = i
end = i
foreach (parser)
if (parser.parse(i,last))
end = max(end,i)
i = start
i = end
return end != start

- requires component parsers to restore iterator position on no match
- restores iterator position on no match
- iterator arithmetic has to coded in terms of 'difference'

--- -- -- - - -

sequential_or:
start = i
end = i
foreach (parser)
if (parser.parse(i,last))
end = max(end,i)
if (is_last(parser)) break
i = start
i = end
return end != start

- does not require component parsers to restore the ierator position on
no match
- restores iterator position on no match - iterator arithmetic has to be coded in terms of 'difference'
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no overhead)





-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642


<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise