logo       

tidy pseudo code for parse algorithms: msg#00007

parsers.spirit.devel

Subject: tidy pseudo code for parse algorithms

Joel de Guzman wrote:

Tobias, this is rather confusing. We are trying to coalesce 2 protocols
in one. It's difficult to analyze the pseudo code with the noise. Can we
instead have 2 sets instead of 1 combined set?

Well, as stated in my OP I was not trying to optimize things yet, but rather to
follow your discussion with Frank. So my confusing notation was probably
reflecting my own confusion ;-).

That said, you're right - it was about time to tidy up! Here are (per-function)
separated versions with different properties. Using two sets didn't really make
much sense (I'm sure you'll see why I used three categories when reading on).

Hope I didn't put in too many new mistakes...


Regards,

Tobias

=============================================================================
Category "require iterator restore"

There is no benefit from non-saving component parsers. Automatic wrapping
can be used.
=============================================================================

kleene:
while (parser.parse(i,last)) ;
return true

- requires component parsers to restore iterator position on no match
- attaching a wrapper to enforce this requirement makes sense

=============================================================================
Category "propagate component restore"

A single algorithm propagates the behaviour of component parsers up the
descent.
=============================================================================

difference
if (rhs_parser.parse(copy(i),last))
return false
return (lhs_parser.parse(i,last))

- does not require component parsers to restore the iterator position on
no match
- restores iterator on no match if lhs_parser does

----- ---- --- -- -- - - -

intersection:
start = i
end = last
foreach (parser)
if (! parser.parse(i,last))
return false
end = min(end,i)
if (is_last(parser)) break
i = start
i = end
return true


- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match, if component parsers do
- iterator arithmetic has to be coded in terms of 'difference' (this
abstract code won't work because of possible singularity of 'last'
in particular)
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no
overhead)

----- ---- --- -- -- - - -

xor:
i_ = i
if (!a.parse(i_,last))
return b.parse(i,last)
else
result = !b.parse(i,last)
i = i_
return result

- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match if b does (commuatativity could
be exploited)

=============================================================================
Category "specialized versions"

A specialized algorithm, chosen based on the context, is needed for
maximum efficiency.
=============================================================================

sequence:
i_ = i
foreach (parser)
if (! parser.parse(i,last))
i = i_
return false
return true

- does not require component parsers to restore the iterator position on
no match
- restores iterator position on no match

--- -- -- - - -

sequence:
foreach (parser)
if (! parser.parse(i,last))
return false
return true

- does not require component parsers to restore the iterator position on
no match
- does not restore iterator position on no match

----- ---- --- -- -- - - -

alternative:
foreach (parser)
if (parser.parse(i,last))
return true
return false

- requires component parsers to restore iterator position on no match
- restores iterator position on no match

--- -- -- - - -

alternative:
i_ = i
foreach (parser)
if parser.parse(i,last)
return true
if (is_last(parser)) break
i = i_
return false

- does not require coponent parsers to restore iterator position on
no match
- does not restore iterator position on no match
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no
overhead)

--- -- -- - - -

alternative:
i_ = i
foreach (parser)
if parser.parse(i,last)
return true
i = i_
return false

- does not require coponent parsers to restore iterator position on
no match
- restores iterator position on no match

----- ---- --- -- -- - - -

sequential_or:
start = i
end = i
foreach (parser)
if (parser.parse(i,last))
end = max(end,i)
i = start
i = end
return end != start

- requires component parsers to restore iterator position on no match
- restores iterator position on no match
- iterator arithmetic has to coded in terms of 'difference'

--- -- -- - - -

sequential_or:
start = i
end = i
foreach (parser)
if (parser.parse(i,last))
end = max(end,i)
if (is_last(parser)) break
i = start
i = end
return end != start

- does not require component parsers to restore the ierator position on
no match
- restores iterator position on no match
- iterator arithmetic has to be coded in terms of 'difference'
- the foreach loop is not required to terminate, as the break statement
takes care of that (thus the if..break just means structure and no
overhead)

<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise