logo       

Re: Fusion-ifying proto parse trees: msg#00031

parsers.spirit.devel

Subject: Re: Fusion-ifying proto parse trees


Eric Niebler wrote:

Joel de Guzman wrote:

Eric Niebler wrote:

The segmented algorithm looks complicated, but fortunately, I think all the algorithms will follow a similar pattern, so we should be able to crank them out without too much fuss. It is perfectly general and will work with any segmented data structure, not just binary trees, assuming you give it suitably defined iterators.

I have no performance numbers, but I have a good feeling about it. :-)

Thanks! I'm perusing the code now. This would be a good addition
to Fusion. I wonder how algorithms that return views would look
like, or if they are as efficient as suggested.

A view over a segmented sequence is itself a segmented sequence and would need segmented iterators. It would probably be best at this point to come up with some perf numbers to justify further work in this area.

I've done a little perf testing of the various approaches we have for traversing parse trees. I've attached the test.

Three traversals are tested. First, traverse a tree that's been pre-flattened into a fusion list. That's what Spirit-2 currently does, and is the baseline. Second, using non-segmented iterators to traverse a proto parse tree. Third, using segmented iterators.

The test is run twice, first with a small number of elements (4), and next with a larger number (24). Here are the results:

The test was run with VC7.1 in Release mode with aggressive inlining turned on.

Short test ...
Fusion list time : 5.14835e-009
Non-segmented proto time : 5.14835e-009
Segmented proto time : 6.78748e-009

Long test ...
Fusion list time : 1.24216e-007
Non-segmented proto time : 7.16209e-007
Segmented proto time : 4.4167e-008

For shorter trees, the segmented implementation is a clear loser at 30% slower, and surprisingly, the non-segmented implementation does just as well as using a pre-flattened tree.

For larger parse trees, the results are jaw-dropping. Segmented proto is nearly 3 TIMES FASTER than even the pre-flattened tree, and 16 times faster than non-segmented proto.

I don't know what's going on, but if I had to guess, I'd say inlining depth is key to making sense of these results. By bifurcating the recursion, the total recursion depth decreases, which probably brings it under some compiler threshold. But I'm just speculating.

It also suggests that a way to improve performance further is to eliminate unnecessary calls, even if you think they would normally be inlined.

Also, I'm guessing that compilers with worse optimizers will make a hash of segmented proto.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Attachment: segmented_fusion.tar.gz
Description: GNU Zip compressed data

<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise