logo       

Re: Fusion-ifying proto parse trees: msg#00037

parsers.spirit.devel

Subject: Re: Fusion-ifying proto parse trees

Eric Niebler wrote:
Joel de Guzman wrote:
Joel de Guzman wrote:
Joel de Guzman wrote:
Eric Niebler wrote:

Eric Niebler wrote:

Joel de Guzman wrote:

Eric Niebler wrote:

The segmented algorithm looks complicated, but fortunately, I think all the algorithms will follow a similar pattern, so we should be able to crank them out without too much fuss. It is perfectly general and will work with any segmented data structure, not just binary trees, assuming you give it suitably defined iterators.

I have no performance numbers, but I have a good feeling about it. :-)

Thanks! I'm perusing the code now. This would be a good addition
to Fusion. I wonder how algorithms that return views would look
like, or if they are as efficient as suggested.

A view over a segmented sequence is itself a segmented sequence and would need segmented iterators. It would probably be best at this point to come up with some perf numbers to justify further work in this area.

I've done a little perf testing of the various approaches we have for traversing parse trees. I've attached the test.

Very interesting! Alas, I can't compile it :(

Ok, I got it to compile. Be wary of using namespace in the
global scope with VC7.1. I get lots of problems doing that
and I never do it again. In fact, this is most probably the
reason for the VC8 ICE, because now, I can compile on VC8.
See attached main.cpp.

I got mixed results testing with VC7.1, VC8.0 and g++3.4
on my laptop (1.5GHZ centrino) using fusion list and vector
(I fixed the fusion vector bug. It was a minor typo. please
do a cvs update). Fusion gives better results, except for
VC7.1/list/long-test, which I don't quite understand.
See attached results.txt

It's plausible that VC7.1 has an optimization bug, or it
fails to optimize the case for Fusion. I find it hard to
explain the 3x speedup. On VC8.0, that gain is lost.

The only way to verify this is to rewrite the test with
deterministic results. As it is, the numbers outputed
by the accumulator functor do not make sense. The results
of the traversals should be deterministic so that we can
determine that the code is doing the right thing. We
need reproducable results for all the tests.

Hi again,

I'm befuddled. To get a deterministic result from the accumulator,
I simply placed an un-timed for_each_s call at the top and printed
the result:

template<typename T>
double time_for_each_s(T const &t)
{
int i = 0;
boost::fusion::for_each_s(t, accumulator(i));
std::cout << i << std::endl;
...


I think the problem with doing it this way is that the compiler may see that the assignments to i which happen further down in the times calls are never used. They might be getting optimized away.

I've attached a version that tries harder to force the compiler to actually generate code. The results are interesting. I still see the bizarre 3x speed-up on VC7.1 for large ET trees, but now segmented proto is beating the pre-flattened lists for small ET trees, too!

// VC7.1, fusion list
Short test ...
80
80
80
Fusion list time : 2.84985e-009
Non-segmented proto time : 2.79397e-009
Segmented proto time : 2.79397e-009
Long test ...
480
480
480
Fusion list time : 4.1008e-008
Non-segmented proto time : 4.17233e-007
Segmented proto time : 1.86265e-008

It's not beating the pre-flattened lists in my tests:
/////////////////////////////////// VC7.1
Short test ...
80
80
80
Fusion list time : 4.07547e-009
Non-segmented proto time : 4.65661e-009
Segmented proto time : 4.77582e-009
Long test ...
480
480
480
Fusion list time : 9.67979e-008
Non-segmented proto time : 6.40869e-007
Segmented proto time : 3.17097e-008

Proto still lags badly with gcc:

// gcc, fusion list
Short test ...
80
80
80
Fusion list time : 2.7895e-08
Non-segmented proto time : 1.13606e-07
Segmented proto time : 1.19209e-07
Long test ...
480
480
480
Fusion list time : 1.74999e-07
Non-segmented proto time : 1.10245e-06
Segmented proto time : 9.53674e-07

And ditto for VC8.0, but not much:
/////////////////////////////////// VC8.0
Short test ...
80
80
80
Fusion list time : 4.5374e-009
Non-segmented proto time : 4.54485e-009
Segmented proto time : 5.93811e-009
Long test ...
480
480
480
Fusion list time : 2.18749e-008
Non-segmented proto time : 6.10352e-007
Segmented proto time : 3.1203e-008

The reason I used a list in the test is because that's what Spirit-2 was using. Would you really consider switching to vector, which would place limits on the number of nodes in a sequence?

No, I wouldn't use a vector. Spirit pre-2 did use vectors but
I had problems with large trees. One plausible strategy though
is to use vectors on small trees and switch to lists on larger
ones. Ah the joys of multi-container types!

Also, these tests leave out one detail ... the cost of building the expression template. Is building an unflattened proto parse tree more or less expensive than flattening into a list/vector eagerly? We should really be taking that into account. The cost of building the ET would probably dominate.

I think it would be the same, but I am guessing. I am basing that
guess from the code size generated from proto and the pre-proto
code, which more or less amounts to the same footprint.

Regarding your question about taking the minimum result, I lifted this code from John Maddock's regex performance shoot-out. This is how he collects his timing. Yes, this is a black art -- I don't really understand it.

I recall Dave A mention some benchmarking strategies that take off
the unwanted effects. I'm still looking for the links. You can see
in the VC8.0 tests that fusion list is winning. My guess is that
this is a problem with VC7.1's optimizer that gives fusion lists
a great disadvantage. My guess is that if that effect is known
and isolated, fusion lists can be optimized for VC7.1 to mitigate
this effect. I believe that what we are seeing is an effect; not
really due to the raw-algorithmic advantage of segmentation. IOTW,
I think you were just lucky to have found a good formulation for 7.1
and for 7.1 alone.

Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net



-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642


<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

News | FAQ | advertise