Diagnosis: gnats.: msg#00032

Subject: Diagnosis: gnats.
Greets,

I'd gotten the Minty benchmark down into the 16 second range, knocking 2 seconds off by moving the deserialization algo to XS.

However my app had basically inlined everything in TermInfosWriter into the one write_postings() method. It was already a little messy, and that was before I started trying to figure out how to re-enable skipdata.

I concluded there was no choice but to reproduce TermInfosWriter. I did that, cutting as many corners as possible: I inlined writeTerm, and merged Term and TermInfo into a single hash (not even an object). Boom, we're back to over 20 seconds.

Java Lucene finishes indexing in 9 seconds. (This is with mergeFactor set to 1000, which is only fair because KinoSearch has a high -mem_threshold setting for Sort::External, so the "external" sort actually gets run "internally", that is, in RAM. If I set mergeFactor to 10, Java Lucene takes 13 seconds to finish indexing.)

If I make Term and TermInfo separate objects again, use accessor methods rather than direct hash access to the variables, and abstract out writeTerm again, it's no mystery what's going to happen.
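
For anyone who wants to measure that kind of overhead in isolation, here is a minimal sketch using the core Benchmark module. It is not code from KinoSearch or Plucene: the SketchTermInfo package, its accessors, and the field names doc_freq and frq_ptr are hypothetical stand-ins, and the loop only approximates the access pattern. Timings will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for a TermInfo-like object with accessor methods.
# Package and field names are illustrative only.
package SketchTermInfo;
sub new          { my ( $class, %args ) = @_; return bless {%args}, $class }
sub get_doc_freq { $_[0]->{doc_freq} }
sub get_frq_ptr  { $_[0]->{frq_ptr} }

package main;

# The same two fields, once behind accessors and once as a plain hash.
my $tinfo_obj  = SketchTermInfo->new( doc_freq => 3, frq_ptr => 1024 );
my %tinfo_hash = ( doc_freq => 3, frq_ptr => 1024 );

# Same arithmetic both ways; only the access style differs.
sub sum_via_accessors {
    my $sum = 0;
    $sum += $tinfo_obj->get_doc_freq() + $tinfo_obj->get_frq_ptr() for 1 .. 1000;
    return $sum;
}

sub sum_via_hash {
    my $sum = 0;
    $sum += $tinfo_hash{doc_freq} + $tinfo_hash{frq_ptr} for 1 .. 1000;
    return $sum;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese( -1, {
    accessors   => \&sum_via_accessors,
    direct_hash => \&sum_via_hash,
} );
```

Both subs return the same sum, so any difference in the table comes from method dispatch versus direct hash lookup.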

Multiply those 4 seconds by every other place in Plucene where you have objects instead of procedural programming, accessors instead of direct access, etc., and you're well on your way to 270 seconds.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

