1.25 RC: msg#00058

Subject: 1.25 RC
I've finished off my work on making the fdx and fdt files use
pack/unpack. 

These are more complex than the other index files as they can get *big*.

The fdt files store all the fields that are indexed in documents. 
The fdx files are an index into the fdt files that tell us where each
document can be found.

So rather than slurp & unpack the whole file, we still read a given
number of bytes at a time. But when we're actually extracting each
document, we now work out how long it's going to be, slurp that much of
the file in one go, and then unpack that.
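
As a rough illustration of the approach (not Plucene's actual code or
on-disk format), the sketch below uses Python's struct module as a
stand-in for Perl's pack/unpack. The layout is an assumption made up for
the example: the index holds one 8-byte big-endian offset per document,
and each record is a field count followed by length-prefixed UTF-8
strings. The names write_store and read_doc are hypothetical.

```python
import io
import struct

# Assumed layout, invented for this sketch (not Plucene's real format):
OFFSET = struct.Struct(">Q")   # one fixed-width offset per document
U32 = struct.Struct(">I")      # field count and per-field byte lengths


def write_store(docs):
    """Build in-memory (index, data) buffers from a list of documents.

    Each document is a list of strings.
    """
    index, data = io.BytesIO(), io.BytesIO()
    for fields in docs:
        # The index records where this document starts in the data buffer.
        index.write(OFFSET.pack(data.tell()))
        data.write(U32.pack(len(fields)))
        for field in fields:
            raw = field.encode("utf-8")
            data.write(U32.pack(len(raw)) + raw)
    return index, data


def read_doc(index, data, n):
    """Read document n with a single read of exactly its byte length."""
    # Fetch this document's offset and, if present, the next one's.
    index.seek(n * OFFSET.size)
    entry = index.read(2 * OFFSET.size)
    (start,) = OFFSET.unpack_from(entry, 0)
    if len(entry) == 2 * OFFSET.size:
        (end,) = OFFSET.unpack_from(entry, OFFSET.size)
    else:
        # Last document: it runs to the end of the data buffer.
        end = data.seek(0, io.SEEK_END)

    # One read for the whole record, then unpack from memory.
    data.seek(start)
    buf = data.read(end - start)
    (count,) = U32.unpack_from(buf, 0)
    pos = U32.size
    fields = []
    for _ in range(count):
        (length,) = U32.unpack_from(buf, pos)
        pos += U32.size
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields
```

The point of the design is that the record length comes from two
adjacent index entries, so each document costs one read on the data
file however many fields it has.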

This is almost certainly going to break backwards compat with any
existing Plucene indexes if the documents in them are of any length at
all. The transform-fdt file in the bin directory should be able to
convert files between the old format and the new format.

It would be appreciated if a few people could give it a try and make
sure it works for them.  If I don't hear about any problems with this,
I'll release it as 1.25 tomorrow.

        http://www.tmtm.com/CPAN/Plucene-1.25.tar.gz


There are probably some more optimisations in this area, but this is
good enough for now:

hajime/1.21:
       s/iter   long medium  short
long     31.0     --   -87%   -96%
medium   4.04   668%     --   -67%
short    1.33  2232%   204%     --

hajime/1.22:
       s/iter   long medium  short
long     27.9     --   -87%   -95%
medium   3.73   647%     --   -65%
short    1.29  2061%   189%     --

hajime/1.23:
       s/iter   long medium  short
long     20.7     --   -85%   -94%
medium   3.03   583%     --   -59%
short    1.25  1557%   142%     --

hajime/1.24:
       s/iter   long medium  short
long     17.4     --   -85%   -93%
medium   2.66   552%     --   -55%
short    1.19  1358%   124%     --

hajime/1.25:
       s/iter   long medium  short
long     16.7     --   -85%   -93%
medium   2.57   548%     --   -54%
short    1.18  1311%   118%     --
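
To put the 1.21 and 1.25 tables side by side, here is a small Python
sketch that works out the drop in time per iteration. The s/iter values
are copied from the tables; the helper name reduction is just for
illustration.

```python
# Seconds per iteration, copied from the 1.21 and 1.25 tables.
timings = {
    "1.21": {"long": 31.0, "medium": 4.04, "short": 1.33},
    "1.25": {"long": 16.7, "medium": 2.57, "short": 1.18},
}


def reduction(base, new):
    """Percentage drop in time per iteration, to one decimal place."""
    return round(100.0 * (base - new) / base, 1)


summary = {
    size: reduction(timings["1.21"][size], timings["1.25"][size])
    for size in ("long", "medium", "short")
}

for size, pct in summary.items():
    print(f"{size}: {pct}% less time per iteration")
```

That comes to roughly 46% for long, 36% for medium and 11% for short.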

Tony


