I've finished off my work on making the fdx and fdt files use
pack/unpack.
These are more complex than the other index files as they can get *big*.
The fdt files store all the fields that are indexed in documents.
The fdx files are an index into the fdt files that tells us where each
document can be found.
So rather than slurping the whole file and unpacking it, we still read a
given number of bytes at a time. But when we actually extract a document,
we now work out how long it's going to be, slurp that much of the file in
one go, and then unpack it.
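To illustrate the idea of working out a record's length and then reading and unpacking it in one go, here is a minimal sketch. The layout is an assumption made up for this example and is not Plucene's real fdx/fdt format: the index holds one 32-bit offset per document plus a final end marker, and each document is a run of length-prefixed strings. The names build_files and read_doc are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical layout, for illustration only (NOT Plucene's real format):
#   index: one 32-bit big-endian offset per document, followed by a
#          final sentinel offset marking the end of the data
#   data:  for each document, a sequence of length-prefixed strings

# Build the two "files" in memory from a list of documents,
# where each document is an array ref of field strings.
sub build_files {
    my @docs = @_;
    my ($index, $data) = ('', '');
    for my $doc (@docs) {
        $index .= pack 'N', length $data;      # where this document starts
        $data  .= pack '(N/a*)*', @$doc;       # length-prefixed fields
    }
    $index .= pack 'N', length $data;          # sentinel: end of last document
    return ($index, $data);
}

# Fetch one document: look up its start and end in the index, read
# exactly that many bytes from the data file, and unpack them at once.
sub read_doc {
    my ($index_fh, $data_fh, $doc_id) = @_;

    seek $index_fh, $doc_id * 4, 0 or die "seek on index failed: $!";
    read($index_fh, my $buf, 8) == 8 or die "short read on index";
    my ($start, $end) = unpack 'N2', $buf;

    my $len = $end - $start;
    seek $data_fh, $start, 0 or die "seek on data failed: $!";
    read($data_fh, my $record, $len) == $len or die "short read on data";

    return unpack '(N/a*)*', $record;
}

my ($index, $data) = build_files(
    [ 'title one', 'body one' ],
    [ 'title two', 'a much longer body' ],
);

# In-memory filehandles stand in for the real files on disk.
open my $index_fh, '<', \$index or die "cannot open index: $!";
open my $data_fh,  '<', \$data  or die "cannot open data: $!";

my @fields = read_doc($index_fh, $data_fh, 1);
print join(' | ', @fields), "\n";    # prints "title two | a much longer body"
```

The point of the sketch is that each document costs one read and one unpack call, instead of a separate read for every field.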
This is almost certainly going to break backwards compat with any
existing Plucene indexes if the documents in them are of any length at
all. The transform-fdt file in the bin directory should be able to
convert files between the old format and the new format.
It would be appreciated if a few people could give it a try and make
sure it works for them. If I don't hear about any problems with this,
I'll release it as 1.25 tomorrow.
http://www.tmtm.com/CPAN/Plucene-1.25.tar.gz
There are probably some more optimisations in this area, but this is
good enough for now:

hajime/1.21:
        s/iter   long medium  short
long      31.0     --   -87%   -96%
medium    4.04   668%     --   -67%
short     1.33  2232%   204%     --

hajime/1.22:
        s/iter   long medium  short
long      27.9     --   -87%   -95%
medium    3.73   647%     --   -65%
short     1.29  2061%   189%     --

hajime/1.23:
        s/iter   long medium  short
long      20.7     --   -85%   -94%
medium    3.03   583%     --   -59%
short     1.25  1557%   142%     --

hajime/1.24:
        s/iter   long medium  short
long      17.4     --   -85%   -93%
medium    2.66   552%     --   -55%
short     1.19  1358%   124%     --

hajime/1.25:
        s/iter   long medium  short
long      16.7     --   -85%   -93%
medium    2.57   548%     --   -54%
short     1.18  1311%   118%     --

Tony