
Re: Problems indexing docs: msg#00040

Subject: Re: Problems indexing docs
The main problem is that Plucene creates a new directory inside /tmp for each document added to the index. These directories are not released until the Perl script finishes, so I can't index more than 32,000 documents in one run.

Albert

Minty wrote:

On 6/22/05, Albert Vila <avp-imBYgDdI/RHQT0dZR+AlfA@xxxxxxxxxxxxxxxx> wrote:
What I can't understand is why Plucene creates a subdirectory in /tmp for
each document I add to the index. JLucene and CLucene do not create
this directory, or maybe they delete it once the document has been added
to the index.

I'm not an expert on this, but more recent versions of Lucene have a
"RamDisk" based filesystem component that it uses for temp files. This
is both faster and avoids issues with inodes (which I see ain't the
problem here anyway).

As far as I know, Plucene creates temp files for each indexed doc until
the count reaches the merge factor (set with $writer->set_mergefactor()).
These are then condensed into a single segment (typically ~5-6 files per
segment, but it can be more), the temp files for the individual docs are
deleted, and the cycle repeats.

mergefactor defaults to 10 I think.  Indexing is a bit faster if you
increase this value.  It shouldn't keep temp files lying around after
it's reached that limit.
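
To make the cycle described above concrete, here is a minimal sketch in Python. It is a toy model of the behaviour as described in this message, not Plucene's actual code: the function name simulate_indexing and its return fields are made up for illustration, and it assumes a single merge level, whereas real Lucene-style indexers may also merge the merged segments again.

```python
def simulate_indexing(num_docs, merge_factor=10):
    """Toy model of the temp-file cycle described in the message.

    Assumption (not taken from Plucene's source): each added document
    produces one temporary per-document entry, and as soon as
    `merge_factor` of them exist they are merged into one segment and
    the temporary entries are deleted.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")

    pending = 0          # temporary per-document entries not yet merged
    merged_segments = 0  # segments produced by merging
    peak_pending = 0     # largest number of temporary entries alive at once

    for _ in range(num_docs):
        pending += 1
        peak_pending = max(peak_pending, pending)
        if pending >= merge_factor:
            merged_segments += 1  # condense into a single segment
            pending = 0           # temporary entries are removed

    return {
        "merged_segments": merged_segments,
        "pending": pending,
        "peak_pending": peak_pending,
    }


if __name__ == "__main__":
    # Compare the default merge factor with a larger one.
    for factor in (10, 100):
        print(factor, simulate_indexing(32000, merge_factor=factor))
```

In this model the number of temporary entries alive at any moment never exceeds the merge factor, whatever the total number of documents. That matches the behaviour expected above; it does not account for directories that stay in /tmp until the script exits.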



