
Re: Index format compatibility: msg#00038

Subject: Re: Index format compatibility
Forward from Otis at Lucene, at his request. 

--- Otis <ogjunk-pl-/E1597aS9LQAvxtiuMwx3w@xxxxxxxxxxxxxxxx> wrote:

I disagree.  I think this will be a great loss. :(  It would be nice to
be able to quickly put together Plucene scripts to play with indices
built with Lucene.  I was hoping this index compatibility would be one
of the goals we would achieve when we complete the establishment of
lucene.apache.org (Lucene as a Top Level Apache project, not a
sub-project of Jakarta) and bring in all Lucene ports together.

Perhaps you should wait for us over at Lucene a bit before changing the
index format.  Perhaps with more minds put towards this problem we'd be
able to find solutions to performance problems without sacrificing
index compatibility.

Otis

--- ed phillips <ed-UMMiKf2pC/isTnJN9+BGXg@xxxxxxxxxxxxxxxx> wrote:

> Nice. For what it is worth, I concur that index compatibility with
> Lucene should not be a priority. If someone wants to use an index in
> Lucene, it can easily be built in Lucene, and should be.
> 
> Plucene developers can and should use the strengths of Perl to
> Plucene's advantage, no?
> 
> On Sun, Jan 23, 2005 at 11:04:23PM +0000, Tony Bowden wrote:
> > 
> > I worked with Marty today on quite a few Plucene things, and we've got
> > a few useful speedups (and encountered a few bugs that we can't fix yet).
> > 
> > We finally took the decision to scrap even trying for index
> > compatibility with Lucene. We always hoped to have this, but the whole
> > 32-bit thing means that most people really won't be able to achieve it.
> > 
> > Making this decision, however, leaves us free to make performance
> > increases by playing with the file format.
> > 
> > We played with the .fnm files first, as they seem to be simple (and were
> > in the path that I was benchmarking this morning).
> > 
> > The basic format is that they first have a number to say how many fields
> > there will be, and then they list the fields and whether or not they're
> > indexed.
> > 
> > So write looks like:
> > 
> >     sub write {
> >             my ($self, $path) = @_;
> >             my $output = Plucene::Store::OutputStream->new($path);
> >             $output->write_vint(scalar @{ $self->{bynumber} });
> >             for my $fi (@{ $self->{bynumber} }) {
> >                     $output->write_string($fi->name);
> >                     $output->print(chr($fi->is_indexed ? 1 : 0));
> >             }
> >     }
> > 
> > and read:
> > 
> >     sub _read {
> >             my ($self, $stream) = @_;
> >             $self->_add_internal($stream->read_string, $stream->read_byte)
> >                     for 1 .. $stream->read_vint;
> >     }
> > 
> > By using BER packed integers, rather than Lucene's almost, but not
> > quite, version of the same thing, we can turn this into a simple
> > pack/unpack exercise:
> > 
> >     sub write {
> >             my ($self, $file) = @_;
> >             my $template = "w" . ("w/a*C" x @{ $self->{bynumber} });
> >             my $packed   = pack $template, scalar(@{ $self->{bynumber} }),
> >                     map { $_->name => ($_->is_indexed ? 1 : 0) } @{ $self->{bynumber} };
> >             write_file($file => $packed);
> >     }
> > 
> >     sub _read {
> >             my ($self, $file) = @_;
> >             my @fields = unpack "w/(w/aC)", read_file($file->[1]);
> >             while (my ($field, $indexed) = splice @fields, 0, 2) {
> >                     $self->_add_internal($field => $indexed)
> >             }
> >     }
> > 
> > This cuts out all those byte-by-byte reads and writes, and a large
> > number of method calls. The speed gains aren't particularly noticeable
> > in my simple tests, but I believe that's because these files are
> > generally quite small.
> > 
> > If we can use this approach on some of the other files, though, I'm
> > hoping we'll see a bigger increase...
> > 
> > Tony
> > 
> > _______________________________________________
> > Plucene mailing list
> > Plucene-LhsUWLkCAQ8AvxtiuMwx3w@xxxxxxxxxxxxxxxx
> > http://www.kasei.com/mailman/listinfo/plucene
> 


----- End forwarded message -----
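
Archive note: Tony's message contrasts BER packed integers with Lucene's "almost, but not quite" version of the same thing. The sketch below is one way to see the difference. It assumes Lucene's VInt stores 7 bits per byte with the least significant group first, while Perl's pack "w" stores the most significant group first. The helper names (vint_encode, ber_encode, hex_bytes) are made up for illustration and are not part of Plucene.

```perl
use strict;
use warnings;

# Assumed Lucene-style VInt: 7 bits per byte, least significant group
# first, high bit set on every byte except the last.
sub vint_encode {
    my ($n) = @_;
    my $out = '';
    while ($n > 0x7F) {
        $out .= chr(($n & 0x7F) | 0x80);
        $n >>= 7;
    }
    return $out . chr($n);
}

# Perl's pack "w" (BER compressed integer): 7 bits per byte, most
# significant group first, high bit set on every byte except the last.
sub ber_encode {
    my ($n) = @_;
    return pack 'w', $n;
}

# Render a byte string as space-separated hex, for comparison.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

for my $n (5, 127, 128, 300) {
    printf "%5d  vint=%-6s ber=%s\n",
        $n, hex_bytes(vint_encode($n)), hex_bytes(ber_encode($n));
}
```

Values below 128 come out as the same single byte in both schemes. From 128 upward the byte order differs (300 is "ac 02" as a VInt and "82 2c" as BER), which is why an index written with pack "w" would not be readable by Lucene.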

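
Archive note: the pack/unpack templates in Tony's message can be tried without Plucene at all. The following is a minimal in-memory sketch, assuming a made-up list of [name, is_indexed] pairs in place of the objects in $self->{bynumber}, and skipping the write_file/read_file step. It needs a Perl recent enough to support groups in templates (5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical field list: [name, is_indexed] pairs standing in for
# the field-info objects held in $self->{bynumber}.
my @fields = (['title', 1], ['body', 1], ['path', 0]);

# A BER count, then one record per field: a BER length-prefixed name
# followed by a single flag byte.
my $template = 'w' . ('w/a*C' x @fields);
my $packed   = pack $template, scalar(@fields), map { @$_ } @fields;

# "w/(...)" reads the count and repeats the group that many times;
# inside the group, "w/a" reads a length and then that many bytes.
my @flat = unpack 'w/(w/aC)', $packed;

# Regroup the flat (name, flag, name, flag, ...) list into pairs.
my @decoded;
while (my ($name, $indexed) = splice @flat, 0, 2) {
    push @decoded, [$name, $indexed];
}

printf "%-6s indexed=%d\n", @$_ for @decoded;
```

The whole file is encoded and decoded in one call each, which is the saving in method calls that the message describes. The flag is written with "C", so it stays a plain byte as in the original format.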
