Forward from Otis at Lucene, at his request.
--- Otis <ogjunk-pl-/E1597aS9LQAvxtiuMwx3w@xxxxxxxxxxxxxxxx> wrote:
I disagree. I think this will be a great loss. :( It would be nice to
be able to quickly put together Plucene scripts to play with indices
built with Lucene. I was hoping this index compatibility would be one
of the goals we would achieve when we complete the establishment of
lucene.apache.org (Lucene as a Top Level Apache project, not a
sub-project of Jakarta) and bring in all Lucene ports together.
Perhaps you should wait for us over at Lucene a bit before changing the
index format. Perhaps with more minds put towards this problem we'd be
able to find solutions to performance problems without sacrificing
index compatibility.
Otis
--- ed phillips <ed-UMMiKf2pC/isTnJN9+BGXg@xxxxxxxxxxxxxxxx> wrote:
> Nice. I, for what it is worth, concur that index compatibility with
> Lucene should not be a priority. If someone wants to use an index in
> Lucene, it should be, and easily can be, built in Lucene.
>
> Plucene developers can and should use the strengths of Perl to
> Plucene's advantage, no?
>
> On Sun, Jan 23, 2005 at 11:04:23PM +0000, Tony Bowden wrote:
> >
> > I worked with Marty today on quite a few Plucene things, and we've got
> > a few useful speedups (and encountered a few bugs that we can't fix yet).
> >
> > We finally took the decision to scrap even trying for index compatibility
> > with Lucene. We always hoped to have this, but the whole 32bit thing
> > means that most people really won't be able to achieve it.
> >
> > Making this decision, however, leaves us free to make performance
> > increases by playing with the file format.
> >
> > We played with the .fnm files first, as they seem to be simple (and were
> > in the path that I was benchmarking this morning).
> >
> > The basic format is that they first have a number to say how many fields
> > there will be, and then they list the fields and whether or not they're
> > indexed.
> >
> > So write looks like:
> >
> > sub write {
> >     my ($self, $path) = @_;
> >     my $output = Plucene::Store::OutputStream->new($path);
> >     $output->write_vint(scalar @{ $self->{bynumber} });
> >     for my $fi (@{ $self->{bynumber} }) {
> >         $output->write_string($fi->name);
> >         $output->print(chr($fi->is_indexed ? 1 : 0));
> >     }
> > }
> >
> > and read:
> >
> > sub _read {
> >     my ($self, $stream) = @_;
> >     $self->_add_internal($stream->read_string, $stream->read_byte)
> >         for 1 .. $stream->read_vint;
> > }
> >
> > By using BER packed integers, rather than Lucene's almost, but not
> > quite, version of the same thing, we can turn this into a simple
> > pack/unpack exercise:
> >
> > sub write {
> >     my ($self, $file) = @_;
> >     my $template = "w" . ("w/a*C" x @{ $self->{bynumber} });
> >     my $packed = pack $template, scalar(@{ $self->{bynumber} }),
> >         map { $_->name => ($_->is_indexed ? 1 : 0) } @{ $self->{bynumber} };
> >     write_file($file => $packed);
> > }
> >
> > sub _read {
> >     my ($self, $file) = @_;
> >     my @fields = unpack "w/(w/aC)", read_file($file->[1]);
> >     while (my ($field, $indexed) = splice @fields, 0, 2) {
> >         $self->_add_internal($field => $indexed);
> >     }
> > }
> >
> > This cuts out all those byte-by-byte reads and writes, and a large
> > number of method calls. The speed gains aren't particularly noticeable
> > in my simple tests, but I believe that's because these files are generally
> > quite small.
> >
> > If we can use this approach on some of the other files, though, I'm
> > hoping we'll see a bigger increase...
> >
> > Tony
> >
> > _______________________________________________
> > Plucene mailing list
> > Plucene-LhsUWLkCAQ8AvxtiuMwx3w@xxxxxxxxxxxxxxxx
> > http://www.kasei.com/mailman/listinfo/plucene
----- End forwarded message -----
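
For anyone who wants to try the pack/unpack idea from Tony's message outside
Plucene, here is a minimal standalone sketch. It assumes nothing about
Plucene's internals: the field list, pack_fields, unpack_fields and
lucene_vint are hypothetical names made up for illustration. It also shows
one reason the two integer formats do not line up: Perl's "w" (BER) puts the
most significant 7-bit group first, while Lucene's VInt puts the least
significant group first, so values of 128 and above serialise differently.

```perl
use strict;
use warnings;

# Hypothetical field list: [ name, is_indexed ]. Not taken from Plucene.
my @fields = ( [ title => 1 ], [ body => 1 ], [ path => 0 ] );

# Pack a BER count, then per field a BER-length-prefixed name and a flag byte.
sub pack_fields {
    my @f = @_;
    my $template = "w" . ("w/a*C" x @f);
    return pack $template, scalar(@f),
        map { ($_->[0], $_->[1] ? 1 : 0) } @f;
}

# Unpack the same layout: the leading count drives the repeated group.
sub unpack_fields {
    my ($packed) = @_;
    my @flat = unpack "w/(w/a C)", $packed;
    my @out;
    while (my ($name, $indexed) = splice @flat, 0, 2) {
        push @out, [ $name => $indexed ];
    }
    return @out;
}

# Sketch of a Lucene-style VInt encoder: low-order 7 bits first,
# high bit set on every byte except the last.
sub lucene_vint {
    my ($n) = @_;
    my $out = '';
    while ($n > 0x7f) {
        $out .= chr(($n & 0x7f) | 0x80);
        $n >>= 7;
    }
    return $out . chr($n);
}

my @round_trip = unpack_fields(pack_fields(@fields));
print join(", ", map { "$_->[0]=$_->[1]" } @round_trip), "\n";

# Same value, different byte order in the two encodings.
printf "BER  300: %s\n", unpack("H*", pack("w", 300));
printf "VInt 300: %s\n", unpack("H*", lucene_vint(300));
```

Below 128 the two encodings produce the same single byte, so small counts
and short names would look identical on disk under either scheme.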