On Jun 30, 2005, at 7:02 AM, ed phillips wrote:
Congratulations Marvin,
Thank you.
It seems to me that theoretically there is no reason why a Perl
implementation can't be faster than Lucene. Plenty of aspects of
Lucene could be sped up.
My question is, and please forgive someone with more curiosity than
time at the moment (production obligations call me), are you using
enough of the design behind Lucene, such as the scoring formula, to
be considered Lucene-based, if not a Lucene port?
Not really. There are a few areas in which Lucene/Plucene has
provided inspiration for Kinosearch, but that's also true for
mnoGoSearch, Xapian, Egothor, Search::FreeText, a few search engine
articles on Perl.com and elsewhere, etc. The two biggest influences
on Kinosearch are Lucene/Plucene and, believe it or not,
mnoGoSearch. mnoGoSearch was the first search engine I experimented
with extensively, and like mnoGoSearch, Kinosearch was originally
based on a MySQL backend. Boy, that was a while ago!
My other question is, why not roll your work into Plucene?
That would require a torso transplant for Plucene. You up for that? ;)
It would be possible to rearrange chunks of Kinosearch to
superficially resemble Lucene. In fact, that's probably a good idea
-- though I think that it may be possible to choose names which are
slightly more intuitive. (The main class for indexing is
"Kindexer". The main class for searching is "KSearch". The name
"Kinosearch" contains the word "search". "Kino" is the main
character in John Steinbeck's novel "The Pearl", but the real benefit
of the name "Kinosearch" is that it lightens the burden on the
horrendously overloaded term "index" -- Kinosearch's indexes are
"kindexes", and Kinosearch's indexer is a "Kindexer".)
The low-level stuff in Kinosearch is pretty different, though. There
are a lot fewer classes. And Lucene/Plucene is so tightly
integrated. I don't think you could swap parts of Kinosearch into
Plucene -- I think you'd have to start with Kinosearch and abstract
out classes analogous to Lucene's.
Also, it has looked to me for some time like the Plucene project, if
not quite dead, had reached a persistent vegetative state. When I
first wrote to the list last July regarding performance issues, no
one even responded. :(
or did you
crib from Plucene and fork?
My earliest experiments in search engine tinkering were with the Perl
parts of mnoGoSearch, and were informed by some hard-core MySQL
optimization I'd had to do as part of a logfile analysis system, and
data compression techniques I'd studied during my stint as an audio
mastering engineer. Kindexer.pm and KSearch.pm started out as
monolithic scripts, kindexer.plx and ksearch.cgi. Things have evolved
(quite a lot) from there.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/