Next on my Tie::Array::Sorted hitlist: IndexSearcher
Here it's used to keep the list of results from a search:

    tie my @hq, "Tie::Array::Sorted", sub {
        my ($hit_a, $hit_b) = @_;
        return ($hit_a->{score} <=> $hit_b->{score})
            || ($hit_b->{doc} <=> $hit_a->{doc});
    };
    my $total_hits = 0;
    $scorer->score(
        Plucene::Search::HitCollector->new(
            collect => do {
                my $min_score = 0;
                sub {
                    my ($self, $doc, $score) = @_;
                    return if $score == 0 || ($bits && !$bits->get($doc));
                    $total_hits++;
                    if ($score >= $min_score) {
                        push @hq, { doc => $doc, score => $score };
                        if (@hq > $n_docs) {
                            shift @hq;
                            $min_score = $hq[0]->{score};
                        }
                    }
                }
            }
        ),
        $self->{reader}->max_doc
    );
    my @array = @hq;    # Copy out of tied array

The basic principle is that as we get back each search result we add it
onto our result list (which is auto-sorted by score), as long as it's
more relevant than our currently least relevant result. If this makes
the list longer than the number of results we want, then we just drop
the lowest result, and reset the minimum score needed to add to the list.
This seems to be a perfect use for Tie::Array::Sorted. There's no
needless re-sorting, and to remove it we'd have to write all our own
code for keeping the array sorted anyway.
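
For comparison, here is a rough sketch of what that hand-written code might look like: the same keep-the-best-N bookkeeping on a plain array, with a binary search finding the insertion point. The names cmp_hits and top_hits are made up for illustration and are not part of Plucene or Tie::Array::Sorted; the comparator is copied from the snippet above, and the $bits filter is left out.

```perl
use strict;
use warnings;

# Same ordering as the comparator in the post: lowest score first,
# ties broken by highest doc number first.
sub cmp_hits {
    my ($hit_a, $hit_b) = @_;
    return ($hit_a->{score} <=> $hit_b->{score})
        || ($hit_b->{doc} <=> $hit_a->{doc});
}

# Hypothetical helper (not part of Plucene): keep the $n_docs best hits
# from a list of [doc, score] pairs in a plain array sorted by cmp_hits.
sub top_hits {
    my ($n_docs, @scored) = @_;
    my @hq;
    my $min_score = 0;
    for my $pair (@scored) {
        my ($doc, $score) = @$pair;
        next if $score == 0 || $score < $min_score;
        my $hit = { doc => $doc, score => $score };

        # Binary search for the first slot whose hit does not sort before $hit.
        my ($lo, $hi) = (0, scalar @hq);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if (cmp_hits($hq[$mid], $hit) < 0) { $lo = $mid + 1 }
            else                               { $hi = $mid }
        }
        splice @hq, $lo, 0, $hit;

        if (@hq > $n_docs) {
            shift @hq;                     # drop the lowest-scoring hit
            $min_score = $hq[0]{score};    # new minimum needed to get in
        }
    }
    return @hq;
}

my @kept = top_hits(2, [1, 0.5], [2, 0.9], [3, 0.1], [4, 0.7]);
print join(", ", map { "$_->{doc}:$_->{score}" } @kept), "\n";
```

With the four sample hits and a limit of two, this keeps docs 4 and 2, still in lowest-score-first order. It works, but it is a fair amount of code to replace one tie.
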
However, I'm a little puzzled by the logic that leads us to remove the
*first* result from the list when it gets too long, rather than the last.
If we're storing the list in 'lowest score first' order, I can't find
where it gets reversed before being returned to the user. (Checking the
Java version reveals that they have a pop() here.)
I also can't find anywhere in the test suite that actually calls
this code, so I can't see what's going on!
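
To make the shift-versus-pop question concrete, here is a tiny standalone sketch on a plain array held in 'lowest score first' order. The scores are invented. It only shows what each Perl operation removes and says nothing about what the Java pop() does. One possible reading, which is an assumption and not something verified here, is that the Java pop() is a priority-queue operation that removes the least element, in which case shift would be its equivalent on an ascending array.

```perl
use strict;
use warnings;

# Invented scores, sorted lowest first as in the tied array.
my @ascending = sort { $a <=> $b } (0.9, 0.1, 0.5, 0.7);

my @after_shift = @ascending;
shift @after_shift;              # removes the first element: the lowest score

my @after_pop = @ascending;
pop @after_pop;                  # removes the last element: the highest score

# The list stays lowest first, so best-first output needs an explicit reverse.
my @best_first = reverse @after_shift;

print "shift keeps: @after_shift\n";
print "pop keeps:   @after_pop\n";
print "best first:  @best_first\n";
```

On an ascending list, shift is the one that discards the least relevant hit, but the copy handed back is still lowest first unless something reverses it.
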
Tony