> I don't know a lot about the performance issues here, but
> calling unpack twice instead of once doesn't seem like a
> good idea.
I'm not disagreeing with you at all, but I was curious myself so the
following is purely fyi, fwiw ;)
I appreciate also there are probably more scientific ways to do this,
and it's a bit of a moving target and mileage may vary ... but hey.
I used Data::Dumper to see the sort of data it was storing, based on
the sample prog in the earlier "out-of-order term" mail [1], and
indexing about 30 small docs. I shut down most OS services, including
cron and atd. I was the only user on the box, and monitoring "top"
confirmed the box was consistently 99.0% idle before and after testing.
Debian stable, and perl 5.8.6.
Linux 2.4.18-bf2.4
500MHz Pentium 3, 256MB RAM
5400 RPM, Seagate drive
$ time ./read.pl
real 0m20.341s
user 0m20.080s
sys 0m0.240s
$ time ./read.pl 1
real 0m20.303s
user 0m20.040s
sys 0m0.250s
The second run (read.pl 1) is the 2 x unpack call approach.
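For reference, here is a minimal sketch of the two approaches being timed, run on a hypothetical three-field sample instead of the real test file. It assumes (from my reading of the pack docs) that a count in front of a "/" is consumed by unpack and not returned, so the single-call result has no leading count while the two-call result does.

```perl
use strict;
use warnings;

# Hypothetical sample: same record shape as the write script, only 3 fields.
my @fi = (
    { name => '',        is_indexed => 0 },
    { name => 'content', is_indexed => 1 },
    { name => 'id',      is_indexed => 1 },
);

# BER-compressed field count, then one (length-prefixed name, flag byte) pair per field.
my $packed = pack "w" . ("w/a*C" x scalar(@fi)), scalar(@fi),
    map { ($_->{name}, $_->{is_indexed} ? 1 : 0) } @fi;

# One call: the leading count drives the group repeat and is not returned.
my @one = unpack 'w/(w/aC)', $packed;

# Two calls: read the count first, then build a flat template from it.
# The flat template's leading "w" returns the count again, so drop it.
my $count = unpack "w", $packed;
my ($n, @two) = unpack "w" . ("w/a*C" x $count), $packed;

print join("|", @one) eq join("|", @two) ? "same\n" : "different\n";
```

If that assumption holds, assigning the single-call result to a list with a leading throwaway scalar would swallow the first field name, which is worth keeping in mind when eyeballing dumped output.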
I repeated a few times, and the above numbers stayed consistent. I
also checked the output from read.pl by dumping out the data to the
terminal and visually ensuring it was sane. I up'd the loop in
read.pl from 10 to 100 and re-ran.
I also decreased the loop in write from 100000 to 100, and up'd the
loop in read.pl to 10000.
In all cases the "2 x unpack call" approach was marginally faster
(about 0.04s out of 20s in the run above), which is puzzling. Would
someone like to sanity check this, or suggest a better way to test?
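One possible alternative to timing whole script runs is the core Benchmark module, which runs both variants in the same process and prints a comparison table. The sketch below is only that: the in-memory data set (3000 fields) and the labels one_call and two_calls are made up for illustration, and the numbers it prints will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data set: 1000 repeats of the three sample fields, packed in memory.
my @pairs = ( [ '', 0 ], [ 'content', 1 ], [ 'id', 1 ] ) x 1000;
my $packed = pack "w" . ("w/a*C" x scalar(@pairs)), scalar(@pairs),
    map { @$_ } @pairs;

# Run each variant for at least 1 CPU second and print a rate comparison.
my $results = cmpthese(-1, {
    one_call => sub {
        my @fields = unpack 'w/(w/aC)', $packed;
    },
    two_calls => sub {
        my $count = unpack "w", $packed;
        my ($n, @fields) = unpack "w" . ("w/a*C" x $count), $packed;
    },
});
```

This leaves out the file read and interpreter start-up, so it only compares the unpack calls themselves. Building the flat template is included in the two_calls timing, as it is in read.pl.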
Thoughts welcome.
[1] http://www.kasei.com/pipermail/plucene/2005-April/000347.html
# --------------------------------------------------
#!perl
use File::Slurp;
my @fi;
for (my $i = 0; $i < 100000; $i++) {
    push(@fi, { name => '', is_indexed => 0 });
    push(@fi, { name => 'content', is_indexed => 1 });
    push(@fi, { name => 'id', is_indexed => 1 });
}
my $template = "w" . ("w/a*C" x @fi);
my $packed = pack $template, scalar(@fi),
    map { $_->{name} => ($_->{is_indexed} ? 1 : 0) } @fi;
write_file('/var/tmp/timetest.dat' => $packed);
# --------------------------------------------------
#!perl
use File::Slurp;
my $data = read_file('/var/tmp/timetest.dat');
for (my $i = 0; $i < 10; $i++) {
    my $template = 'w/(w/aC)';
    if ($ARGV[0]) {
        my $count = unpack "w", $data;
        $template = "w" . ("w/a*C" x $count);
    }
    my ($null, @fields) = unpack $template, $data;
}