

Subject: Re: Re: File Formats and pack/unpack
> I don't know a lot about the performance issues here, but 
> calling unpack twice instead of once doesn't seem like a 
> good idea.

I'm not disagreeing with you at all, but I was curious myself, so the
following is purely fyi, fwiw ;)

I also appreciate that there are probably more scientific ways to do
this, that it's a bit of a moving target, and that mileage may vary
... but hey.

I used Data::Dumper to see the sort of data being stored when running
the sample program from the earlier "out-of-order term" mail [1] and
indexing about 30 small docs.  I shut down most OS services, including
cron and atd.  I was the only user on the box, and monitoring "top"
confirmed the box was consistently 99.0% idle before and after testing.
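
To show the kind of structure being timed, here is a minimal sketch
that packs a tiny, hypothetical version of the writer's field list (one
copy of each field instead of 100000 copies), prints the raw bytes, and
decodes them back into records for Data::Dumper.  The layout mirrors the
templates in the scripts below; the variable names are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample: one copy of each of the writer's three fields,
# packed with the same "count, then (length/name, flag)" template.
my @fi = (
    { name => '',        is_indexed => 0 },
    { name => 'content', is_indexed => 1 },
    { name => 'id',      is_indexed => 1 },
);
my $packed = pack 'w' . ('w/a*C' x @fi), scalar(@fi),
    map { ($_->{name}, $_->{is_indexed} ? 1 : 0) } @fi;

# Raw bytes as hex: count, then (name length, name bytes, flag) per field
print unpack('H*', $packed), "\n";

# Decode back into (name, flag) pairs and dump them for inspection
my @flat = unpack 'w/(w/aC)', $packed;
my @records;
while (my ($name, $flag) = splice @flat, 0, 2) {
    push @records, { name => $name, is_indexed => $flag };
}
$Data::Dumper::Sortkeys = 1;
print Dumper(\@records);
```

The 16 packed bytes are 03 (count), 00 00 (empty name, flag 0),
07 "content" 01, and 02 "id" 01.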

Debian stable, perl 5.8.6
Linux 2.4.18-bf2.4
500 MHz Pentium III, 256 MB RAM
5400 RPM Seagate drive

$ time ./read.pl
real    0m20.341s
user    0m20.080s
sys     0m0.240s

$ time ./read.pl 1 
real    0m20.303s
user    0m20.040s
sys     0m0.250s

The second run (./read.pl 1) uses the 2 x unpack call approach.

I repeated the runs a few times, and the numbers above stayed
consistent.  I also checked the output from read.pl by dumping the
data to the terminal and visually confirming it was sane.  I then
upped the loop in read.pl from 10 to 100 and re-ran.
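
Rather than eyeballing a dump, the two decodings can be compared
programmatically.  One detail matters here: in unpack, a count read for
a "/" construct (as in 'w/(w/aC)') is consumed and not returned,
whereas the leading plain "w" in the flat template is returned.  So
"my ($null, @fields) = unpack ..." only discards the count in the
two-pass case; with the grouped template it discards the first field
name instead.  A minimal sketch, assuming in-memory data in the same
layout as timetest.dat (the subroutine names and test data are
hypothetical):

```perl
use strict;
use warnings;

# One grouped unpack: the leading count is consumed by "/" and is
# not returned, so the list is just (name, flag, name, flag, ...).
sub decode_grouped {
    my ($data) = @_;
    return unpack 'w/(w/aC)', $data;
}

# Two unpacks: peek at the count, then build a flat template.  The
# leading plain "w" IS returned, so it is discarded here.
sub decode_two_pass {
    my ($data) = @_;
    my $count = unpack 'w', $data;
    my (undef, @fields) = unpack 'w' . ('w/a*C' x $count), $data;
    return @fields;
}

# Hypothetical input: 1000 copies of the writer's three fields.
my @pairs = ('', 0, 'content', 1, 'id', 1) x 1000;
my $count = @pairs / 2;
my $data  = pack 'w' . ('w/a*C' x $count), $count, @pairs;

my @grouped  = decode_grouped($data);
my @two_pass = decode_two_pass($data);

# Compare element by element instead of eyeballing a dump
my $same = @grouped == @two_pass
    && !grep { $grouped[$_] ne $two_pass[$_] } 0 .. $#grouped;
print $same ? "decodings agree\n" : "decodings differ\n";
```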

I also decreased the loop in the writer script from 100000 to 100,
and upped the loop in read.pl to 10000.

In all cases the "2 x unpack call" approach was marginally faster,
which is puzzling.  Would someone like to sanity-check this, or does
anyone have a better way to test?
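
For a more controlled comparison, the core Benchmark module can time
both decoders inside a single process.  That leaves out interpreter
start-up and the file read, both of which "time ./read.pl" also
measures, and reports the relative rates directly.  A sketch, assuming
in-memory data shaped like timetest.dat (the data size and the labels
"grouped" and "two_pass" are arbitrary choices):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical in-memory data: 1000 copies of the writer's three fields,
# so no disk I/O or interpreter start-up is included in the timings.
my @pairs = ('', 0, 'content', 1, 'id', 1) x 1000;
my $count = @pairs / 2;
my $data  = pack 'w' . ('w/a*C' x $count), $count, @pairs;

# A negative count means: run each sub for at least that many CPU seconds.
# cmpthese prints a comparison chart and also returns it as rows.
my $rows = cmpthese(-1, {
    grouped => sub {
        my @fields = unpack 'w/(w/aC)', $data;
    },
    two_pass => sub {
        my $n = unpack 'w', $data;
        my (undef, @fields) = unpack 'w' . ('w/a*C' x $n), $data;
    },
});
```

The chart shows each approach's rate per second and the percentage
difference between them; a larger negative count (e.g. -5) gives
steadier numbers at the cost of a longer run.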

Thoughts welcome.

[1] http://www.kasei.com/pipermail/plucene/2005-April/000347.html

# --------------------------------------------------
#!/usr/bin/perl
# Writer: builds a field list and packs it into /var/tmp/timetest.dat
use strict;
use warnings;
use File::Slurp;

my @fi;
for (1 .. 100_000) {
        push(@fi, { name => '',        is_indexed => 0 });
        push(@fi, { name => 'content', is_indexed => 1 });
        push(@fi, { name => 'id',      is_indexed => 1 });
}
# Template: field count, then (length-prefixed name, is_indexed flag) per field
my $template = "w" . ("w/a*C" x @fi);
my $packed   = pack $template, scalar(@fi),
  map { ($_->{name}, $_->{is_indexed} ? 1 : 0) } @fi;
write_file('/var/tmp/timetest.dat', { binmode => ':raw' }, $packed);

# --------------------------------------------------
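#!/usr/bin/perl
# read.pl: with no argument, decode with one grouped unpack; with a
# true argument (./read.pl 1), unpack the count first and then decode
# with a flat template (the "2 x unpack call" approach).
use strict;
use warnings;
use File::Slurp;

my $data = read_file('/var/tmp/timetest.dat', binmode => ':raw');
for (1 .. 10) {
        my @fields;
        if ($ARGV[0]) {
                # First unpack: read just the leading field count
                my $count = unpack "w", $data;
                # Second unpack: the leading plain "w" is returned, so discard it
                (undef, @fields) = unpack "w" . ("w/a*C" x $count), $data;
        }
        else {
                # The count is consumed by "/" and is not returned
                @fields = unpack 'w/(w/aC)', $data;
        }
}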
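
Both scripts store the field count and each name length with pack's
"w" format, a BER compressed integer: an unsigned integer in base 128,
most significant digit first, with the high bit set on every byte
except the last.  The encoder below is for illustration only
(ber_encode is a hypothetical helper, not part of any module); it checks
itself against pack 'w' and shows why the writer's field count of
300,000 (100,000 iterations of 3 fields) takes three bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: encode an unsigned integer the way pack 'w' does,
# 7 bits per byte, most significant group first, high bit set on all
# bytes except the last.
sub ber_encode {
    my ($n) = @_;
    my @bytes = ($n & 0x7f);                  # last byte: high bit clear
    $n >>= 7;
    while ($n) {
        unshift @bytes, 0x80 | ($n & 0x7f);   # continuation byte
        $n >>= 7;
    }
    return pack 'C*', @bytes;
}

for my $n (0, 1, 127, 128, 300, 300_000) {
    my $ours = ber_encode($n);
    die "mismatch for $n\n" unless $ours eq pack('w', $n);
    printf "%7d => %s (%d byte%s)\n", $n, unpack('H*', $ours),
        length($ours), length($ours) == 1 ? '' : 's';
}
# 300000 encodes as 92a760 (3 bytes)
```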

