Hi,
I have a problem while trying to build a spider using Perl threads.
Consider the program below, which is just an example to get going.
I wish to hit a certain site's front page any number of times (for
example, 300).
I imagine that since there's a lot of content on the page, each request
will take some time to process, so it would be nice
to delegate the requests to threads.
My problem is that in this very naive example the unthreaded version is
much faster.
Two questions:
1) Is there something wrong with the threaded code?
2) Does anyone have a working example of a spider using threads?
Thanks
./allan
#############################################################
use strict;
use LWP;
use threads;
use threads::shared;
use LWP::RobotUA;
use URI;
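my $MAX = 300;            # number of requests to make
my %store : shared;       # request number --> page title
my $robot;
my $count = 0;
my $thr;
my $start = time();
my $url = "http://somewhere.com";
my $THREADS = 0;          # set to 1 when running the threaded version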
init_robot();
# if we have an argument use the unthreaded version
if ($ARGV[0]) {
    main_loop2();
} else {
    $THREADS = 1;
    main_loop();
}
print_hash();
my $end = time();
my $elapsed = $end - $start;
print "This took $elapsed seconds\n";
sub init_robot {
    $robot = LWP::RobotUA->new("myname", 'my@xxxxxxxxx' );
    # RobotUA's delay is given in minutes: 1/6000 min = 10 ms
    my $delay = 1/6000;
    $robot->delay($delay);
}
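# threaded version: one new thread per request,
# joined before the next iteration starts
sub main_loop {
    while ($count < $MAX) {
        $count++;
        $thr = threads->new(\&lwp);
        $thr->join;
    }
}
# unthreaded version: same requests, made in-line
sub main_loop2 {
    while ($count < $MAX) {
        $count++;
        lwp();
    }
}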
# fetch the page and store its <title> under the current request number
sub lwp {
    my $response = $robot->get( $url );
    my $content = $response->content;
    lock(%store) if $THREADS;
    if ($content =~ m,<title>([^<>]+)</title>,i) {
        $store{$count} = $1;
    }
    return $count;
}
sub print_hash {
    foreach my $key (keys %store) {
        print "$key --> $store{$key}\n";
    }
}
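
For comparison, here is a minimal sketch of a different layout: a small, fixed pool of worker threads that pull request numbers from a Thread::Queue, so several requests can be in flight at once and all threads are joined only at the end. Everything in it is an assumption for illustration, not part of the program above: the pool size $WORKERS, the reduced $MAX, and the stub fetch_title(), which stands in for the real $robot->get($url) call and only sleeps to simulate latency. It needs a perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $MAX     = 20;   # requests to make (300 in the post; lowered for a quick run)
my $WORKERS = 5;    # hypothetical pool size
my %store : shared; # request number --> page title

# Hypothetical stand-in for $robot->get($url) plus the <title> match.
sub fetch_title {
    my ($n) = @_;
    select(undef, undef, undef, 0.05);   # simulate ~50 ms of network latency
    return "title $n";
}

# Each worker pulls request numbers until it dequeues an undef marker.
sub worker {
    my ($queue) = @_;
    while (defined(my $n = $queue->dequeue)) {
        my $title = fetch_title($n);     # slow part runs without the lock
        lock(%store);                    # lock is released at end of this block
        $store{$n} = $title;
    }
}

sub run_pool {
    my $queue = Thread::Queue->new(1 .. $MAX);
    $queue->enqueue((undef) x $WORKERS); # one end marker per worker
    my @threads;
    push @threads, threads->create(\&worker, $queue) for 1 .. $WORKERS;
    $_->join for @threads;               # join only after all threads are started
    return scalar keys %store;
}

my $done = run_pool();
print "stored $done titles\n";
```

The request number is passed to each worker through the queue rather than read from a global counter, so each stored title is keyed by the request that produced it.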