Okay, so I missed the boat on this, but I had a chance to play with it a
bit, and I'm pretty proud of my solution.
I didn't understand why so many people were posting regex based
solutions when perl ships with a built in solution -- index().
A few versions later I have the below. The real key is a few
simple optimizations, and I think it comes across as fairly readable.
Optimizations:
1) We start looking for a substring of length N (starting at 1), trying all such
substrings in turn. As soon as we fail to find one, we can quit,
because any repeated substring of length N+1 requires that one of length
N exist.
2) Any time we find a match at length N, we start the search for size N+1 at
the location of the match, because we know there is not one earlier.
3) Any time we find a match at length N, we start the substr listing of
the next size at the start of the substring, because we know there is
not one earlier.
I've gotten great performance from this, but I'm a bit concerned I've
over-optimized away some corner case. Comments welcome.
sub repeated_substring {
my $str = shift;
return undef if length($str) < 2;
study($str);# does this work with index()?
my $start = 0;
my $maxlength = int(length($str) / 2); #round down
my $sublength = 1; #length of current substr
my $farpoint = 1; #start of searches
my ($found, $substr, $candidate);
while($sublength <= $maxlength){
$substr = substr($str, $start, $sublength);
#if we've tried this string, no need to try again.
if(($found = index($str, $substr, $farpoint)) > -1){
$candidate = $substr; #save this string if it's good
$sublength++; #next loop
$farpoint = $found; #no point in searching before this.
next;
}
#We haven't matched yet at this size
if(($farpoint + $sublength) <= length($str) ){
#Try again at this size, next string
$start++;
$farpoint = $start+$sublength;
} else {
#Nothing at this size, no point in going larger.
last;
}
}
return $candidate;
}
--
SwiftOne / Brett Sanger
swiftone-odjg+bCo0yxg9hUCZPvPmw@xxxxxxxxxxxxxxxx
|