logo       

[SPOILER] Super-crappy acrostic program: msg#00040

Subject: [SPOILER] Super-crappy acrostic program
This is the thing I hacked up so that I could generate an example
acrostic puzzle.  All the errors in the puzzle were my fault, not the
program's; it came up with correct clue lists, which I then screwed
up.  I wasn't going to post it, but nobody else has posted anything
better.

The program does a simple depth-first search.  There are three
possibly interesting innovations:

1. Since the first letter of each answer is known in advance, there's
   obviously no need to search the entire dictionary; the program has
   26 separate dictionaries, each one with words that begin with a
   particular letter of the alphabet. ($d{'w'} contains all the words
   that begin with 'w'.)

2. The program also has 26 auxiliary anagram dictionaries.  Suppose
   there are seven answer words.  After allocating six of these, the
   program knows precisely which letters must be in the final answer.
   It uses the anagram dictionary to look this up instantly.
   $a{"deehirty"} is a list of all the dictionary words that contain
   precisely the letters "deehirty" and no others.  

3. The program keeps track of the average number of unallocated
   letters per remaining clue, and uses this to order the search.  For
   example, if there are three clues left to which answer have not yet
   been assigned, and 19 unassigned letters left in the quotation, the
   program calculates an average of 19/3 = 6.33 letters per clue, and
   tries words with 6 letters first, then those with 7 letters.  This
   is to avoid a situation where the program is trying to allocate 87
   letters among 3 clues (or 3 letters among 87 clues!)  The
   dictionary $d{'w'}[7] is the list of words that begin with 'w' and
   are exactly 7 letters long.


It worked well enough to generate the small 40-letter example I posted
earlier, but was not successful in generating a puzzle for a longer
quotation with 229 letters and 23 clues.  

There are a number of obvious improvements possible, and probably a
larger number of not-obvious improvements.

my ($quotefile, $dict) = @_;

{ local $/ = "";
  open my($q), "<", $quotefile or die "$quotefile: $!";
  $Q = <$q>;
  $/ = undef;
  $A = <$q>;
}

for ($Q, $A) { tr/a-z/A-Z/; tr/A-Z//cd; }

open D, "<", $dict or die "$dict: $!";
while (<D>) {
        chomp;
        next if /[^a-z]/ || length($_) < 5;
        my ($init) = /(.)/;
        my $anagram = anagram($_);
        $d{$init}[length $_]{$_} = 1;
        $a{$init}{$anagram} = 1;
}
#warn "Read the dictionary.\n";

my @clueletter = split //, $A;

{my %count;
 my $length = 0;
 for my $l (split //, $Q) {
   $count{$l}++;
   $length++;
 }

 search(0, $#clueletter, undef, \%count, $length);
}

sub search {
  my ($N, $maxN, $clues, $count, $length) = @_;
  my $I = " |" x $N;
  my $dicts = $d{lc $clueletter[$N]};
  if ($clues) {
#    warn "$I Trying $clues->[0] in position $N ($length to go)\n";
  }

  return unless $count->{$clueletter[$N]};

  # Last clue
  if ($N eq $maxN) {
    my $ana_key = count_to_anagram($count);
    return unless $a{lc $clueletter[$N]}{$ana_key};

    for my $word (keys %{$dicts->[$length]}) {
      next unless anagram($word) eq $ana_key;
      win([$word, $clues]);
    }
    die "How did I get here?";
  }

  my $avg_len = $length / ($maxN-$N+1);
  return if $avg_len > 22;

  my @N = sort {abs($a-$avg_len) <=> abs($b-$avg_len)} (0.. $#$dicts);
  for my $wordlen (@N) {
    next unless $dicts->[$wordlen];
    for my $word (keys %{$dicts->[$wordlen]}) {
      my $new_count;
      next if length($word) >= $length;
      next unless $new_count = subtract($count, $word);
      search($N+1, $maxN, [$word, $clues], $new_count, $length-length($word));
    }
  }
  return;
}

sub win {
  my $cluelist = shift;
  my @clues;
  while ($cluelist) {
    unshift @clues, $cluelist->[0];
    $cluelist = $cluelist->[1];
  }
  print join "\n", @clues, "";
  exit 0;
}

sub anagram {
  join "", sort split //, $_[0];
}

sub count_to_anagram {
  my $c = shift;
  my $r = "";
  for ('A' .. 'Z') {
    $r .= lc() x $c->{$_};
  }
  $r;
}

sub subtract {
  my ($count, $word) = @_;
  my %ncount = %$count;
  for my $l (split //, uc $word) {
    return unless $ncount{$l}--;
  }
  return \%ncount;
}






<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

Recently Viewed:
boot-loaders.gr...    php.pear.genera...    debugging.valgr...    kde.redhat.user...    text.xml.xsl.ge...    culture.languag...    hardware.microc...    java.servicemix...    redhat.release....    web.zope.plone....    user-groups.lin...    opendarwin.webk...    video.mjpeg.use...    sysutils.bcfg2....    encryption.gpg....    lx-office.devel...    xfree86.forum/2...    mail.mutt.devel...    acpi.devel/2003...    qnx.openqnx.dev...    network.irc.irs...    freebsd.devel.m...   
Home | blog view | USPTO Patent Archive | advertise | OSDir is an inevitable website. super tiny logo

Free Magazines

Cisco News
Receive a free quarterly e-newsletter with exclusive articles on how Cisco IT uses its own products and solutions to enable the business.
subscribe

Systems Management News, the newspaper for IT systems administration and data center managers! Each issue of Systems Management News is chock-full of news and analysis to help you understand what's happening in your field.
subscribe

The Enterprise Newsweekly eWeek is the essential technology information source for builders of e-business.
subscribe

Oracle Magazine Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world's largest enterprise software company.
subscribe

Total Telecom Total Telecom is "The Economist of the communications industry".
subscribe