IMPORTANT: Please do not post solutions, hints, or other spoilers
until at least 60 hours after the date of this message.
Thanks.
IMPORTANT: S'il vous plaît, attendez au minimum 60 heures après la
date de ce message avant de poster solutions, indices ou autres
révélations. Merci.
[Other translations omitted, as I am not in a position to verify them.
Besides, I have always found it bizarre to include translations of the
caveat without including translations of the quiz.]
This week's quiz was inspired by a problem I saw at my day job (*)
about a year ago and was recently prompted to look at again.
Without further ado:
Your task is to transform an input file containing unique sorted lines
matching the regular expression /^\w+\.[A-Z]$/. For example, a
typical input file might contain:
ALDA.D
ALDA.Q
ALDA.W
AMTA.B
AMTA.E
AMTA.M
AMTA.X
BMX.F
C.X
DMZ.A
DMZ.X
Note that the input can be grouped by what comes before the '.' - in
the example, we have everything beginning with 'ALDA.', then
everything beginning with 'AMTA.', etc. Call the bit before the '.'
the prefix, and the bit after the dot the suffix.
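To make the prefix/suffix terminology concrete, here is a minimal
sketch in Python (not Perl) that splits one line using the regular
expression given above. The names LINE_RE and split_line are my own
and purely illustrative; nothing in the quiz requires them.

```python
import re

# Pattern from the quiz text, with capture groups added for the two parts:
# one or more word characters, a dot, then exactly one capital letter.
LINE_RE = re.compile(r"^(\w+)\.([A-Z])$")


def split_line(line):
    """Return (prefix, suffix) for a valid line, or None if it does not match.

    split_line is a hypothetical helper, not something the quiz defines.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For instance, split_line("ALDA.D\n") gives the prefix "ALDA" and the
suffix "D", while a line with no dot gives None.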
Your job is to insert lines into the input so that for every unique
prefix, there is one (and only one!) line with the suffix ".M". For
example, the input above would yield the output
ALDA.D
ALDA.M
ALDA.Q
ALDA.W
AMTA.B
AMTA.E
AMTA.M
AMTA.X
BMX.F
BMX.M
C.M
C.X
DMZ.A
DMZ.M
DMZ.X
The output contains every line that was in the input, plus some extra
.M lines (AMTA.M was already present in the input, so nothing is added
for that prefix). The output must also consist only of sorted, unique
lines.
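The output requirements above can be written down as a small checker.
This is only a sketch in Python, and check_output is a hypothetical
name of mine. It assumes both files fit in memory as lists of lines
without trailing newlines (so it is a test aid, not a solution), and
it assumes "sorted" means plain string comparison, which the quiz does
not spell out.

```python
def check_output(input_lines, output_lines):
    """Return True if output_lines satisfies the stated requirements.

    Hypothetical test helper: both arguments are lists of lines with the
    trailing newline already removed.
    """
    # Sorted and unique: each line must be strictly greater than the one before.
    if any(a >= b for a, b in zip(output_lines, output_lines[1:])):
        return False
    out = set(output_lines)
    original = set(input_lines)
    # Every input line must still be present.
    if not original <= out:
        return False
    # Every prefix needs its .M line; uniqueness already rules out two of them.
    prefixes = {line.rsplit(".", 1)[0] for line in output_lines}
    if any(prefix + ".M" not in out for prefix in prefixes):
        return False
    # The only lines that may be added are .M lines.
    return all(line.endswith(".M") for line in out - original)
```

Feeding it the example input and example output from this message
returns True; feeding it the example input as both arguments returns
False, because ALDA.M is missing.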
Your program should take 2 arguments, an input filename and an output
filename. It would be convenient if your program also worked with
fewer arguments, but I'm not going to make that part of the quiz. (If
you do decide to go the extra mile and also accept fewer arguments,
with only one argument your program should use STDOUT as the output
file, and with no arguments your program should behave as a filter -
STDIN for input, STDOUT for output.)
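The optional argument handling described above might be sketched like
this in Python. The helper name open_streams and the usage string are
assumptions of mine; the only behaviour taken from the quiz is the
mapping of zero, one, or two arguments onto STDIN and STDOUT.

```python
import sys
from contextlib import ExitStack, contextmanager


@contextmanager
def open_streams(args):
    """Yield (infile, outfile) for zero, one, or two filename arguments.

    open_streams is a hypothetical helper. With no arguments it acts as a
    filter (stdin to stdout); with one argument it writes to stdout.
    """
    if len(args) > 2:
        raise SystemExit("usage: ricfilter [input [output]]")
    with ExitStack() as stack:
        # Only files opened here are registered for closing; the standard
        # streams are left open for the rest of the program.
        infile = stack.enter_context(open(args[0])) if len(args) >= 1 else sys.stdin
        outfile = stack.enter_context(open(args[1], "w")) if len(args) == 2 else sys.stdout
        yield infile, outfile
```

A caller would typically write something like
"with open_streams(sys.argv[1:]) as (infile, outfile): ..." and do all
of its reading and writing inside that block.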
As an added challenge, the file may be entirely too big to fit into
available memory. Your program should still behave properly. It is
in fact possible to write a solution whose memory requirements are
determined by the maximum length of an individual line, not the number
of lines in the file.
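One possible shape for such a constant-memory approach, sketched in
Python rather than Perl: read one line at a time and remember only the
current prefix and whether its .M line has been written yet. This is
an illustration under my own assumptions, not a reference solution.
The name insert_m_lines is hypothetical, and the sketch assumes the
input really is sorted, unique, and well formed as described above.

```python
def insert_m_lines(infile, outfile):
    """Copy infile to outfile, adding PREFIX.M for any prefix that lacks it.

    Hypothetical sketch: holds one line in memory at a time and assumes
    sorted, unique input lines of the form PREFIX.SUFFIX.
    """
    current = None   # prefix of the group being copied (None before the first)
    have_m = True    # True once the current group's .M line has been written
    for line in infile:
        text = line.rstrip("\n")
        prefix, suffix = text.rsplit(".", 1)
        if prefix != current:
            # A new group starts. If the previous group ended with every
            # suffix sorting before M, its .M line goes at the end of it.
            if not have_m:
                outfile.write(current + ".M\n")
            current, have_m = prefix, False
        if not have_m and suffix > "M":
            # First suffix past M: the missing .M line belongs just before it.
            outfile.write(prefix + ".M\n")
            have_m = True
        elif suffix == "M":
            have_m = True
        outfile.write(text + "\n")
    # The last group may still be waiting for its .M line.
    if not have_m:
        outfile.write(current + ".M\n")
```

Because the input is sorted, each group is contiguous and its suffixes
arrive in order, so the decision about where .M goes never needs to
look further back than the current line.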
Be careful of the following edge cases:
echo C.X > test_input_1
perl ricfilter.pl test_input_1 test_output_1
# test_output_1 should now contain two lines: C.M followed by C.X
echo C.G > test_input_2
perl ricfilter.pl test_input_2 test_output_2
# test_output_2 should now contain two lines: C.G followed by C.M
cat /dev/null > test_input_3
perl ricfilter.pl test_input_3 test_output_3
# test_output_3 should be empty (no lines at all)
Have fun!
(*) However, the details(**) of the problem have been scrubbed so as
to have little to do with my employer.
(**) Extra points to anyone who guesses the real suffix that had to be
added (it wasn't .M), and why, though you'd need to be in a very small,
specific corner of the financial data processing world to know that.