Can one make 'in' ungreedy?


On Mon, May 18, 2020 at 7:05 AM Chris Green <cl at isbd.net> wrote:
>
> I have a strange/minor problem in a Python program I use for mail
> filtering.
>
> One of the ways it classifies messages is by searching for a specific
> string in square brackets [] in the Subject:, the section of code that
> does this is:-
>
>     #
>     #
>     # copy the fields from the filter configuration file into better named variables
>     #
>     nm = fld[0]             # name/alias
>     dd = fld[1] + "/"       # destination directory
>     tocc = fld[2].lower()   # list address
>     sbstrip = '[' + fld[3] + ']'        # string to match in and/or strip out of subject
>     #
>     #
>     # see if the filter To/CC column matches the message To: or Cc: or if sbstrip is in Subject:
>     #
>     if (tocc in msgcc or tocc in msgto or sbstrip in msgsb):
>         #
>         #
>         # set the destination directory
>         #
>         dest = mldir + dd + nm
>         #
>         #
>         # Strip out list name (4th field) from subject if it's there
>         #
>         if sbstrip in msgsb:
>             msg.replace_header("Subject", msgsb.replace(sbstrip, ''))
>         #
>         #
>         # we've found a match so assume we won't get another
>         #
>         break
>
>
> So in the particular case where I have a problem sbstrip is "[Ipswich
> Recycle]" and the Subject: is "[SPAM] [Ipswich Recycle] OFFER:
> Lawnmower (IP11)".  The match isn't found, presumably because 'in' is
> greedy and sees "[SPAM] [Ipswich Recycle]" which isn't a match for
> "[Ipswich Recycle]".
>
> Other messages with "[Ipswich Recycle]" in the Subject: are being
> found and filtered correctly, it seems that it's the presence of the
> "[SPAM]" in the Subject: that's breaking things.
>
> Is this how 'in' should work, it seems a little strange if so, not
> intuitively how one would expect 'in' to work.  ... and is there any
> way round the issue except by recoding a separate test for the
> particular string search where this can happen?

>>> sbstrip = "[Ipswich Recycle]"
>>> subject = "[SPAM] [Ipswich Recycle] OFFER:Lawnmower (IP11)"
>>> sbstrip in subject
True

'in' is a plain substring test and is not greedy, so something else is
going on in your program. I would run it in the debugger and look at the
values of the variables in the case where it fails when you think it
should succeed. I think you will see the variables do not hold what you
think they do.
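
If a debugger is inconvenient, printing repr() of both strings just before
the test shows hidden characters that a plain print() would not. The sketch
below is only an illustration under an assumption: the msgsb value is a
hypothetical one in which the Subject: header arrived folded across two
lines, which has not been confirmed as the cause here. normalize_ws is a
hypothetical helper, not something from the original program.

```python
import re

sbstrip = "[Ipswich Recycle]"

# Hypothetical value (an assumption, not taken from the real message):
# the Subject: header was folded, leaving a newline plus a space
# between "Ipswich" and "Recycle".
msgsb = "[SPAM] [Ipswich\n Recycle] OFFER: Lawnmower (IP11)"

# repr() makes the embedded newline visible as \n.
print(repr(sbstrip))
print(repr(msgsb))

# The plain substring test fails, because "\n " is not the same as " ".
print(sbstrip in msgsb)


def normalize_ws(text):
    """Collapse each run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip()


# After normalizing both sides, the substring test succeeds.
print(normalize_ws(sbstrip) in normalize_ws(msgsb))
```

If the repr() output shows something different (an encoded header, a
non-breaking space, different capitalisation), the same approach applies:
make both strings comparable first, then use 'in' as before.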