Thanks to both Ben and Ronald for correcting me. I
had misunderstood the documentation.
I do expect that others may also think that using
$string =~ /\b$sub_str\b/ is a reasonable way to match
a substring on word boundaries, or at the beginning/end
of the string. Hopefully they will notice the
nuanced effect of the "imaginary characters" at the
beginning and end of the string, or find this post.
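
For anyone who does find this post, here is a minimal sketch of the
behavior Ben describes below. The sub names match_with_b and
match_with_lookaround are hypothetical, made up for this illustration.
The lookaround version is only one possible alternative, assuming what
you want is "not directly next to a word character" rather than a true
word boundary.

```perl
use strict;
use warnings;

# Returns 1 if $sub occurs in $str with \b on both sides, else 0.
# \Q...\E quotes metacharacters in $sub so it is matched literally.
sub match_with_b {
    my ($str, $sub) = @_;
    return $str =~ /\b\Q$sub\E\b/ ? 1 : 0;
}

# Hypothetical alternative: succeed when $sub is neither preceded nor
# followed by a \w character. Unlike \b, this does not care whether
# $sub itself starts or ends with a \w character.
sub match_with_lookaround {
    my ($str, $sub) = @_;
    return $str =~ /(?<!\w)\Q$sub\E(?!\w)/ ? 1 : 0;
}

print match_with_b('hello', 'hello'), "\n";           # 1: \w at both ends
print match_with_b('...', '...'), "\n";               # 0: no \w, so no boundary
print match_with_b('a...b', '...'), "\n";             # 1: boundary between \w and "."
print match_with_lookaround('...', '...'), "\n";      # 1
print match_with_lookaround('a...b', '...'), "\n";    # 0: touches a \w on each side
```

Note that the third case is the flip side of the same rule: \b also
matches between a word character and a non-word character in the
middle of a string, so "..." matches inside "a...b" even though it
does not match on its own.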
Thanks again,
-Carl
--- Ben Tilly <btilly-Re5JQEeQqe8AvxtiuMwx3w@xxxxxxxxxxxxxxxx> wrote:
> On 11/15/06, Carl Eklof <carleklof-/E1597aS9LQAvxtiuMwx3w@xxxxxxxxxxxxxxxx>
> wrote:
> > Hi Guys n Gals,
> >
> > I have found some seemingly strange behavior that may
> > be of interest to this list.
> >
> > My assumption was that the \b pattern in a regex would
> > always match the beginning and end of a string (as
> > documented in the perlre page). However, on my build of
> > 5.8.7 this is not the case if the character being
> > matched at the beginning or the end is a
> > "meta-character", i.e. quotemeta would escape it. Also
> > note that escaping the character doesn't seem to make a
> > difference.
>
> Actually that is NOT as documented in the perlre page. And thoughts
> to the contrary are a misreading of the documentation.
>
> What the perlre page says is that there is an imaginary \W at the
> beginning and end of the string. The result is that if the first
> character in the string matches \w, then \b will match at the start,
> and if the last character matches \w, then \b will match at the end.
>
> However if the first and/or last characters do *not* match \w, then
> that is not a word boundary and \b will not match there.
>
> [examples snipped]
>
> > Maybe this is not a bug, and this is just another
> > nuance of regexes that I have not learned, but it
> > looks very fishy.
>
> It is definitely not a bug. If the string is "...", then there are no
> words, hence no word boundaries, therefore \b should not match at all.
> (And it does not.)
>
> Conversely if the string is "hello" then there is a word, and it has
> boundaries, and those boundaries should be matched by \b. (And they
> are, thanks to the "imaginary characters" discussed in the
> documentation.)
>
> > Any thoughts/wisdom?
>
> See above.
>
> Cheers,
> Ben
>