Re: non-greedy regexp: msg#01124

lang.ruby.general

Subject: Re: non-greedy regexp

Hello --

On Tue, 13 Aug 2002, Tom Robinson wrote:

> Hi,
>
> The following regexp is supposed to chop off the last / of a string
> and all characters following it, but it seems to be ignoring the
> non-greedy indicator (?):
>
> irb(main):001:0> "http://www.x.com/y/z.html".sub(%r|/.+?\.html$|, '')
> "http:"
>
> The expected result should be "http://www.x.com/y". I thought this
> was a bug but perl produces the same result, so what am I missing?

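You're missing the notion of a leftmost match. The regex engine works
from left to right, so the match starts at the first '/' it finds,
which is the sixth character. From there it does what you ask: the
non-greedy '.+?' takes as few characters as it can while still letting
'\.html' match at the end of the line, which here means everything up
to the final '.html'. Non-greediness only affects where a match ends,
never where it begins.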
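
To see exactly what the original pattern grabbed, you can inspect the
match itself. This is just a small sketch; the variable names (url, m,
matched, start) are mine and not from the original post:

```ruby
url = "http://www.x.com/y/z.html"

# Same pattern as in the original question.
m = url.match(%r|/.+?\.html$|)

matched = m[0]        # the text the pattern consumed
start   = m.begin(0)  # zero-based offset where the match begins

puts matched          # "//www.x.com/y/z.html"
puts start            # 5, i.e. the sixth character
puts m.pre_match      # "http:", which is all that sub leaves behind
```

The match begins at the first '/' and the lazy quantifier simply
stretches until '\.html$' can succeed.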

To do what you were trying to do, try this:

irb> "http://www.x.com/y/z.html".sub(%r|/[^/]+/?\.html$|, '')
"http://www.x.com/y"

That also finds the leftmost match -- but in this case, the leftmost
match doesn't start until the last '/' (because none of the other
'/'s, even though they're further left, allow the rest of the match to
succeed).
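
As a quick check of that claim, the sketch below compares where the
revised pattern starts matching with the position of the last '/'. The
names fixed, last_slash and result are only illustrative:

```ruby
url   = "http://www.x.com/y/z.html"
fixed = %r|/[^/]+/?\.html$|

m          = url.match(fixed)
last_slash = url.rindex("/")   # offset of the final '/'

puts m[0]                      # "/z.html"
puts m.begin(0) == last_slash  # true: the match starts at the last '/'

result = url.sub(fixed, "")
puts result                    # "http://www.x.com/y"
```

Because [^/] cannot cross a '/', any attempt that starts at an earlier
'/' runs out of characters before it can reach '.html' at the end.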


David

--
David Alan Black
home: dblack@xxxxxxxxxxxxxxxxxxxx
work: blackdav@xxxxxxx
Web: http://pirate.shu.edu/~blackdav



