Subject: Re: tr A-Z a-z in locales other than C

On Tue, Jun 07, 2011 at 04:24:43AM +0400, Andrey Chernov wrote:
> On Tue, Jun 07, 2011 at 12:41:05AM +0200, Jilles Tjoelker wrote:

> > There is a related issue with ranges in regular expressions, glob and
> > fnmatch (likewise unspecified by POSIX outside the POSIX locale), but
> > this is less likely to cause problems.

> You care about ports, but suggested change is americano-centrism which
> kills tr usage for national language documents due to impossibility to
> specify whole national alphabet easily, just by two letters.

Hmm, so that's about translating to a constant, or the -d and/or -s
options. In such cases, there may be a single range covering all letters
in collation order, but not in codeset order (mainly if "all letters"
includes letters with diacritical marks).
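A minimal C sketch of the two ways a range such as a-z can be read for single-byte characters. The function names are hypothetical and are not taken from any tr implementation. in_range_codeset compares byte values, which is the POSIX-locale meaning. in_range_collate asks strcoll whether the character collates between the endpoints, so its answer depends on LC_COLLATE as set by setlocale(). In the C locale the two agree. In a Latin-1 locale with collation data, a letter such as é (0xE9) may fall inside a-z by collation although it lies outside it by codeset order.

```c
#include <assert.h>
#include <string.h>

/* Codeset order: c is in [lo-hi] if its byte value lies between the
 * endpoints. This is what a range means in the POSIX (C) locale. */
int in_range_codeset(unsigned char c, unsigned char lo, unsigned char hi)
{
    assert(lo <= hi); /* precondition assumed by this sketch */
    return lo <= c && c <= hi;
}

/* Collation order: c is in [lo-hi] if it collates between the endpoints
 * under the current LC_COLLATE (single-byte characters only). In the C
 * locale strcoll behaves like strcmp, so this matches the codeset test. */
int in_range_collate(unsigned char c, unsigned char lo, unsigned char hi)
{
    char s[2] = { (char)c, '\0' };
    char l[2] = { (char)lo, '\0' };
    char h[2] = { (char)hi, '\0' };
    return strcoll(l, s) <= 0 && strcoll(s, h) <= 0;
}
```

Only after a call such as setlocale(LC_COLLATE, "") does the second function follow the user's locale; without it the program stays in the C locale and both functions give the same answers.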

In FreeBSD, upper case sorts before lower case, so the cases can be
distinguished this way, but covering all letters may require two ranges.
In most other operating systems the cases are interleaved, so a single
range is sufficient, but the cases cannot be distinguished. Making such
things work on multiple operating systems requires careful testing.

> Moreover, having differently treated regex ranges in tr vs other places
> you mention will produce additional chaos.

I think this is already inconsistent because some programs do not enable
locale support (they never call setlocale()) or use different locale code.

With UTF-8 or other multibyte character sets, the inconsistency is even
greater, because functions like isalpha work very poorly by definition
(they classify a single byte, not a whole character) and there is no
collation support for such character sets in FreeBSD.
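To illustrate why byte-oriented classification breaks down, a minimal C sketch, assuming an ASCII-compatible multibyte encoding such as UTF-8 in the current LC_CTYPE. count_alpha_bytes calls isalpha on each byte, so a two-byte UTF-8 letter like é (0xC3 0xA9) is tested as two unrelated bytes and the result is not meaningful. count_alpha_chars first decodes with mbrtowc and then classifies with iswalpha. Both helpers are hypothetical, and whether iswalpha accepts a given letter still depends on the system's locale data.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte at a time: isalpha sees each byte of a multibyte character
 * separately, so the count is not a count of letters. */
size_t count_alpha_bytes(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}

/* Character at a time: decode with mbrtowc under the current LC_CTYPE,
 * then classify the wide character with iswalpha.
 * Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_alpha_chars(const char *s)
{
    mbstate_t st;
    size_t n = 0;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    memset(&st, 0, sizeof st);
    while (len > 0) {
        wchar_t wc;
        size_t k = mbrtowc(&wc, s, len, &st);
        if (k == (size_t)-1 || k == (size_t)-2)
            return (size_t)-1; /* not a valid character in this locale */
        if (iswalpha((wint_t)wc))
            n++;
        s += k; /* k >= 1 here: no null byte lies within len */
        len -= k;
    }
    return n;
}
```

For pure ASCII input both functions agree; they diverge only once a UTF-8 locale is selected with setlocale(LC_CTYPE, "") and the input contains non-ASCII letters.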

> Back to the ports: it is not hard to run _any_ port's make or configure
> with LANG=C directly by the ports Mk system to eliminate that problem.

True, but some ports install scripts with problematic tr calls, and those
scripts run later under the user's locale rather than the build
environment's.


Jilles Tjoelker
freebsd-hackers@xxxxxxxxxxx mailing list