Re: multi-byte character sets in sed: msg#00006

editors.sed.user

Subject: Re: multi-byte character sets in sed


>I usually use gVim, where I can match characters correctly with the encoding
>set to cp936. In gVim, '.' matches both a double-byte character and
>a single-byte ASCII letter, but I would still like to know whether this can be
>achieved with sed. Or is there a plan to add encoding support to
>future versions of sed, so that we can pass the encoding either by environment
>variable or by command-line option?
>
Yes, starting with version 4.1 sed has full support for MBCS. The
relevant environment variables are LC_CTYPE and LC_COLLATE. If you set
them, though, you may encounter weird behavior when a script expects the
default values of these variables, i.e. LC_CTYPE=C LC_COLLATE=C. For
example, some locales demand that ranges (e.g. [A-Z]) match
case-insensitively; this is by now the most reported sed non-bug (the
behavior is mandated by POSIX).
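
As a minimal sketch of the workaround implied above: a script that relies on
the default behavior can pin the locale for the sed call alone. The helper
name sed_c is hypothetical and only for illustration; LC_ALL is used because
it overrides both LC_CTYPE and LC_COLLATE.

```shell
# Hypothetical helper (not part of sed): run sed with the C locale forced,
# so that ranges such as [A-Z] cover only the uppercase ASCII letters.
# LC_ALL takes precedence over LC_CTYPE and LC_COLLATE.
sed_c() {
    LC_ALL=C sed "$@"
}

# Replace every uppercase ASCII letter; lowercase letters are left alone.
printf 'abc ABC\n' | sed_c 's/[A-Z]/X/g'
```

For multi-byte matching, the opposite applies: leave LC_ALL unset and set
LC_CTYPE to a locale that uses the input's encoding. Which locale names are
available (for example, one for cp936/GBK text) depends on the system.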

Paolo


