
Re: Internationalized string sorting: msg#00043

php.tcphp

Subject: Re: Internationalized string sorting

As a followup, it appears that IBM's International Components
for Unicode (ICU, http://icu.sourceforge.net/) can handle most of
these issues. It is not written in PHP (it is available for C, C++,
and Java), but it can generate strcmp()-compatible sort keys that
can be stored in the database alongside the language-specific
strings, so an ORDER BY clause can reference the sort-key column
and get the desired results.
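
To make the sort-key column idea concrete, here is a rough sketch in
Python (not PHP, and not using ICU itself). Everything in it is an
assumption for illustration: the SQLite table "labels", its columns,
and especially toy_sort_key(), which is a hypothetical stand-in for a
real ICU sort key. It only strips accents and case, so it is not
locale-aware. The point is just that a byte-comparable key stored next
to the string lets a plain ORDER BY produce a sensible order.

```python
import sqlite3
import unicodedata


def toy_sort_key(text: str) -> bytes:
    """Hypothetical stand-in for an ICU sort key (NOT locale-aware)."""
    # Decompose, drop combining marks, and case-fold, so that an accented
    # capital collates next to its plain lowercase counterpart.
    decomposed = unicodedata.normalize("NFKD", text)
    primary = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    primary = primary.casefold()
    # Append the original string as a tie-breaker so distinct strings
    # always get distinct keys. The result compares correctly byte by byte.
    return primary.encode("utf-8") + b"\x00" + text.encode("utf-8")


def query_order(names, use_sort_key):
    """Insert names with their keys, then read them back in sorted order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE labels (name TEXT, sort_key BLOB)")
    conn.executemany(
        "INSERT INTO labels (name, sort_key) VALUES (?, ?)",
        [(name, toy_sort_key(name)) for name in names],
    )
    if use_sort_key:
        sql = "SELECT name FROM labels ORDER BY sort_key"
    else:
        sql = "SELECT name FROM labels ORDER BY name"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # "\u00c4pfel" is "Äpfel" with a precomposed A-umlaut.
    names = ["zebra", "\u00c4pfel", "apple", "Banana"]
    print(query_order(names, use_sort_key=False))
    # ['Banana', 'apple', 'zebra', 'Äpfel']  (raw UTF-8 byte order)
    print(query_order(names, use_sort_key=True))
    # ['Äpfel', 'apple', 'Banana', 'zebra']  (ordered by the stored key)
```

With real ICU, the key would come from a collator opened for the
relevant locale, and the database side would stay the same.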

IIRC there is also a preliminary ICU -> PHP binding extension that
could be used instead, but I don't know whether it supports every
ICU feature.

-Bob

Specifically, I have two strings, both UTF-8 encoded, and I should also
have at least the ISO-3166 country code and possibly the ISO-639 language
code. How do I determine the lexicographical order of these two strings?

From what I can tell, strcoll() doesn't work for this purpose, as it
assumes a character encoding based on the locale string passed to
setlocale(). The multibyte extension (mbstring) doesn't include any
string sorting or ordering functions, either.

