Collations: msg#00219

text.xml.exist

Subject: Collations

Hi,

eXist finally supports collations! Sorting and all string comparison operators
have been modified to use a default collation if specified in the XQuery.
Also, a specific collation can be defined for each order spec in an "order
by" clause. The optional collation parameter allowed by most of the string
functions is not implemented yet, but the default collation will be observed.

I would be happy if users with knowledge of other languages could help to test
the functionality. I guess the languages I speak have rather simple
rules ;-) Please have a look at the current CVS or today's snapshot.

The syntax to set the default collation is:

declare default collation collation-uri;

eXist recognizes the following URIs:

1) http://www.w3.org/2004/07/xpath-functions/collation/codepoint

Selects the Unicode codepoint collation. This is the default if no collation
is specified. Basically, it means that only the standard Java implementations
of the comparison and string search functions are used.
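
As a rough illustration of what "the standard Java implementations" means in
practice, here is a minimal sketch, assuming the codepoint collation boils down
to String's natural ordering. The class and method names are made up for this
example and are not taken from eXist. Note that String.compareTo compares
UTF-16 code units, which matches codepoint order for everything except
supplementary characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class, not part of eXist.
class CodepointOrderSketch {
    // Sorts a copy of the list with String's natural ordering,
    // which compares UTF-16 code units one by one.
    static List<String> sortByCodeUnit(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // a-umlaut is U+00E4, which is greater than 'i' (U+0069),
        // so the word with the umlaut does not sort before "Bier".
        System.out.println("B\u00e4uerin".compareTo("Bier") < 0); // prints false

        // All uppercase ASCII letters sort before all lowercase ones,
        // and the umlauts sort after every ASCII letter.
        System.out.println(sortByCodeUnit(
                Arrays.asList("das", "Buch", "B\u00e4uerin", "Bauer")));
    }
}
```

This is why, without a language-specific collation, words starting with a
lowercase letter end up after all capitalized words.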

2) http://exist-db.org/collation?lang=xxx&strength=xxx&decomposition=xxx

or just

?lang=xxx&strength=xxx&decomposition=xxx

lang selects a locale. The parameter should have the same form as in xml:lang,
for example: "de" or "de-DE" to select a German locale.

strength (optional): value should be one of "primary", "secondary", "tertiary"
or "identical".

decomposition (optional): one of "none", "full" or "standard".

I don't really know all the implications of these parameters. Please check the
Java documentation for java.text.Collator.
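
To make the parameters a bit more concrete, the following is a minimal sketch
of how such a URI could be turned into a java.text.Collator. It is not eXist's
actual code: the class CollationUriSketch, the helpers parseParams and
collatorFor, and in particular the mapping of "standard" to
Collator.CANONICAL_DECOMPOSITION are assumptions made for illustration. lang is
treated as required here because only the other two parameters are marked
optional.

```java
import java.text.Collator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class, not part of eXist.
class CollationUriSketch {
    // Splits the part after '?' into key/value pairs.
    static Map<String, String> parseParams(String uri) {
        Map<String, String> params = new HashMap<>();
        int q = uri.indexOf('?');
        String query = q >= 0 ? uri.substring(q + 1) : uri;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Builds a Collator from lang, strength and decomposition.
    static Collator collatorFor(String uri) {
        Map<String, String> p = parseParams(uri);
        String lang = p.get("lang");
        if (lang == null) {
            throw new IllegalArgumentException("lang is required in this sketch");
        }
        // "de" or "de-DE" are valid language tags, as in xml:lang.
        Collator c = Collator.getInstance(Locale.forLanguageTag(lang));

        String strength = p.get("strength");
        if (strength != null) {
            switch (strength) {
                case "primary":   c.setStrength(Collator.PRIMARY);   break;
                case "secondary": c.setStrength(Collator.SECONDARY); break;
                case "tertiary":  c.setStrength(Collator.TERTIARY);  break;
                case "identical": c.setStrength(Collator.IDENTICAL); break;
                default: throw new IllegalArgumentException("strength: " + strength);
            }
        }

        String decomposition = p.get("decomposition");
        if (decomposition != null) {
            switch (decomposition) {
                case "none": c.setDecomposition(Collator.NO_DECOMPOSITION); break;
                // Assumed mapping: "standard" means canonical decomposition.
                case "standard": c.setDecomposition(Collator.CANONICAL_DECOMPOSITION); break;
                case "full": c.setDecomposition(Collator.FULL_DECOMPOSITION); break;
                default: throw new IllegalArgumentException("decomposition: " + decomposition);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Collator primary = collatorFor("?lang=de-DE&strength=primary");
        Collator tertiary = collatorFor("?lang=de-DE&strength=tertiary");
        // Primary strength ignores case differences, tertiary does not.
        System.out.println(primary.compare("bauer", "Bauer") == 0);  // prints true
        System.out.println(tertiary.compare("bauer", "Bauer") == 0); // prints false
    }
}
```

In short, strength controls which kinds of differences (base letter, accent,
case) count as significant, while decomposition controls how composed
characters are broken up before comparison.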

Examples:

1) the collation can be specified for each sort expression in a FLWOR expression:

for $w in
("das", "daß", "Buch", "Bücher", "Bauer", "Bäuerin", "Jagen", "Jäger")
order by $w collation "?lang=de-DE"
return $w

returns:

Bauer, Bäuerin, Buch, Bücher, das, daß, Jagen, Jäger

Without specifying the collation, it returns:

Bauer, Buch, Bäuerin, Bücher, Jagen, Jäger, das, daß
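
For readers who want to try the same ordering outside of eXist, here is a
minimal Java sketch, assuming "?lang=de-DE" corresponds to a
java.text.Collator for Locale.GERMANY with default settings. The class and
method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class, not part of eXist.
class GermanSortSketch {
    // Sorts a copy of the list using the German collation rules.
    static List<String> sortGerman(List<String> words) {
        Collator german = Collator.getInstance(Locale.GERMANY);
        List<String> sorted = new ArrayList<>(words);
        // Collator implements Comparator, so it can be passed to sort directly.
        Collections.sort(sorted, german);
        return sorted;
    }

    public static void main(String[] args) {
        // Unicode escapes are used for the umlauts and the sharp s
        // so that the source file stays plain ASCII.
        List<String> words = Arrays.asList(
                "das", "da\u00df", "Buch", "B\u00fccher",
                "Bauer", "B\u00e4uerin", "Jagen", "J\u00e4ger");
        System.out.println(sortGerman(words));
    }
}
```

With the collator, umlauts are ordered next to their base letters and case no
longer separates "das" from the capitalized words.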

2) but it also changes the behaviour of string comparisons:

declare default collation "?lang=de-DE";

"Bäuerin" < "Bier"

returns "true". If you just use the default codepoint collation, it returns
"false".

Wolfgang


