
Re: Hyphenation patterns and Unicode: msg#00145

tex.xetex

Subject: Re: Hyphenation patterns and Unicode

On 19 Oct 2005, at 11:48 am, Nicola Vitacolonna wrote:

> Hi everybody,
> the XeTeX FAQ says that hyphenation patterns should be "true Unicode" files. It is not clear to me if the following (excerpt of a) file (for Lithuanian) is ok:
>
> \def\ltletters{
> \catcode"81=11\lccode"81="A1\uccode"81="81%A nosine
> \catcode"83=11\lccode"83="A3\uccode"83="83%C su pauksteliu
> \catcode"84=11\lccode"84="A4\uccode"84="84%E su tasku
> % etc...
> }
> \ltletters
> \patterns{
> .ap1
> .api1
> .a^^b23v
> %etc...
> }


This does not appear to be Unicode-compliant: it expects character codes such as hex 81, 83, and 84 to be accented letters, whereas in Unicode those code points are control characters. (Because the file writes these codes as ^^.. sequences rather than as literal bytes, XeTeX will be able to read it; but the resulting patterns won't be correct for Unicode text.)

I assume this file was created to work with one of the 8-bit encodings used with TeX, such as T1, which does not match the Unicode encoding of the accented letters.

> I would like to add this file to language.dat, rebuild all format files, and use LaTeX or XeLaTeX with babel. Is this expected to work? Or should I use the above file only for LaTeX with babel, and go for a different solution when I want to use XeLaTeX?

It would be possible to patch this file for XeTeX/Unicode in a similar way to others that I've looked at: test if it is being loaded by XeTeX, and if so, make the characters active and define them to expand to their Unicode equivalents. That way, the actual pattern lines can be left untouched, and the file still works as before when used with a standard TeX.
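
A rough sketch of what such a patch might look like, for illustration only. It assumes that the lowercase slots "A1, "A3 and "A4 named in the excerpt correspond to U+0105 (a with ogonek), U+010D (c with caron) and U+0117 (e with dot above), as the comments in the file suggest, and that XeTeX's ^^^^hhhh notation is available. The test on \XeTeXrevision is one common way to detect XeTeX, not necessarily the only one.

```tex
% Sketch only: the 8-bit-to-Unicode mapping below is an assumption
% inferred from the comments in the excerpt, not taken from the real file.
% \ltletters is defined exactly as in the original file.
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
  \ltletters                  % standard TeX: behave exactly as before
\else
  \begingroup                 % XeTeX: keep the active characters local
  % The Unicode letters need a nonzero \lccode while patterns are read.
  \catcode"0105=11 \lccode"0105="0105 % a nosine
  \catcode"010D=11 \lccode"010D="010D % c su pauksteliu
  \catcode"0117=11 \lccode"0117="0117 % e su tasku
  % Each 8-bit slot becomes active and expands to its Unicode letter.
  \catcode"A1=13 \def^^a1{^^^^0105}%
  \catcode"A3=13 \def^^a3{^^^^010d}%
  \catcode"A4=13 \def^^a4{^^^^0117}%
  % etc...
\fi
\patterns{
.ap1
.api1
%etc...
}
\expandafter\ifx\csname XeTeXrevision\endcsname\relax\else
  \endgroup                   % closes the group opened in the XeTeX branch
\fi
```

Every 8-bit code that occurs in the pattern lines (such as ^^b2 in the excerpt) would need its own pair of lines in the XeTeX branch; its Unicode equivalent is not shown in the excerpt, so it is left out here. Because \patterns expands active characters as it reads them, the pattern lines themselves stay as they are. The \lccode settings inside the group are local, so this sketch assumes the format sets up letter codes for Unicode text elsewhere.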

I don't see this file among the standard collection, but if you need assistance in adapting it for XeTeX, feel free to send me a copy.

JK

