Re: how to move to UTF-8 ? (was: An encoding problem)
debian-www, msg#00191
Simon Paillard wrote:
> On Wed, Jul 29, 2009 at 06:27:02PM +0200, Frans Pop wrote:
>> FYI, I've just converted the Dutch translation to UTF-8.
>
> Could you please describe the steps you have performed and how ?
>
> For what we have identified:
> - recode wml files (using recode from recode package)
>   find . -type d -exec recode latin1..utf8 {} \;

I actually used (sponge is from moreutils):
$ for i in $(find -type f); do \
    iconv -f iso-8859-15 -t utf-8 $i | sponge $i; \
  done

I then checked the result with 'cvs diff -u'. That showed some pages
(incorrectly) already had utf-8 encoded chars, so I reverted those.

It turned out that this mangled the generated $Date fields (2007-01-01 had
become 2007/01/01); I corrected that by doing (possibly not strictly
necessary as the server would update them anyway on commit, but I wanted
my diffs clean):
$ for i in $(find -type f); do \
    sed -ri "s% ([0-9]{4})/([0-9]{2})/([0-9]{2}) ([0-9]{2}:)% \1-\2-\3 \4%" $i; \
  done

> - update the .wmlrc file
>   -D CUR_LOCALE=fr_FR.UTF-8
>   -D CHARSET=utf-8

Correct.

> - convert charset of po files
>   cd po ; for file in *po ; do msgconv -t UTF-8 -o $file $file ; done

Not strictly necessary, but I did indeed do that as well.

> - some references to ISO-8859-15 (or old coding) in webpages about
>   website.
>   * for the website, devel/website/examples.wml and
>     international/french/web.wml
>   * for the translation, international/french/traduire.wml
> - *.UTF-8 locale on www-master -> OK, checked
> - redirections pages with specified charset
>   (devel/debian-installer/gtk-frontend.wml and distrib/cd.wml)

I did not check any of that TBH, but then we don't have the first few
translated. For the last, also: distrib/floppyinst.wml, distrib/netboot.wml.
I have updated those now. Thanks for the hint!
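
The conversion step above could also be sketched so that it skips files that
are already valid UTF-8, which would avoid the manual revert. This is only a
sketch, not what was actually run: the function name convert_tree and the
restriction to *.wml files are assumptions made for illustration, and it
assumes file names contain no newlines. It uses a temporary file instead of
sponge, so moreutils is not needed.

```shell
#!/bin/sh
# convert_tree DIR (hypothetical helper): convert every *.wml file under DIR
# from ISO-8859-15 to UTF-8, leaving files that already decode as UTF-8 alone.
convert_tree() {
    find "$1" -type f -name '*.wml' | while IFS= read -r f; do
        # A file that survives a UTF-8 -> UTF-8 round trip is already valid
        # UTF-8 (this includes pure ASCII), so converting it again would
        # double-encode any non-ASCII characters.
        if iconv -f utf-8 -t utf-8 "$f" >/dev/null 2>&1; then
            continue
        fi
        tmp=$(mktemp) || break
        # Overwrite the original only if the conversion succeeded.
        if iconv -f iso-8859-15 -t utf-8 "$f" >"$tmp"; then
            cat "$tmp" >"$f"
        else
            echo "conversion failed, left unchanged: $f" >&2
        fi
        rm -f "$tmp"
    done
}
```

It would be called on a translation directory, e.g. 'convert_tree .', and the
result should still be reviewed with 'cvs diff -u' before committing.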

I also did a cleanup, replacing entities by encoded characters, e.g.:
$ for i in $(find -type f); do sed -ri "s/&auml;/ä/g" $i; done

In the past it made sense to use entities to avoid encoding issues, but
with the switch to utf-8 that's less relevant and the regular characters
make the source more readable.

Cheers,
FJP
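
The entity cleanup described above can be extended to several entities in one
pass. The following is a minimal sketch under assumptions: the function name
replace_entities and the particular list of entities are hypothetical, chosen
for illustration, and the files are assumed to be UTF-8 already. Entities
that are significant to the markup (&amp;, &lt;, &gt;, &quot;) are
deliberately left alone.

```shell
#!/bin/sh
# replace_entities FILE... (hypothetical helper): rewrite a few named HTML
# entities as literal UTF-8 characters, editing each file in place.
replace_entities() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # One sed expression per entity; extend the list as needed.
        # The file is overwritten only if sed succeeded.
        sed -e 's/&auml;/ä/g' \
            -e 's/&euml;/ë/g' \
            -e 's/&iuml;/ï/g' \
            -e 's/&ouml;/ö/g' \
            -e 's/&uuml;/ü/g' \
            -e 's/&eacute;/é/g' \
            "$f" >"$tmp" && cat "$tmp" >"$f"
        rm -f "$tmp"
    done
}
```

Running it only on files that have already been converted to UTF-8 avoids
mixing encodings within one file.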