
Re: ActiveSupport::Multibyte for better Unicode support: msg#00170

lang.ruby.rails.core

Subject: Re: ActiveSupport::Multibyte for better Unicode support



On 24/09/2006, at 10:20 PM, Joshua Sierles wrote:

>> - Make sure your database character set is utf8
>> - Make sure all your tables have a character set of utf8
>> - Make sure your database.yml has 'encoding: utf8' set for each
>> database
>
> None of these steps are required officially unless you use utf-8
> specific features of the database (collation). The last setting seems
> to set the connection encoding, which shouldn't be required unless
> there is non-utf8 data stored in the database.

Not true! Collation and character set are separate things.

There are a couple of obvious reasons you want your database
character set to be UTF8 if you're storing UTF8 strings:

1. When you access the database through the mysql (or pgsql, or
other) command line, or through tools such as CocoaMySQL, you want
strings to display properly.

2. MySQL never treats CHAR, VARCHAR, or TEXT values as raw bytes; they
always have a character set, which is latin1 (CP1252) by default.
Putting UTF8 data into
fields marked as latin1 seems like asking for trouble. (There are
some byte values that are invalid in CP1252, so technically strings
containing those bytes are illegal. It's only through MySQL's
laziness in not checking the strings when the connection and table
character sets match up that you can get away with this at all.)
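To make the "invalid bytes" point concrete, here is a rough sketch that asks Ruby's own Windows-1252 converter which bytes in the 0x80..0x9F range it cannot map to a character. It assumes a modern Ruby (1.9 or later) with String#encode, which did not exist when this thread was written, and it uses Ruby's converter only as a stand-in; MySQL's latin1 handling may differ in detail. The helper name is made up for the example.

```ruby
# Hypothetical helper: true if Ruby's Windows-1252 converter can map the
# given byte to a Unicode character, false if the byte is undefined.
def cp1252_defined?(byte)
  byte.chr.force_encoding("Windows-1252").encode("UTF-8")
  true
rescue Encoding::UndefinedConversionError
  false
end

# Collect the bytes in the C1 range that have no CP1252 mapping.
UNDEFINED_CP1252_BYTES = (0x80..0x9F).reject { |b| cp1252_defined?(b) }

puts UNDEFINED_CP1252_BYTES.map { |b| format("0x%02X", b) }.join(" ")

# U+00C1 (A with acute) is encoded as C3 81 in UTF-8, so perfectly
# ordinary UTF-8 text can contain one of those undefined bytes.
puts "\u00c1".bytes.any? { |b| UNDEFINED_CP1252_BYTES.include?(b) }
```

So a UTF8 string stored in a latin1 column is not guaranteed to be a legal latin1 string, even if it happens to round-trip.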

There are even worse potential pitfalls here too. On one of our
projects, we did everything except set the connection encoding.
What happened was that a UTF8 string in Rails would be regarded as
CP1252 by MySQL, but MySQL knew that the tables needed UTF8, so it
did a CP1252 to UTF8 conversion on the (already UTF8) string before
writing it. As you can imagine, we ended up with all sorts of crap in
the database, and the occasional string got completely munged as
invalid CP1252 bytes were replaced with question marks.
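The double conversion described above can be reproduced outside the database. This is only a sketch of the effect, not of what MySQL does internally: it assumes a modern Ruby (1.9 or later) and uses String#force_encoding and String#encode to relabel UTF8 bytes as Windows-1252 and then convert them to UTF8 again. The method names are invented for the example.

```ruby
# Simulate the bug: bytes that are already UTF-8 get labelled as CP1252
# and are then "converted" to UTF-8 a second time. Bytes with no CP1252
# mapping are replaced with "?".
def double_encode(utf8_string)
  utf8_string.dup
             .force_encoding("Windows-1252")
             .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# Undo the damage where it is reversible: convert back to CP1252 bytes
# and relabel them as the UTF-8 they originally were.
def repair_double_encoding(mangled)
  mangled.encode("Windows-1252").force_encoding("UTF-8")
end

mangled = double_encode("caf\u00e9")        # "café"
puts mangled                                # two junk characters replace the "é"
puts repair_double_encoding(mangled)        # back to "café"

# U+00C1 is C3 81 in UTF-8; 0x81 has no CP1252 mapping, so the second
# byte is lost and this string cannot be repaired afterwards.
puts double_encode("\u00c1")
```

The first case is the "crap in the database" that can still be cleaned up later; the second is the kind of string that is gone for good once the question mark has been written.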

These three things should at least be reduced to a single setting to
avoid mistakes. I can't imagine a situation in which you would want
to do one of them without the others.
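Until that happens, a small check can at least catch the database.yml part. The sketch below is an assumption about how one might do it, not anything Rails provides: it parses a database.yml-style document with Ruby's standard YAML library and lists the environments that lack 'encoding: utf8'. The sample config, database names, and method name are placeholders.

```ruby
require "yaml"

# Placeholder config standing in for a real database.yml.
SAMPLE_DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: myapp_development
    encoding: utf8
  test:
    adapter: mysql
    database: myapp_test
YAML

# Returns the names of environments whose settings lack `encoding: utf8`.
def environments_missing_utf8(yaml_text)
  YAML.safe_load(yaml_text)
      .reject { |_env, settings| settings["encoding"] == "utf8" }
      .keys
end

puts environments_missing_utf8(SAMPLE_DATABASE_YML).inspect
```

This only covers the connection setting; the database and table character sets would still need to be checked on the MySQL side.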

>> - Put $KCODE='u' in your environment.rb
>
> This is only required if you use unicode strings in your Ruby code.

If your app handles UTF8, then you're going to want to write tests
involving UTF8 strings, so you're going to need this turned on. You
do write UTF8 tests for your apps, right? :)
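For what it's worth, the checks I have in mind look roughly like the sketch below. It assumes a modern Ruby (1.9 or later), where strings carry their own encoding and $KCODE no longer has any effect; on the Ruby of this thread you would need $KCODE='u' plus a multibyte-aware wrapper to get the same character-level behaviour. The sample word is arbitrary.

```ruby
# A string with one two-byte character: "naïve".
word = "na\u00efve"

# Character-level operations should see 5 characters...
raise "wrong character count" unless word.length == 5

# ...while the underlying UTF-8 representation is 6 bytes long.
raise "wrong byte count" unless word.bytesize == 6

# Reversing must not split the two bytes of the "ï".
raise "reverse broke a character" unless word.reverse == "ev\u00efan"
raise "reverse produced invalid UTF-8" unless word.reverse.valid_encoding?

puts "UTF-8 string checks passed"
```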

>> - Add an after_filter to application.rb to set the Content-Type
>> header correctly
>
> Rails now defaults to utf-8 Content-Type.

Good to know. I'll take this as an endorsement of the idea that UTF8
should be the default for Rails apps. :)

Cheers,

Pete Yandell




