|
|
Choosing A Webhost: |
AW: AW: Store strings in UTF-8/UNICODE format: msg#00187db.maxdb
Hermann Himmelbauer wrote: > > On Wednesday 27 October 2004 08:35, Zabach, Elke wrote: > > Hermann Himmelbauer wrote: > > > Hi, > > > I need to store multibyte characters into my database (MaxDB). To my > > > mind, there are two approaches: > > > > > > 1) Store everything in UTF-8 format in normal varchars. I don't know > if > > > MaxDB > > > will have problems with this, probably there will be errors such as > that > > > the > > > string length can be faulty. > > > 2) Store everything in the 2-byte Unicode format - but this will > nearly > > > double > > > the size of my database, moreover I don't know if I will have to > convert > > > every string from/to UTF-8 to/from Unicode. > > > > > > What's the appropriate solution? > > > > It is (as always) depending > > - on the number of characters not being ASCII-7-Bit, but being stored > with > > 3 or more bytes in UTF8. You said, that storing in UCS2 will nearly > double, > > so we can assume, ASCII-7-bit is the main thing and only few character > will > > need more than one byte even in UTF8 - on the functions / qualifications > > you need. > > With UTF8 stored in (VAR)CHAR (n) BYTE you will not be able to handle > > LENGTH, SUBSTR, (x)FILL, (x)TRIM, and so on or LIKE correctly. Sorting > > could do, perhaps even in a way you can live with. - on the knowledge > about > > the data you have. If you want to store at maximum n character (no > matter > > if ASCII or UNICODE), you will have to specify (VAR)CHAR (n). But if you > > want to store n UTF8-character in a (VAR)CHAR(x)BYTE-column, which value > is > > needed for x (not to be too small if several character need more than > just > > one byte)? BTW: the database then will not be able to check if more than > n > > character are stored in this value. - (VAR)CHAR(n) BYTE cannot be > compared > > to (VAR)CHAR(n) ASCII/UNICODE, but on the other hand: to store UTF8 in > > (VAR)CHAR(n) ASCII is not the recommended way of using, although I do > not > > know any knockout-criteria for this. - For storing as UTF8 no > > _UNICODE=YES-instance is needed, meaning that no unicode-storage of the > > catalog (info about tables, columns, views, users,...) is done. - If a > > record needs for example half the space in UTF8 than it does in UCS2, > > double the number of records can be stored in one page, meaning less > pages > > are needed for the same number of records, meaning less I/O / smaller > data > > cache needed for all records if your search strategies are so bad to > need a > > scan. > > > > ==> if you are sure that you now and ever will only need a stupid > container > > for your data and all handling (functions and so on, LIKE) will be done > in > > the application, then maybe (although storage is not too expensive) > storing > > as UTF8 is a good decision. If you really know this... > > Many thanks for your quick answer! > > So it seems best to store my strings in the Unicode (UCS-2) format. > Storage > space is not a big issue. > > To explain my situation, I already have a MaxDB database running in ASCII > mode. Characters are now encoded using the latin-1 character set. As our > application will become international, we have to switch to Unicode. > > At first I need to convert the data from latin-1 to unicode - can this be > done > with MaxDB (e.g. with the loader utility) or do I have to do the > conversion > from 1byte ASCII to 2byte UCS by hand (e.g. python script)? The loader will do the job > > Moreover I access the database via ODBC (Windows and Linux) and via the > python > interface. Is unicode supported on these call interfaces? Yes for ODBC. > I read somewhere > that there are problems with Unicode on Linux ODBC - is this true? Yes. Problem is the size of SQLWCHAR. We understand it as 2-bytes. On Unix/Linux usually wchar_t has 4-bytes. You have to translate between both codings when using MaxDB ODBC, if you want to use wchar_t. On Windows the Unicode-Driver is sqlod32w.dll and usually registered under "MaxDB (Unicode)". It supports the full range of the ODBC-Unicode-API. Under Linux/Unix both APIs are in libsqlod.[a|so]. In order to get it right you should always use SQLConnectW to establish a Unicode-ready connection to the DB. > Moreover I > read that it's possible to read/write Unicode strings with python but it's > impossible to use unicode strings for query strings - is this still true? Query strings in unicode still aren't possible, but you can use parameters for most cases: session.sqlX ("select * from tableA where columnX like ?", ['Ö%']) or, if you use the same statement several times with changing values: insert = session.prepare ('insert into tableA values (?, ?, ?)') for v1, v2, v3 in iterator: insert.execute ([v1, v2, v3]) > I > need simple select statements like "select * from tableA where columnX > like > 'Ö%'" and this should be possible also from python. > > What about sorting tables in unicode? Will e.g. "Ö" be interpreted as "OE" > and > be sorted accordingly? Sorting is done as with ASCII: in hex-sequence. Therefore Ö will be somewhere behind Z. If you want/need Ö to be ordered as OE/Oe, then please use MAPCHAR-Sets. Elke SAP Labs Berlin > > Best Regards, > Hermann > > -- > x1@xxxxxx > GPG key ID: 299893C7 (on keyservers) > FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7 -- MaxDB Discussion Mailing List For list archives: http://lists.mysql.com/maxdb To unsubscribe: http://lists.mysql.com/maxdb?unsub=gcdm-maxdb@xxxxxxxxxxx
|
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | RE: Error 4008 while connecting by ADO/ODBC, Olivier Dupuis |
|---|---|
| Next by Date: | About message -2003, Arto Pastinen |
| Previous by Thread: | AW: Can I create a generic stored procedure to sort any column pa ssed as input parameter???, Anhaus, Thomas |
| Next by Thread: | About message -2003, Arto Pastinen |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
Free MagazinesCisco NewsReceive a free quarterly e-newsletter with exclusive articles on how Cisco IT uses its own products and solutions to enable the business. subscribe Systems Management News, the newspaper for IT systems administration and data center managers! Each issue of Systems Management News is chock-full of news and analysis to help you understand what's happening in your field. subscribe The Enterprise Newsweekly eWeek is the essential technology information source for builders of e-business. subscribe Oracle Magazine Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world's largest enterprise software company. subscribe Total Telecom Total Telecom is "The Economist of the communications industry". subscribe |
Home
| advertise | OSDir is
an inevitable website.
|