Please take our Survey
logo       

Choosing A Webhost:
A web hosting service is a type of Internet hosting service that allows individuals and organizations to provide their own website accessible via the World Wide Web. Web hosts are companies that provide space on a server they own for use by their clients as well as providing Internet connectivity, typically in a data center. Web hosts can also provide data center space and connectivity to the Internet for servers they do not own to be located in their data center, called colocation. more...

AW: Store strings in UTF-8/UNICODE format: msg#00181

db.maxdb

Subject: AW: Store strings in UTF-8/UNICODE format

Hermann Himmelbauer wrote:

> Hi,
> I need to store multibyte characters into my database (MaxDB). To my mind,
> there are two approaches:
>
> 1) Store everything in UTF-8 format in normal varchars. I don't know if
> MaxDB
> will have problems with this, probably there will be errors such as that
> the
> string length can be faulty.
> 2) Store everything in the 2-byte Unicode format - but this will nearly
> double
> the size of my database, moreover I don't know if I will have to convert
> every string from/to UTF-8 to/from Unicode.
>
> What's the appropriate solution?

It is (as always) depending
- on the number of characters not being ASCII-7-Bit, but being stored with 3 or
more bytes in UTF8. You said, that storing in UCS2 will nearly double, so we
can assume, ASCII-7-bit is the main thing and only few character will need more
than one byte even in UTF8
- on the functions / qualifications you need.
With UTF8 stored in (VAR)CHAR (n) BYTE you will not be able to handle LENGTH,
SUBSTR, (x)FILL, (x)TRIM, and so on or LIKE correctly. Sorting could do,
perhaps even in a way you can live with.
- on the knowledge about the data you have. If you want to store at maximum n
character (no matter if ASCII or UNICODE), you will have to specify
(VAR)CHAR (n). But if you want to store n UTF8-character in a
(VAR)CHAR(x)BYTE-column, which value is needed for x (not to be too small if
several character need more than just one byte)? BTW: the database then will
not be able to check if more than n character are stored in this value.
- (VAR)CHAR(n) BYTE cannot be compared to (VAR)CHAR(n) ASCII/UNICODE, but on
the other hand: to store UTF8 in (VAR)CHAR(n) ASCII is not the recommended way
of using, although I do not know any knockout-criteria for this.
- For storing as UTF8 no _UNICODE=YES-instance is needed, meaning that no
unicode-storage of the catalog (info about tables, columns, views, users,...)
is done.
- If a record needs for example half the space in UTF8 than it does in UCS2,
double the number of records can be stored in one page, meaning less pages are
needed for the same number of records, meaning less I/O / smaller data cache
needed for all records if your search strategies are so bad to need a scan.

==> if you are sure that you now and ever will only need a stupid container for
your data and all handling (functions and so on, LIKE) will be done in the
application, then maybe (although storage is not too expensive) storing as UTF8
is a good decision. If you really know this...

Elke
SAP Labs Berlin

>
> Best Regards,
> Hermann
>
> --
> x1@xxxxxx
> GPG key ID: 299893C7 (on keyservers)
> FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7
>
> --
> MaxDB Discussion Mailing List
> For list archives: http://lists.mysql.com/maxdb
> To unsubscribe: http://lists.mysql.com/maxdb?unsub=elke.zabach@xxxxxxx


--
MaxDB Discussion Mailing List
For list archives: http://lists.mysql.com/maxdb
To unsubscribe: http://lists.mysql.com/maxdb?unsub=gcdm-maxdb@xxxxxxxxxxx




<Prev in Thread] Current Thread [Next in Thread>
Google Custom Search

Free Magazines

Cisco News
Receive a free quarterly e-newsletter with exclusive articles on how Cisco IT uses its own products and solutions to enable the business.
subscribe

Systems Management News, the newspaper for IT systems administration and data center managers! Each issue of Systems Management News is chock-full of news and analysis to help you understand what's happening in your field.
subscribe

The Enterprise Newsweekly eWeek is the essential technology information source for builders of e-business.
subscribe

Oracle Magazine Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world's largest enterprise software company.
subscribe

Total Telecom Total Telecom is "The Economist of the communications industry".
subscribe

Navigation

Home | advertise | OSDir is an inevitable website. super tiny logo