
Subject: Re: Question about CONVERT(str,charset_to,charset_from) - msg#00225

List: db.mysql.devel

Paul DuBois wrote:

I've been puzzling over this patch, which implements a form of the
CONVERT() function. I can see that this can be useful for specifying
the destination character set as a string expression rather than as
an unquoted character set name. But I'm wondering why the second argument
is necessary at all. Strings have a charset already, why do you have
to specify what it is?



There are some reasons against this style of CONVERT():

CONVERT(string,from_charset,to_charset)

They are:

1. As we recently decided, a function should never return strings
with different charsets in different rows. The above style of
CONVERT() breaks this rule.

2. Also, I've already implemented the COLLATE syntax, so one can easily
force a string to change its charset. For example:
CONVERT(latin1_string COLLATE latin5 USING utf8)

On the other hand, let's imagine that we have a mail storage (as
Peter suggested in his example) with a structure like this:

CREATE TABLE mail (
body BLOB BINARY,
body_charset VARCHAR(32)
);


Now, if we want to send the bodies of all letters in UTF8,
this style of CONVERT() wants an unquoted charset name in
the second position. Since body_charset is an expression
rather than an unquoted charset name, this will fail:

SELECT CONVERT(body COLLATE body_charset USING utf8) ...


So, taking all this into account, I suggest:

1. we should remove this style: CONVERT(expr,expr,expr), since
it can produce strings in different charsets in
different rows, which is wrong;

2. we can't extend COLLATE to support an expression:
SELECT body COLLATE expr
because it would again be able to produce different charsets
in different rows.


Probably the way out is to extend the CONVERT syntax to support
something like this:

CONVERT(expr FROM expr USING unquoted_charset_name)

where the first expr is the source string to convert and the
second expr is the source charset. This will not reduce
functionality and also will not break the rule about
different charsets in different rows.
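
One way to picture the difference between the two styles is the
following small C++ sketch. It is not server code and converts no
bytes; MailRow, Tagged, convert_three_expr, ConvertFromUsing and
distinct_charsets are names invented for this illustration. It only
tracks which charset each result value would be tagged with, assuming
the destination of the three-expression form may itself vary per row.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the mail table from this message: raw bytes plus
// the name of the charset they are stored in.
struct MailRow {
  std::string body;
  std::string body_charset;
};

// A result value tagged with a charset name. No bytes are converted here;
// the sketch only tracks which charset each result claims to have.
struct Tagged {
  std::string bytes;
  std::string charset;
};

// CONVERT(expr,expr,expr) style: the destination charset is itself an
// expression, so it is evaluated again for every row.
Tagged convert_three_expr(const MailRow &row, const std::string &to_value) {
  return {row.body, to_value};
}

// CONVERT(expr FROM expr USING name) style: only the source charset is
// read per row; the destination name is fixed before any row is read.
struct ConvertFromUsing {
  std::string to_charset;  // unquoted name taken from the query text
  Tagged operator()(const MailRow &row) const {
    (void)row.body_charset;  // a real server would pick the decoder here
    return {row.body, to_charset};
  }
};

// Number of distinct charsets seen across one result column.
std::size_t distinct_charsets(const std::vector<Tagged> &column) {
  std::set<std::string> seen;
  for (const Tagged &t : column) seen.insert(t.charset);
  return seen.size();
}
```

With two rows stored as koi8r and latin1, passing body_charset as the
destination expression of convert_three_expr tags the results with two
different charsets, while ConvertFromUsing{"utf8"} tags every row utf8.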

What do you think?



At 19:11 +0400 3/29/02, bar@xxxxxxxxx wrote:

Below is the list of changes that have just been committed into a
4.1 repository of bar. When bar does a push, they will be propagated to
the main repository and within 24 hours after the push to the public repository.
For information on how to access the public repository
see http://www.mysql.com/doc/I/n/Installing_source_tree.html

ChangeSet
1.1178 02/03/29 19:11:06 bar@xxxxxxxxxxxxxxxxxxxxxx +3 -0
Now this syntax works too: CONVERT(string,charset_to,charset_from)
where charset_to and charset_from are expressions. For example:

CONVERT('test','latin2','cp1250')

sql/sql_yacc.yy
1.155 02/03/29 19:11:05 bar@xxxxxxxxxxxxxxxxxxxxxx +4 -0
Now this syntax works too: CONVERT(string,charset_to,charset_from)

sql/item_strfunc.h
1.18 02/03/29 19:11:04 bar@xxxxxxxxxxxxxxxxxxxxxx +10 -0
Now this syntax works too: CONVERT(string,charset_to,charset_from)

sql/item_strfunc.cc
1.42 02/03/29 19:11:04 bar@xxxxxxxxxxxxxxxxxxxxxx +73 -0
Now this syntax works too: CONVERT(string,charset_to,charset_from)


Also, it appears to me that the names of the second and third arguments
in the preceding descriptions are backward, because the function result has
the charset of the third argument, not the second:

mysql> select charset(convert('abc','latin1','utf8'));
+-----------------------------------------+
| charset(convert('abc','latin1','utf8')) |
+-----------------------------------------+
| utf8                                    |
+-----------------------------------------+
mysql> select charset(convert('abc','utf8','latin1'));
+-----------------------------------------+
| charset(convert('abc','utf8','latin1')) |
+-----------------------------------------+
| latin1                                  |
+-----------------------------------------+


# This is a BitKeeper patch. What follows are the unified diffs for the
# set of deltas contained in the patch. The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User: bar
# Host: gw.udmsearch.izhnet.ru
# Root: /usr/home/bar/mysql-4.1

--- 1.41/sql/item_strfunc.cc Fri Mar 29 18:22:18 2002
+++ 1.42/sql/item_strfunc.cc Fri Mar 29 19:11:04 2002
@@ -1843,6 +1843,79 @@
   /* BAR TODO: What to do here??? */
 }

+
+String *Item_func_conv_charset3::val_str(String *str)
+{
+  my_wc_t wc;
+  int cnvres;
+  const uchar *s, *se;
+  uchar *d, *d0, *de;
+  uint dmaxlen;
+  String *arg= args[0]->val_str(str);
+  String *to_cs= args[1]->val_str(str);
+  String *from_cs= args[2]->val_str(str);
+  CHARSET_INFO *from_charset;
+  CHARSET_INFO *to_charset;
+
+  if (!arg || args[0]->null_value ||
+      !to_cs || args[1]->null_value ||
+      !from_cs || args[2]->null_value ||
+      !(from_charset=find_compiled_charset_by_name(from_cs->ptr())) ||
+      !(to_charset=find_compiled_charset_by_name(to_cs->ptr())))
+  {
+    null_value=1;
+    return 0;
+  }
+
+  s=(const uchar*)arg->ptr();
+  se=s+arg->length();
+
+  dmaxlen=arg->length()*(to_charset->mbmaxlen?to_charset->mbmaxlen:1)+1;
+  str->alloc(dmaxlen);
+  d0=d=(unsigned char*)str->ptr();
+  de=d+dmaxlen;
+
+  while( s < se && d < de){
+
+    cnvres=from_charset->mb_wc(from_charset,&wc,s,se);
+    if (cnvres>0)
+    {
+      s+=cnvres;
+    }
+    else if (cnvres==MY_CS_ILSEQ)
+    {
+      s++;
+      wc='?';
+    }
+    else
+      break;
+
+outp:
+    cnvres=to_charset->wc_mb(to_charset,wc,d,de);
+    if (cnvres>0)
+    {
+      d+=cnvres;
+    }
+    else if (cnvres==MY_CS_ILUNI && wc!='?')
+    {
+      wc='?';
+      goto outp;
+    }
+    else
+      break;
+  };
+
+  str->length((uint) (d-d0));
+  str->set_charset(to_charset);
+  return str;
+}
+
+void Item_func_conv_charset3::fix_length_and_dec()
+{
+  /* BAR TODO: What to do here??? */
+}
+
+
 String *Item_func_hex::val_str(String *str)
 {
   if (args[0]->result_type() != STRING_RESULT)

--- 1.17/sql/item_strfunc.h Fri Mar 29 18:22:19 2002
+++ 1.18/sql/item_strfunc.h Fri Mar 29 19:11:04 2002
@@ -489,6 +489,16 @@
   const char *func_name() const { return "conv_charset"; }
 };

+class Item_func_conv_charset3 :public Item_str_func
+{
+public:
+  Item_func_conv_charset3(Item *arg1,Item *arg2,Item *arg3)
+    :Item_str_func(arg1,arg2,arg3) {}
+  String *val_str(String *);
+  void fix_length_and_dec();
+  const char *func_name() const { return "conv_charset3"; }
+};
+

 /*******************************************************
 Spatial functions

--- 1.154/sql/sql_yacc.yy Fri Mar 29 18:22:20 2002
+++ 1.155/sql/sql_yacc.yy Fri Mar 29 19:11:05 2002
@@ -1664,6 +1664,10 @@
           }
           $$= new Item_func_conv_charset($3,cs);
         }
+        | CONVERT_SYM '(' expr ',' expr ',' expr ')'
+          {
+            $$= new Item_func_conv_charset3($3,$5,$7);
+          }
         | FUNC_ARG0 '(' ')'
           { $$= ((Item*(*)(void))($1.symbol->create_func))();}
         | FUNC_ARG1 '(' expr ')'
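
For anyone following the loop in Item_func_conv_charset3::val_str()
above, here is a standalone C++ sketch of the same pattern: decode one
character from the source charset into a wide character, encode it in
the destination charset, and substitute '?' when either step has no
mapping. It is an illustration under assumptions, not the server code.
The Charset struct, the kIllegal and kTooShort codes, the two toy
charsets (latin1, and UTF-8 limited to three bytes) and convert() are
all invented here and only mimic the shape of CHARSET_INFO, mb_wc and
wc_mb as they appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical return codes standing in for MY_CS_ILSEQ / MY_CS_ILUNI and
// for "ran out of input or output". The values are assumptions.
constexpr int kIllegal = 0;    // undecodable bytes or unencodable code point
constexpr int kTooShort = -1;  // not enough input bytes or output space

// Toy stand-in for CHARSET_INFO: one decoder and one encoder per charset.
struct Charset {
  int mbmaxlen;  // maximum bytes per character
  int (*mb_wc)(uint32_t *wc, const unsigned char *s, const unsigned char *se);
  int (*wc_mb)(uint32_t wc, unsigned char *d, unsigned char *de);
};

static int latin1_mb_wc(uint32_t *wc, const unsigned char *s,
                        const unsigned char *se) {
  if (s >= se) return kTooShort;
  *wc = *s;  // latin1 bytes map one-to-one onto U+0000..U+00FF
  return 1;
}

static int latin1_wc_mb(uint32_t wc, unsigned char *d, unsigned char *de) {
  if (d >= de) return kTooShort;
  if (wc > 0xFF) return kIllegal;
  *d = static_cast<unsigned char>(wc);
  return 1;
}

static int utf8_mb_wc(uint32_t *wc, const unsigned char *s,
                      const unsigned char *se) {
  if (s >= se) return kTooShort;
  unsigned char c = s[0];
  if (c < 0x80) { *wc = c; return 1; }
  if (c < 0xC2) return kIllegal;  // stray continuation byte or overlong form
  if (c < 0xE0) {
    if (se - s < 2) return kTooShort;
    if ((s[1] & 0xC0) != 0x80) return kIllegal;
    *wc = ((c & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    return 2;
  }
  if (c < 0xF0) {
    if (se - s < 3) return kTooShort;
    if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return kIllegal;
    *wc = ((c & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
    return *wc < 0x800 ? kIllegal : 3;  // reject overlong 3-byte forms
  }
  return kIllegal;  // 4-byte sequences are outside this sketch
}

static int utf8_wc_mb(uint32_t wc, unsigned char *d, unsigned char *de) {
  int n = wc < 0x80 ? 1 : wc < 0x800 ? 2 : wc < 0x10000 ? 3 : 0;
  if (n == 0) return kIllegal;
  if (de - d < n) return kTooShort;
  if (n == 1) {
    d[0] = static_cast<unsigned char>(wc);
  } else if (n == 2) {
    d[0] = static_cast<unsigned char>(0xC0 | (wc >> 6));
    d[1] = static_cast<unsigned char>(0x80 | (wc & 0x3F));
  } else {
    d[0] = static_cast<unsigned char>(0xE0 | (wc >> 12));
    d[1] = static_cast<unsigned char>(0x80 | ((wc >> 6) & 0x3F));
    d[2] = static_cast<unsigned char>(0x80 | (wc & 0x3F));
  }
  return n;
}

const Charset kLatin1 = {1, latin1_mb_wc, latin1_wc_mb};
const Charset kUtf8 = {3, utf8_mb_wc, utf8_wc_mb};

std::string convert(const std::string &src, const Charset &from,
                    const Charset &to) {
  const unsigned char *s = reinterpret_cast<const unsigned char *>(src.data());
  const unsigned char *se = s + src.size();
  // Worst case: every source byte is one character needing mbmaxlen bytes.
  std::string out(src.size() * to.mbmaxlen + 1, '\0');
  unsigned char *d0 = reinterpret_cast<unsigned char *>(&out[0]);
  unsigned char *d = d0, *de = d0 + out.size();

  while (s < se && d < de) {
    uint32_t wc;
    int res = from.mb_wc(&wc, s, se);
    if (res > 0) s += res;
    else if (res == kIllegal) { s++; wc = '?'; }  // skip one bad byte
    else break;                                   // truncated input

    res = to.wc_mb(wc, d, de);
    if (res == kIllegal && wc != '?') res = to.wc_mb('?', d, de);
    if (res > 0) d += res;
    else break;
  }
  out.resize(static_cast<std::size_t>(d - d0));
  return out;
}
```

The retry that the patch writes as "goto outp" is written here as a
second wc_mb call, which should behave the same way for a single
substitution. The output buffer is sized like dmaxlen in the patch:
source length times the destination's mbmaxlen, plus one.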








---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail internals-thread5077@xxxxxxxxxxxxxxx
To unsubscribe, e-mail <internals-unsubscribe@xxxxxxxxxxxxxxx>



