osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RES: Implicit Casts for Arithmetic Operators


PLEASE TAKE MY EMAIL FROM THIS SHIT !!


-----Mensagem original-----
De: Michael Burman [mailto:yak@xxxxxx] 
Enviada em: terça-feira, 20 de novembro de 2018 13:51
Para: dev@xxxxxxxxxxxxxxxxxxxx
Assunto: Re: Implicit Casts for Arithmetic Operators

Yep, that's a good approach.

  - Micke

On Tue, Nov 20, 2018 at 5:12 PM Ariel Weisberg <adweisbe@xxxxxxxxxxx> wrote:

> Hi,
>
> +1
>
> This is a public API so we will be much better off if we get it right 
> the first time.
>
> Ariel
>
> > On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <jon@xxxxxxxxxxxxx> wrote:
> >
> > Sounds good to me.
> >
> > On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <
> benedict@xxxxxxxxxx>
> > wrote:
> >
> >> So, this thread somewhat petered out.
> >>
> >> There are still a number of unresolved issues, but to make progress 
> >> I wonder if it would first be helpful to have a vote on ensuring we 
> >> are
> ANSI
> >> SQL 92 compliant for our arithmetic?  This seems like a sensible
> baseline,
> >> since we will hopefully minimise surprise to operators this way.
> >>
> >> If people largely agree, I will call a vote, and we can pick up a 
> >> couple of more focused discussions afterwards on how we interpret 
> >> the leeway it gives.
> >>
> >>
> >>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> From reading the spec. Precision is always implementation defined. 
> >>> The
> >> spec specifies scale in several cases, but never precision for any 
> >> type
> or
> >> operation (addition/subtraction, multiplication, division).
> >>>
> >>> So we don't implement anything remotely approaching precision and 
> >>> scale
> >> in CQL when it comes to numbers I think? So we aren't going to 
> >> follow
> the
> >> spec for scale. We are already pretty far down that road so I would
> leave
> >> it alone.
> >>>
> >>> I don't think the spec is asking for the most approximate type. 
> >>> It's
> >> just saying the result is approximate, and the precision is
> implementation
> >> defined. We could return either float or double. I think if one of 
> >> the operands is a double we should return a double because clearly 
> >> the
> schema
> >> thought a double was required to represent that number. I would 
> >> also be
> in
> >> favor of returning a double all the time so that people can expect 
> >> a consistent type from expressions involving approximate numbers.
> >>>
> >>> I am a big fan of widening for arithmetic expressions in a 
> >>> database to
> >> avoid having to error on overflow. You can go to the trouble of 
> >> only widening the minimum amount, but I think it's simpler if we 
> >> always
> widen to
> >> bigint and double. This would be something the spec allows.
> >>>
> >>> Definitely if we can make overflow not occur we should and the 
> >>> spec
> >> allows that. We should also not return different types for the same
> operand
> >> types just to work around overflow if we detect we need more precision.
> >>>
> >>> Ariel
> >>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for 
> >>>> digging
> this
> >>>> out (and Mike for getting some empirical examples).
> >>>>
> >>>> We still have to decide on the approximate data type to return; 
> >>>> right now, we have float+bigint=double, but float+int=float.  I 
> >>>> think this
> is
> >>>> fairly inconsistent, and either the approximate type should 
> >>>> always
> win,
> >>>> or we should always upgrade to double for mixed operands.
> >>>>
> >>>> The quoted spec also suggests that decimal+float=float, and 
> >>>> decimal
> >>>> +double=double, whereas we currently have decimal+float=decimal, 
> >>>> +and
> >>>> decimal+double=decimal
> >>>>
> >>>> If we’re going to go with an approximate operand implying an
> >> approximate
> >>>> result, I think we should do it consistently (and consistent with 
> >>>> the
> >>>> SQL92 spec), and have the type of the approximate operand always 
> >>>> be
> the
> >>>> return type.
> >>>>
> >>>> This would still leave a decision for float+double, though.  The 
> >>>> most consistent behaviour with that stated above would be to 
> >>>> always take
> the
> >>>> most approximate type to return (i.e. float), but this would seem 
> >>>> to
> me
> >>>> to be fairly unexpected for the user.
> >>>>
> >>>>
> >>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I agree with what's been said about expectations regarding
> expressions
> >> involving floating point numbers. I think that if one of the inputs 
> >> is approximate then the result should be approximate.
> >>>>>
> >>>>> One thing we could look at for inspiration is the SQL spec. Not 
> >>>>> to
> >> follow dogmatically necessarily.
> >>>>>
> >>>>> From the SQL 92 spec regarding assignment
> >> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>>>> "
> >>>>>       Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> >>>>>       FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
> >> mutually
> >>>>>       comparable and mutually assignable. If an assignment would
> >> result
> >>>>>       in a loss of the most significant digits, an exception
> condition
> >>>>>       is raised. If least significant digits are lost,
> implementation-
> >>>>>       defined rounding or truncating occurs with no exception
> >> condition
> >>>>>       being raised. The rules for arithmetic are generally 
> >>>>> governed
> by
> >>>>>       Subclause 6.12, "<numeric value expression>".
> >>>>> "
> >>>>>
> >>>>> Section 6.12 numeric value expressions:
> >>>>> "
> >>>>>       1) If the data type of both operands of a dyadic 
> >>>>> arithmetic
> >> opera-
> >>>>>          tor is exact numeric, then the data type of the result 
> >>>>> is
> >> exact
> >>>>>          numeric, with precision and scale determined as follows:
> >>>>> ...
> >>>>>       2) If the data type of either operand of a dyadic 
> >>>>> arithmetic
> op-
> >>>>>          erator is approximate numeric, then the data type of the re-
> >>>>>          sult is approximate numeric. The precision of the result is
> >>>>>          implementation-defined.
> >>>>> "
> >>>>>
> >>>>> And this makes sense to me. I think we should only return an 
> >>>>> exact
> >> result if both of the inputs are exact.
> >>>>>
> >>>>> I think we might want to look closely at the SQL spec and 
> >>>>> especially
> >> when the spec requires an error to be generated. Those are 
> >> sometimes in
> the
> >> spec to prevent subtle paths to wrong answers. Any time we deviate 
> >> from
> the
> >> spec we should be asking why is it in the spec and why are we deviating.
> >>>>>
> >>>>> Another issue besides overflow handling is how we determine 
> >>>>> precision
> >> and scale for expressions involving two exact types.
> >>>>>
> >>>>> Ariel
> >>>>>
> >>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm not sure if I would prefer the Postgres way of doing 
> >>>>>> things,
> >> which is
> >>>>>> returning just about any type depending on the order of operators.
> >>>>>> Considering it actually mentions in the docs that using
> >> numeric/decimal is
> >>>>>> slow and also multiple times that floating points are inexact. 
> >>>>>> So
> >> doing
> >>>>>> some math with Postgres (9.6.5):
> >>>>>>
> >>>>>> SELECT 2147483647 <(214)%20748-3647>::bigint*1.0::double 
> >>>>>> precision
> returns double
> >>>>>> precision 2147483647 <(214)%20748-3647> SELECT 2147483647 
> >>>>>> <(214)%20748-3647>::bigint*1.0 returns numeric
> 2147483647 <(214)%20748-3647>.0
> >>>>>> SELECT 2147483647 <(214)%20748-3647>::bigint*1.0::real returns
> double
> >>>>>> SELECT 2147483647 <(214)%20748-3647>::double 
> >>>>>> precision*1::bigint
> returns double
> >> 2147483647 <(214)%20748-3647>
> >>>>>> SELECT 2147483647 <(214)%20748-3647>::double 
> >>>>>> precision*1.0::bigint
> returns double
> >> 2147483647 <(214)%20748-3647>
> >>>>>>
> >>>>>> With + - we can get the same amount of mixture of returned types.
> >> There's
> >>>>>> no difference in those calculations, just some casting. To me 
> >>>>>> floating-point math indicates inexactness and has errors and 
> >>>>>> whoever
> >> mixes
> >>>>>> up two different types should understand that. If one didn't 
> >>>>>> want
> >> exact
> >>>>>> numeric type, why would the server return such? The floating 
> >>>>>> point
> >> value
> >>>>>> itself could be wrong already before the calculation - trying 
> >>>>>> to say
> >> we do
> >>>>>> it lossless is just wrong.
> >>>>>>
> >>>>>> Fun with 2.65:
> >>>>>>
> >>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743 
> >>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>>>
> >>>>>> SELECT round(2.65) returns numeric 4 SELECT round(2.65::double 
> >>>>>> precision) returns double 4
> >>>>>>
> >>>>>> SELECT 2.65 * 1 returns double 2.65 SELECT 2.65 * 1::bigint 
> >>>>>> returns numeric 2.65 SELECT 2.65 * 1.0 returns numeric 2.650 
> >>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>>>
> >>>>>> SELECT round(2.65) * 1 returns numeric 3 SELECT round(2.65) * 
> >>>>>> round(1) returns double 3
> >>>>>>
> >>>>>> So as we're going to have silly values in any case, why pretend
> >> something
> >>>>>> else? Also, exact calculations are slow if we crunch large 
> >>>>>> amount of numbers. I guess I slightly deviated towards 
> >>>>>> Postgres' implemention
> >> in this
> >>>>>> case, but I wish it wasn't used as a benchmark in this case. 
> >>>>>> And
> most
> >>>>>> importantly, I would definitely want the exact same type 
> >>>>>> returned
> >> each time
> >>>>>> I do a calculation.
> >>>>>>
> >>>>>> - Micke
> >>>>>>
> >>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
> >> benedict@xxxxxxxxxx>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> As far as I can tell we reached a relatively strong consensus 
> >>>>>>> that
> we
> >>>>>>> should implement lossless casts by default?  Does anyone have
> >> anything more
> >>>>>>> to add?
> >>>>>>>
> >>>>>>> Looking at the emails, everyone who participated and expressed 
> >>>>>>> a preference was in favour of the “Postgres approach” of 
> >>>>>>> upcasting to
> >> decimal
> >>>>>>> for mixed float/int operands?
> >>>>>>>
> >>>>>>> I’d like to get a clear-cut decision on this, so we know what 
> >>>>>>> we’re
> >> doing
> >>>>>>> for 4.0.  Then hopefully we can move on to a collective 
> >>>>>>> decision on
> >> Ariel’s
> >>>>>>> concerns about overflow, which I think are also pressing -
> >> particularly for
> >>>>>>> tinyint and smallint.  This does also impact implicit casts 
> >>>>>>> for
> mixed
> >>>>>>> integer type operations, but an approach for these will 
> >>>>>>> probably
> >> fall out
> >>>>>>> of any decision on overflow.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
> >> murukesh.mohanan@xxxxxxxxx>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I think you're conflating two things here. There's the loss
> >> resulting
> >>>>>>> from
> >>>>>>>> using some operators, and loss involved in casting. Dividing 
> >>>>>>>> an
> >> integer
> >>>>>>> by
> >>>>>>>> another integer to obtain an integer result can result in 
> >>>>>>>> loss,
> but
> >>>>>>> there's
> >>>>>>>> no implicit casting there and no loss due to casting.  
> >>>>>>>> Casting an
> >> integer
> >>>>>>>> to a float can also result in loss. So dividing an integer by 
> >>>>>>>> a
> >> float,
> >>>>>>> for
> >>>>>>>> example, with an implicit cast has an additional avenue for loss:
> >> the
> >>>>>>>> implicit cast for the operands so that they're of the same 
> >>>>>>>> type. I
> >>>>>>> believe
> >>>>>>>> this discussion so far has been about the latter, not the 
> >>>>>>>> loss
> from
> >> the
> >>>>>>>> operations themselves.
> >>>>>>>>
> >>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
> >> benjamin.lerer@xxxxxxxxxxxx>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I would like to try to clarify things a bit to help people 
> >>>>>>>>> to
> >> understand
> >>>>>>>>> the true complexity of the problem.
> >>>>>>>>>
> >>>>>>>>> The *float *and *double *types are inexact numeric types. 
> >>>>>>>>> Not
> only
> >> at
> >>>>>>> the
> >>>>>>>>> operation level.
> >>>>>>>>>
> >>>>>>>>> If you insert 676543.21 in a *float* column and then read 
> >>>>>>>>> it, you
> >> will
> >>>>>>>>> realize that the value has been truncated to 676543.2.
> >>>>>>>>>
> >>>>>>>>> If you want accuracy the only way is to avoid those inexact
> types.
> >>>>>>>>> Using *decimals
> >>>>>>>>> *during operations will mitigate the problem but will not 
> >>>>>>>>> remove
> >> it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I do not recall PostgreSQL behaving has described. If I am 
> >>>>>>>>> not
> >> mistaken
> >>>>>>> in
> >>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to 
> >>>>>>>>> what
> >> MS SQL
> >>>>>>>>> server and Oracle do. So all thoses databases will lose 
> >>>>>>>>> precision
> >> if you
> >>>>>>>>> are not carefull.
> >>>>>>>>>
> >>>>>>>>> If you truly need precision you can have it by using exact
> numeric
> >> types
> >>>>>>>>> for your data types. Of course it has a cost on performance,
> >> memory and
> >>>>>>>>> disk usage.
> >>>>>>>>>
> >>>>>>>>> The advantage of the current approach is that it give you 
> >>>>>>>>> the
> >> choice.
> >>>>>>> It is
> >>>>>>>>> up to you to decide what you need for your application. It 
> >>>>>>>>> is
> also
> >> in
> >>>>>>> line
> >>>>>>>>> with the way CQL behave everywhere else.
> >>>>>>>>>
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>> Muru
> >>>>>>>
> >>>>>>>
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >>>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>> ----------------------------------------------------------------
> >>>>> ----- To unsubscribe, e-mail: 
> >>>>> dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >>>>>
> >>>>
> >>>>
> >>>> -----------------------------------------------------------------
> >>>> ---- To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >>>>
> >>>
> >>> ------------------------------------------------------------------
> >>> --- To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >>>
> >>
> >>
> >> -------------------------------------------------------------------
> >> -- To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >>
> >> --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx