OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Implicit Casts for Arithmetic Operators


Thanks for bringing this up, it definitely needs to be discussed.

Last surprise is difficult here, since all major databases have their own
way of doing things and people will just assume that their way is the right
way.  On that note, some people will be surprised no matter what we do.

I'd rather avoid the pitfalls of returning incorrect results, so either
option 2 or 3 sound reasonable, but leaning towards the Postgres approach
of always returning a decimal for those cases.

Jon



On Tue, Oct 2, 2018 at 10:54 AM Benedict Elliott Smith <benedict@xxxxxxxxxx>
wrote:

> I agree, in broad strokes at least.  Interested to hear others’ positions.
>
>
>
> > On 2 Oct 2018, at 16:44, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I think overflow and the role of widening conversions are pretty linked
> so I'll continue to inject that into this discussion. Also overflow is much
> worse since most applications won't be impacted by a loss of precision when
> an expression involves an int and float, but will care quite a bit if they
> get some nonsense wrapped number in an integer only expression.
> >
> > For VoltDB in practice we didn't run into issues with applications not
> making progress due to exceptions with real data due to the widening
> conversions. The range of double and long are pretty big and that hides
> wrap around/infinity.
> >
> > I think the proposal of having all operations return a decimal is
> attractive in that these expressions always result in a consistent type.
> Two pain points might be whether client languages have decimal support and
> whether there is a performance issue? The nice thing about always returning
> decimal is we can sidestep the issue of overflow.
> >
> > I would start with seeing if that's acceptable, and if it isn't then
> look at other approaches like returning a variety of types such when doing
> int + int return a bigint or int + float return a double.
> >
> > If we take an approach that allows overflow the ideal end state IMO
> would be to get all users to run Cassandra in way that overflow results in
> an error even in the context of aggregation. The road to get there is
> tricky, but maybe start by having it as an opt in tunable in
> cassandra.yaml. I don't know how/when we could ever change that as a
> default and it's unfortunate having an option like this that 99% won't know
> they should flip.
> >
> > It seems like having the default throw on overflow is not as bad as it
> sounds if you do the widening conversions since most people won't run into
> them. The change in the column types of results sets actually sounds worse
> if we want to also improve aggregrations. Many applications won't notice if
> the client library abstracts that away, but I think there are still cases
> where people would notice the type changing.
> >
> > Ariel
> >
> >> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> >> This (overflow) is an excellent point, but this also affects
> >> aggregations which were introduced a long time ago.  They already
> >> inherit Java semantics for all of the relevant types (silent wrap
> >> around).  We probably want to be consistent, meaning either changing
> >> aggregations (which incurs a cost for changing API) or continuing the
> >> java semantics here.
> >>
> >> This is why having these discussions explicitly in the community before
> >> a release is so critical, in my view.  It’s very easy for these
> semantic
> >> changes to go unnoticed on a JIRA, and then ossify.
> >>
> >>
> >>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I think we should decide based on what is least surprising as you
> mention, but isn't overridden by some other concern.
> >>>
> >>> It seems to me the priorities are
> >>>
> >>> * Correctness
> >>> * Performance
> >>> * User visible complexity
> >>> * Developer visible complexity
> >>>
> >>> Defaulting to silent implicit data loss is not ideal from a
> correctness standpoint.
> >>>
> >>> Doing something better like using wider types doesn't seem like a
> performance issue.
> >>>
> >>> From a user standpoint doing something less lossy doesn't look more
> complex as long as it's consistent, and documented and doesn't change from
> version to version.
> >>>
> >>> There is some developer complexity, but this is a public API and we
> only get one shot at this.
> >>>
> >>> I wonder about how overflow is handled as well. In VoltDB I think we
> threw on overflow and tended to just do widening conversions to make that
> less common. We didn't imitate another database (as far as I know) we just
> went with what least likely to silently corrupt data.
> >>>
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
> <
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
> >
> >>>
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
> <
> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
> >
> >>>
> >>> Ariel
> >>>
> >>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >>>> ç introduced arithmetic operators, and alongside these
> >>>> came implicit casts for their operands.  There is a semantic decision
> to
> >>>> be made, and I think the project would do well to explicitly raise
> this
> >>>> kind of question for wider input before release, since the project is
> >>>> bound by them forever more.
> >>>>
> >>>> In this case, the choice is between lossy and lossless casts for
> >>>> operations involving integers and floating point numbers.  In
> essence,
> >>>> should:
> >>>>
> >>>> (1) float + int = float, double + bigint = double; or
> >>>> (2) float + int = double, double + bigint = decimal; or
> >>>> (3) float + int = decimal, double + bigint = decimal
> >>>>
> >>>> Option 1 performs a lossy implicit cast from int -> float, or bigint
> ->
> >>>> double.  Simply casting between these types changes the value.  This
> is
> >>>> what MS SQL Server does.
> >>>> Options 2 and 3 cast without loss of precision, and 3 (or
> thereabouts)
> >>>> is what PostgreSQL does.
> >>>>
> >>>> The question I’m interested in is not just which is the right
> decision,
> >>>> but how the right decision should be arrived at.  My view is that we
> >>>> should primarily aim for least surprise to the user, but I’m keen to
> >>>> hear from others.
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx <mailto:
> dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx>
> >>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> <mailto:dev-help@xxxxxxxxxxxxxxxxxxxx>
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx <mailto:
> dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx>
> >>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> <mailto:dev-help@xxxxxxxxxxxxxxxxxxxx>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> > For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade