
Re: Implicit Casts for Arithmetic Operators


I think overflow and the role of widening conversions are closely linked, so I'll continue to inject that into this discussion. Overflow is also much worse: most applications won't be impacted by a loss of precision when an expression involves an int and a float, but they will care quite a bit if they get some nonsense wrapped number in an integer-only expression.
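
To make the two failure modes concrete, here is a minimal Java sketch of what plain Java semantics do in each case. The class and method names are made up for illustration and are not Cassandra code.

```java
class OverflowVsPrecision {
    // Integer-only addition with plain Java semantics: wraps around silently.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // int + float with plain Java semantics: the int is implicitly converted
    // to float first, which can drop low-order bits for large values.
    static float lossyAdd(int a, float b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Wrap around: the result is a large negative number.
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1)); // -2147483648
        // Precision loss: 16777217 (2^24 + 1) is not representable as a float.
        System.out.println(lossyAdd(16_777_217, 0f));          // 1.6777216E7
    }
}
```

The float result is off by one, while the integer result is not even close to the true sum.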

For VoltDB, in practice we didn't run into issues with applications failing to make progress because of exceptions on real data, thanks to the widening conversions. The ranges of double and long are pretty big, and that hides wrap around/infinity.

I think the proposal of having all operations return a decimal is attractive in that these expressions always result in a consistent type. Two pain points might be whether client languages have decimal support and whether there is a performance cost. The nice thing about always returning decimal is that we can sidestep the issue of overflow.
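
As a rough illustration of the "always decimal" idea, the sketch below uses java.math.BigDecimal for every result. The class and method names are hypothetical, and this says nothing about how Cassandra would actually implement it.

```java
import java.math.BigDecimal;

class DecimalArithmetic {
    // bigint + bigint returned as a decimal: there is no fixed range to overflow.
    static BigDecimal addIntegers(long a, long b) {
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
    }

    // bigint + double returned as a decimal: new BigDecimal(double) keeps the
    // exact binary value of the double, so the bigint operand is never rounded.
    static BigDecimal addMixed(long a, double b) {
        return BigDecimal.valueOf(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(addIntegers(Long.MAX_VALUE, 1L));     // 9223372036854775808
        System.out.println(addMixed(9_007_199_254_740_993L, 1.0)); // 9007199254740994
    }
}
```

The second call uses 2^53 + 1, which a double cannot represent exactly, so the same addition done in double arithmetic would not give this answer.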

I would start with seeing if that's acceptable, and if it isn't, then look at other approaches like returning a variety of types, such as returning a bigint for int + int or a double for int + float.
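
A small sketch of that widening alternative, using Java's long for bigint and double for the int + float case. The names are hypothetical and only meant to show the result types.

```java
class WideningAdd {
    // int + int -> bigint: widen before adding so the sum cannot wrap.
    static long addInts(int a, int b) {
        return (long) a + b;
    }

    // int + float -> double: both operands convert to double without loss.
    static double addIntFloat(int a, float b) {
        return (double) a + b;
    }

    public static void main(String[] args) {
        System.out.println(addInts(Integer.MAX_VALUE, 1)); // 2147483648
        System.out.println(addIntFloat(16_777_217, 0f));   // 1.6777217E7
    }
}
```

This does not remove overflow entirely, since bigint + bigint has nothing wider to go to short of a decimal, but it pushes the problem out to much larger values.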

If we take an approach that allows overflow, the ideal end state IMO would be to get all users to run Cassandra in a way that overflow results in an error, even in the context of aggregation. The road to get there is tricky, but maybe start by having it as an opt-in tunable in cassandra.yaml. I don't know how/when we could ever change that as a default, and it's unfortunate having an option like this that 99% of users won't know they should flip.
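
For the sake of discussion, an opt-in overflow check in a sum aggregate could look roughly like the sketch below. The failOnOverflow flag is a stand-in for whatever tunable we might add; it is not an existing cassandra.yaml setting, and the class is not existing Cassandra code.

```java
class SumAggregate {
    // Sums bigint values. With failOnOverflow set, Math.addExact throws
    // ArithmeticException instead of wrapping; without it, the current
    // Java semantics (silent wrap around) apply.
    static long sum(long[] values, boolean failOnOverflow) {
        long total = 0;
        for (long v : values) {
            total = failOnOverflow ? Math.addExact(total, v) : total + v;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = {Long.MAX_VALUE, 1L};
        System.out.println(sum(values, false)); // -9223372036854775808
        try {
            sum(values, true);
        } catch (ArithmeticException e) {
            System.out.println("overflow rejected");
        }
    }
}
```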

It seems like having the default throw on overflow is not as bad as it sounds if you do the widening conversions, since most people won't run into overflows. The change in the column types of result sets actually sounds worse if we want to also improve aggregations. Many applications won't notice if the client library abstracts that away, but I think there are still cases where people would notice the type changing.


On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> This (overflow) is an excellent point, but this also affects 
> aggregations which were introduced a long time ago.  They already 
> inherit Java semantics for all of the relevant types (silent wrap 
> around).  We probably want to be consistent, meaning either changing 
> aggregations (which incurs a cost for changing API) or continuing the 
> java semantics here.
> This is why having these discussions explicitly in the community before 
> a release is so critical, in my view.  It’s very easy for these semantic 
> changes to go unnoticed on a JIRA, and then ossify.
> > On 2 Oct 2018, at 15:48, Ariel Weisberg <ariel@xxxxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> > I think we should decide based on what is least surprising as you mention, but isn't overridden by some other concern.
> > 
> > It seems to me the priorities are
> > 
> > * Correctness
> > * Performance
> > * User visible complexity
> > * Developer visible complexity
> > 
> > Defaulting to silent implicit data loss is not ideal from a correctness standpoint.
> > 
> > Doing something better like using wider types doesn't seem like a performance issue.
> > 
> > From a user standpoint doing something less lossy doesn't look more complex as long as it's consistent, and documented and doesn't change from version to version.
> > 
> > There is some developer complexity, but this is a public API and we only get one shot at this. 
> > 
> > I wonder about how overflow is handled as well. In VoltDB I think we threw on overflow and tended to just do widening conversions to make that less common. We didn't imitate another database (as far as I know); we just went with what was least likely to silently corrupt data.
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
> > 
> > Ariel
> > 
> > On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >> ç introduced arithmetic operators, and alongside these 
> >> came implicit casts for their operands.  There is a semantic decision to 
> >> be made, and I think the project would do well to explicitly raise this 
> >> kind of question for wider input before release, since the project is 
> >> bound by them forever more.
> >> 
> >> In this case, the choice is between lossy and lossless casts for 
> >> operations involving integers and floating point numbers.  In essence, 
> >> should:
> >> 
> >> (1) float + int = float, double + bigint = double; or
> >> (2) float + int = double, double + bigint = decimal; or
> >> (3) float + int = decimal, double + bigint = decimal
> >> 
> >> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
> >> double.  Simply casting between these types changes the value.  This is 
> >> what MS SQL Server does.
> >> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
> >> is what PostgreSQL does.
> >> 
> >> The question I’m interested in is not just which is the right decision, 
> >> but how the right decision should be arrived at.  My view is that we 
> >> should primarily aim for least surprise to the user, but I’m keen to 
> >> hear from others.
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
> >> 
> > 
