
FW: Why float('Nan') == float('Nan') is False


On Fri, Feb 15, 2019 at 4:15 PM Avi Gross <avigross at verizon.net> wrote:
>
> > You shouldn't be testing floats for identity.
>
> I am not suggesting anyone compare floats. I repeat that a nan is not
> anything. Now as a technicality, it is considered a float by the type
> command as there is no easy way to make an int that is a nan:

You've been working with float("nan") all this time. It is, big
surprise, a float. This is not a technicality. It *is* a float.

> Now for a deeper anomaly and please don't tell me I shouldn't do this.

Actually... you're back to comparing floats by identity. So you
shouldn't do this, apart from probing the interpreter itself. This is
not anomalous, it's just the way that Python's immutable types work.

> >>> nanfloat1 = float("nan")
> >>> nanfloat2 = float("nan")
> >>> nanmath1 = math.nan
> >>> nanmath2 = math.nan
> >>> nannumpy1 = numpy.nan
> >>> nannumpy2 = numpy.nan
> >>> nanfloat1 is nanfloat2
> False
> >>> nanmath1 is nanmath2
> True
> >>> nannumpy1 is nannumpy2
> True
> >>> nanfloat1 is nanmath1
> False
> >>> nanfloat1 is nannumpy1
> False
> >>> nanmath1 is nannumpy1
> False
>
> This seems a tad inconsistent but perhaps completely understandable. Yet all
> three claim to be float ...
>
> >>> list(map(type, [ nanfloat1, nanmath1, nannumpy1 ] ))
> [<class 'float'>, <class 'float'>, <class 'float'>]

Well... yes. Every one of them IS a float, and every one of them DOES
carry the value of "nan". And they're not identical. So? You can do
the same with other values of floats, too.
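
A minimal sketch of that point with an ordinary float value. Whether the
two objects turn out to be identical is an implementation detail, so the
sketch records the result without relying on it; the variable names are
illustrative only.

```python
# Two floats built separately carry equal values.
a = float("1.5")
b = float("1.5")

same_value = (a == b)    # value comparison: always True here
same_object = (a is b)   # identity comparison: implementation-dependent

print("equal values:", same_value)
print("same object: ", same_object)
```

Only the first result is something a program should depend on.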

> Now granted comparing floats is iffy if the floats are computed and often
> fails because of the internal bit representation and rounding. But asking
> whether a float variable copied to a new name points to the same internal
> object does work:

Nope, this is nothing to do with rounding.

> What I see happening here is that math.nan is a real object of some sort
> that is instantiated by the math module at a specific location and
> presumably setting anything to it just copies that, sort of.

Okay, I think I see the problem here. You're expecting Python objects
to have locations (they don't, but they have identities) and to be
copied (they aren't, they're referenced), and you're expecting nan to
not be a value (it is). Python's object model demands that math.nan be
a real object. Otherwise you wouldn't be able to do anything at all
with it.

> >>> id(math.nan)
> 51774064
>
> Oddly, every copy of it gets another address but the same other address
> which hints at some indirection in the way it was set up.
>
> >>> m = math.nan
> >>> id(m)
> 51774064
> >>> n = math.nan
> >>> id(n)
> 51774064
> >>> o = m
> >>> id(o)
> 51774064

That isn't an address, it's an integer representing the object's
identity. And you could do this with literally ANY Python object. That
is the entire definition of assignment in Python. When you assign an
expression to a name, and then look up the object via that name, you
get... that object. That is how most modern high level languages work.
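
A short sketch of that definition of assignment, first with math.nan and
then with an arbitrary object. The names thing and alias are made up for
the illustration.

```python
import math

m = math.nan   # bind the name m to the object math.nan refers to
n = math.nan   # bind n to that same object
o = m          # bind o to whatever m refers to; nothing is copied

# All three names refer to one object, so identity and id() agree.
names_share_object = (m is n) and (n is o) and (id(m) == id(o))

# The same holds for any object at all, not just floats.
thing = object()
alias = thing
alias_is_thing = alias is thing

print(names_share_object, alias_is_thing)
```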

> Now do the same for the numpy.nan implementation:
> This time that same address is reused:
>
> >>> m = numpy.nan
> >>> id(m)
> 57329632
> >>> n = numpy.nan
> >>> id(n)
> 57329632
>
> So the numpy nan is unique. The math nan is something else but confusingly
> generates a new but same copy. You may be getting the address of a proxy
> one time and the real one another.

No, numpy.nan and math.nan behave exactly the same way. They are
distinct objects in the versions of Python and numpy that you're
using, although other versions would be legitimately able to reuse the
same object if they chose. Everything you do with assignment is going
to behave the same way. No matter what name you assign that object to,
it's the same object, and has the same ID.

> >>> m is n
> True
>
> But
>
> >>> m is math.nan
> False
>
> Should I give up? No, the above makes some sense as the id() function shows
> there were two addresses involved in one case and not the other.

Not addresses, identities, and yes, there are two distinct objects here.

> A truly clean implementation might have one copy system-wide as happens with
> None or Ellipsis (...) but it seems the development in python went in
> multiple directions and is no longer joined.

True, and an equally clean implementation could guarantee that two
equal strings are stored at the same place in memory. Some languages
guarantee this. Others don't. Python doesn't make this guarantee, and
CPython doesn't behave that way, but it'd be perfectly valid for a
Python implementation to do exactly this. There isn't much benefit in
mandating this for floats, though; they don't take up much space.
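
For strings, Python lets a program opt in to that sharing explicitly with
sys.intern(). The sketch below builds two equal strings at run time and
interns them. It does not check whether the un-interned strings were
already the same object, since that is left to the implementation.

```python
import sys

# Two equal strings assembled at run time from different pieces.
a = "".join(["na", "n"])
b = "".join(["n", "an"])

equal_values = (a == b)

# sys.intern() returns the one canonical object for a given string value,
# so interning two equal strings yields the same object.
interned_same = sys.intern(a) is sys.intern(b)

print(equal_values, interned_same)
```

There is no equivalent opt-in for floats in the standard library.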

> A similar test (not shown) with numpy.nan shows the m and n above are each
> other as well as what they copied because they share an ID.
>
> The solution is to NOT look at nan except using the appropriate functions.
>
> >>> [ (math.isnan(nothing), numpy.isnan(nothing))
>       for nothing in [ float("nan"), math.nan, numpy.nan ] ]
>
> [(True, True), (True, True), (True, True)]

Well... yes. That's what I meant when I said you shouldn't be
comparing floats for identity. You shouldn't ask if this float IS that
float, you should ask if this one is a nan.

> It seems that at least those two nan checkers work the same on all Not A
> Number variants I have tried. So seems safe to stick with it.

I would hope so. That's their job. They look at the *value*, not the
identity, and tell you whether or not it is a nan.
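
A sketch of a value-based check that uses only the standard library. The
list of sample values is an assumption made for the illustration. It
includes a nan produced by arithmetic to show that where the nan came
from does not matter.

```python
import math

# Three nans obtained in different ways.
nans = [float("nan"), math.nan, float("inf") - float("inf")]

# math.isnan() inspects the value, not the identity.
all_nan = all(math.isnan(x) for x in nans)

# A nan compares unequal to everything, including itself.
none_equal_self = all(x != x for x in nans)

# An ordinary float is not a nan and equals itself.
ordinary = 1.5
ordinary_is_nan = math.isnan(ordinary)

print(all_nan, none_equal_self, ordinary_is_nan)
```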

Do you understand now what I meant when I said you shouldn't be
comparing floats for identity?

If not, research Python's object model and get a clearer understanding
of identity vs value.

ChrisA