Re: Algorithmic explorations of bitmaps vs. sentinel values
It should be possible to unroll the sentinel version in many cases. For
sum += (data[i] == SENTINEL) * data[i]
This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can
work with other values.
On Tue, Oct 16, 2018 at 9:38 AM Antoine Pitrou <antoine@xxxxxxxxxx> wrote:
> Hi Wes,
> Le 16/10/2018 à 14:05, Wes McKinney a écrit :
> > hi folks,
> > I explored a bit the performance implications of using validity
> > bitmaps (like the Arrow columnar format) vs. sentinel values (like
> > NaN, INT32_MIN) for nulls:
> > http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/
> > The vectorization results may be of interest to those implementing
> > analytic functions targeting the Arrow memory format. There's probably
> > some other optimizations that can be employed, too.
> This is a nice write-up. It may also possible to further speed up
> things using explicit SIMD operations.
> For the non-null case, it should be relatively doable, see e.g.
> For the with-nulls case, it might be possible to do something with SIMD
> masks, but I'm not competent to propose anything concrete :-)
> > Caveat: it's entirely possible I made some mistakes in my code. I
> > checked the various implementations for correctness only, and did not
> > dig too deeply beyond that.
> > - Wes