Algorithmic explorations of bitmaps vs. sentinel values
I explored a bit the performance implications of using validity
bitmaps (like the Arrow columnar format) vs. sentinel values (like
NaN, INT32_MIN) for nulls:
The vectorization results may be of interest to those implementing
analytic functions targeting the Arrow memory format. There's probably
some other optimizations that can be employed, too.
Caveat: it's entirely possible I made some mistakes in my code. I
checked the various implementations for correctness only, and did not
dig too deeply beyond that.