Thoughts about 2019 Arrow development focus areas

hi folks,

I jotted down some high-level ideas about directions I'd like to push
the various parts of the project on the C++ side along with the
language bindings in Python, R, Ruby, and others. Many people may know
that I am building a not-for-profit open source development team to
focus on Apache Arrow (https://ursalabs.org/), so this document is
partly for my colleagues to organize some lower-level technical
discussions and planning in the Arrow JIRA. I'm interested in
feedback from the whole Arrow community, and we obviously would love
to have as many people as possible involved who have an interest in
the C++ libraries and their bindings.

The simplified summary is that I would like to work toward an
embeddable in-memory query engine in C++ that can be used in all the
bindings. This can be used in numerous contexts, from data frame
libraries to streaming data transformation. As a simple example, we
could compile filter expressions with Gandiva and apply these to a
stream of record batches being materialized from a directory of
Parquet files.
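
To make the shape of that example a bit more concrete, here is a rough
sketch in plain Python. It is only an illustration of the flow (compile
a filter once, then apply it lazily to each batch in a stream), not
Arrow or Gandiva code: the dict-of-lists "batch", compile_filter,
apply_mask, filter_stream, and read_batches are all hypothetical names
made up for this sketch, and read_batches merely stands in for batches
materialized from Parquet files.

```python
import operator
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for a record batch: column name -> list of values.
Batch = Dict[str, List]

_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def compile_filter(column: str, op: str, literal) -> Callable[[Batch], List[bool]]:
    """Resolve the expression once; the result computes a boolean mask per batch."""
    compare = _OPS[op]  # operator lookup happens once, not once per batch

    def evaluate(batch: Batch) -> List[bool]:
        return [compare(value, literal) for value in batch[column]]

    return evaluate


def apply_mask(batch: Batch, mask: List[bool]) -> Batch:
    """Keep only the rows whose mask entry is True, in every column."""
    return {
        name: [value for value, keep in zip(values, mask) if keep]
        for name, values in batch.items()
    }


def filter_stream(batches: Iterable[Batch],
                  predicate: Callable[[Batch], List[bool]]) -> Iterator[Batch]:
    """Lazily filter a stream of batches, one batch at a time."""
    for batch in batches:
        yield apply_mask(batch, predicate(batch))


def read_batches() -> Iterator[Batch]:
    """Hypothetical source standing in for batches read from Parquet files."""
    yield {"id": [1, 2, 3], "price": [5.0, 12.5, 9.0]}
    yield {"id": [4, 5], "price": [20.0, 1.0]}


predicate = compile_filter("price", ">", 8.0)
result = list(filter_stream(read_batches(), predicate))
```

The point of the split is that the expression is prepared once up
front, while the batches are consumed one at a time, so nothing forces
the whole dataset into memory at once.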

There are a lot of pieces that still have to fall into place to do this
in a sustainable and non-hacky way.


Looking forward to the feedback of others!