Housing longer-term Arrow development, design, and roadmap documents

hi folks,

Since the scope of Apache Arrow has grown significantly in the last
2.5 years to encompass many programming languages and new areas of
functionality, I'd like to discuss how we could better accommodate
longer-term asynchronous discussions and stay organized about the
development roadmap.

At any given time, there could be 10 or more initiatives ongoing, and
the number of concurrent initiatives is likely to continue increasing
over time as the community grows larger. Just off the top of my head,
here are some things that are ongoing or up in the air:

* Remaining columnar format design questions (interval types, unions, etc.)
* Arrow RPC client/server design (aka "Arrow Flight")
* Packaging / deployment / release management
* Rust language build out
* Go language build out
* Code generation / LLVM (Gandiva)
* ML/AI framework integration (e.g. with TensorFlow, PyTorch)
* Plasma roadmap
* Record data types (thread I just opened)

With ~500 open issues on JIRA, I have found that newcomers feel a bit
overwhelmed when they're trying to find a part of the project to get
involved with. Eventually one must sink one's teeth into the JIRA
backlog, but I think it would be helpful to have some centralized
project organization and roadmap documents to help navigate all of the
efforts going on in the project.

I don't think documents in the repository are a great solution for
this, as they don't facilitate discussion very easily. Documentation
or Markdown documents (like the columnar format specification) are
good to write there once decisions have been made. Google Documents
are great, but they are somewhat ephemeral.

I would suggest using the ASF's Confluence wiki for these purposes.
The Confluence UI is a bit clunky, like other Atlassian products, but
the wiki-style model (central landing page + links to subprojects) and
collaboration features (comments and discussions on pages) would give
us what we need. I suspect that it also integrates with JIRA, which
would help with cross-references to particular concrete JIRA items
related to subprojects. Here's an example of a Confluence landing page
for another ASF project:

What do others think?