
Re: Arrow JS 0.4.0 Release

The ongoing JS refactor/upgrade branch <https://github.com/trxcllnt/arrow/tree/js-data-refactor/js> is just about done. It's passing all the integration tests, as well as a hundred or so new unit tests. I have to update existing tests where the APIs changed, battle with closure-compiler a bit, then it'll be ready to merge in and ship out. I think I'll be able to wrap it up in the next couple hours.

I started this branch to clean up the Vector Data classes to make it easier to add higher-level Table and Vector operators, but as the Data classes are fairly embedded in the core, it led to a larger refactor of the DataTypes, Vectors, Visitors, and IPC readers and writers.

While I was updating the IPC readers and writers, I took the opportunity to back-port all the Node and WHATWG (browser) streams integration that we've built for Graphistry. Putting it in the Arrow JS library lets us better ensure zero-copy when possible, empowers library consumers to easily build streaming applications in both server and browser environments, and (selfishly) reduces complexity in my code base. It also advances a longer-term personal goal of adhering more closely to the structure and organization of ArrowCPP when reasonable.

A non-exhaustive list of updates includes:

* Updates the Table, Schema, RecordBatch, Visitor, Vector, Data, and DataTypes to ensure the generic type signatures cascade recursively through the type declarations
* New io primitives that abstract over the (mutually exclusive) file and stream APIs in both node and browser environments
* New RecordBatchReaders and RecordBatchWriters that directly use the zero-copy node and browser io primitives
* A consolidated reflective Visitor implementation that supports late binding to shortcut traversal, and provides an easy API for building higher level Vector operators
* Fixed bugs/added support for reading and writing DictionaryBatch deltas (tricky)
* Updated all the dependencies and did some config file gardening to make debugging tests easier
* Added a bunch of new tests
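
To give a rough idea of what "reflective Visitor with late binding" means here, below is a minimal TypeScript sketch. It is not the code in the branch: the names `ReflectiveVisitor`, `getVisitFn`, `TypeDescriber`, and the simplified `DataType` shape are all hypothetical and exist only for illustration. The idea is that the visitor finds its handler by type name at runtime, and a caller can resolve that handler once and reuse it rather than dispatching on every call.

```typescript
// Hypothetical, simplified type description; not Arrow JS's actual DataType.
type TypeName = "Int" | "Utf8" | "List";
interface DataType { name: TypeName; child?: DataType }

// A handler that has already been resolved for one concrete type.
type VisitFn<R> = (type: DataType) => R;

abstract class ReflectiveVisitor<R> {
  // Reflectively look up `visit<TypeName>` on this instance and return it
  // bound, so callers can hold on to it and skip the lookup later.
  getVisitFn(type: DataType): VisitFn<R> {
    const method = (this as any)[`visit${type.name}`];
    if (typeof method !== "function") {
      throw new Error(`Unrecognized type "${type.name}"`);
    }
    return method.bind(this);
  }

  // Convenience path: resolve and call in one step.
  visit(type: DataType): R {
    return this.getVisitFn(type)(type);
  }
}

// Example visitor that renders a type as a string.
class TypeDescriber extends ReflectiveVisitor<string> {
  visitInt(_type: DataType): string { return "int"; }
  visitUtf8(_type: DataType): string { return "utf8"; }
  visitList(type: DataType): string {
    if (!type.child) throw new Error("List requires a child type");
    return `list<${this.visit(type.child)}>`;
  }
}

const describer = new TypeDescriber();
const listOfInt: DataType = { name: "List", child: { name: "Int" } };

// Late binding: resolve the handler once, then call it as often as needed.
const describeList = describer.getVisitFn(listOfInt);
const description = describeList(listOfInt);
```

A higher-level operator written this way only needs to supply the `visitX` methods for the types it cares about; anything it does not handle fails loudly at lookup time.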

I'd be more than happy to help shepherd a 0.4.0 release of what's in arrow/master if that's what everyone wants to do. But in the interest of cutting a more feature-rich release and sparing library consumers the cost of updating twice in a short time span, I vote we hold off for another day or two and merge + release the work in the refactor branch.


On 12/9/18 10:51 AM, Wes McKinney wrote:
I agree that we should cut a JavaScript release.

With the amount of maintenance work on my plate I have to declare
bankruptcy on doing any more than I am right now. Can another PMC
volunteer to be the RM for the 0.4.0 JavaScript release?

On Tue, Dec 4, 2018 at 10:07 PM Brian Hulette <hulettbh@xxxxxxxxx> wrote:
Hi all,
It's been quite a while since our last major Arrow JS release (0.3.0 on
February 22!), and since then we've added several new features that will
make Arrow JS much easier to adopt. We've added convenience functions for
creating Arrow vectors and tables natively in JavaScript, an IPC writer,
and a row proxy interface that will make integrating with existing JS
libraries much simpler.

I think it's time we cut 0.4.0, so I spent some time closing out or
postponing the last few JIRAs in JS-0.4.0. I got it down to just one JIRA
which involves documenting the release process - hopefully we can close
that out as we go through it again.

Please let me know if you think it makes sense to cut JS-0.4.0 now, or if
you have any concerns.