Re: Arrow JS 0.4.0 Release

Another update: all the existing features and unit tests are working again except for the Table/RecordBatch streaming toString() implementations (and the `arrow2csv` utility), which I'll update later tonight.

On JS release cadence, I think Brian's right that the current setup is working counter to our original intent. I am used to (and prefer) a faster-paced release cycle, essentially releasing early and as often as bugs are fixed or features are added. Indeed, Graphistry maintains a repo <https://github.com/graphistry/arrow/commits/master> with the latest version of the library that we can build against, which I update when I fix any bugs or add features.

The JS project is young, and sometimes has to move at a rapid pace. I've felt the turnaround time involved in the vote/prepare/verify/publish release process is slower than would be helpful to me. I'm used to publishing patch releases to npm as soon as possible, possibly multiple times a day.

None of the PMC members contribute to or use the JS version (if that's wrong, hit me up!) so there's been no release pressure from there. None of the JS contributors are PMC members, so even if we want to do releases, we have to wait for a PMC member. My take is that everyone on the project (especially the PMC members) is probably ungodly busy, and since not releasing to npm hasn't been blocking me, I opt not to bother folks.

On 12/13/18 11:52 AM, Wes McKinney wrote:
+1 for synchronizing to the main releases when possible. In the 0.12
thread we have discussed moving to time-based releases (e.g. every 2
months). Time-based releases are helpful to create urgency around
getting work completed, and making sure that the project is always
ready to release.

On Thu, Dec 13, 2018 at 10:39 AM Brian Hulette <hulettbh@xxxxxxxxx> wrote:
Sounds great Paul! Really excited that this refactor is wrapping up. My
only concern with including this in 0.4.0 is that I'm not going to have the
time to thoroughly review it for a few weeks, so gating on that would
really delay it. But I can just manually test with some use-cases I care
about in lieu of a thorough review in the interest of time.

I think in the future (after 0.12?) it may behoove us to tie back in to the
main Arrow release cycle. The idea with the separate JS release was to
allow us to release faster, but in practice it has done the opposite. Since
the fall of 2017 we've cut two major JS releases (0.2, 0.3) while there
were four major main releases (0.8 - 0.11). Not to mention the disjoint
version numbers can be confusing to users - perhaps not as much of a
concern now that the format is pretty stable, but it can still be a
friction point. And finally selfishly - if we had been on the main release
cycle, the contributions I made in the summer would have been released in
either 0.10 or 0.11 by now.


On Thu, Dec 13, 2018 at 3:29 AM Paul Taylor <ptaylor@xxxxxxxxxx> wrote:

The ongoing JS refactor/upgrade branch
<https://github.com/trxcllnt/arrow/tree/js-data-refactor/js> is just
about done. It's passing all the integration tests, as well as a hundred
or so new unit tests. I have to update existing tests where the APIs
changed, battle with closure-compiler a bit, then it'll be ready to
merge in and ship out. I think I'll be able to wrap it up in the next
couple hours.

I started this branch to clean up the Vector Data classes to make it
easier to add higher-level Table and Vector operators, but as the Data
classes are fairly embedded in the core, it led to a larger refactor of
the DataTypes, Vectors, Visitors, and IPC readers and writers.

While I was updating the IPC readers and writers, I took the opportunity
to back-port all the Node and WhatWG (browser) streams integration that
we've built for Graphistry. Putting it in the Arrow JS library lets us
better ensure zero-copy when possible, empowers library consumers to
easily build streaming applications in both server and browser
environments, and (selfishly) reduces complexity in my code base. It
also advances a longer term personal goal to more closely adhere to the
structure and organization of ArrowCPP when reasonable.

A non-exhaustive list of updates includes:

* Updates the Table, Schema, RecordBatch, Visitor, Vector, Data, and
DataTypes to ensure the generic type signatures cascade recursively
through the type declarations
* New io primitives that abstract over the (mutually exclusive) file and
stream APIs in both node and browser environments
* New RecordBatchReaders and RecordBatchWriters that directly use the
zero-copy node and browser io primitives
* A consolidated reflective Visitor implementation that supports late
binding to shortcut traversal and provides an easy API for building
higher level Vector operators
* Fixed bugs/added support for reading and writing DictionaryBatch
deltas (tricky)
* Updated all the dependencies and did some config file gardening to
make debugging tests easier
* Added a bunch of new tests

I'd be more than happy to help shepherd a 0.4.0 release of what's in
arrow/master if that's what everyone wants to do. But in the interest of
cutting a more feature-rich release and preventing customers paying the
cost of updating twice in a short time span, I vote we hold off for
another day or two and merge + release the work in the refactor branch.


On 12/9/18 10:51 AM, Wes McKinney wrote:
I agree that we should cut a JavaScript release.

With the amount of maintenance work on my plate I have to declare
bankruptcy on doing any more than I am right now. Can another PMC
volunteer to be the RM for the 0.4.0 JavaScript release?

On Tue, Dec 4, 2018 at 10:07 PM Brian Hulette <hulettbh@xxxxxxxxx> wrote:
Hi all,
It's been quite a while since our last major Arrow JS release (0.3.0 on
February 22!), and since then we've added several new features that will
make Arrow JS much easier to adopt. We've added convenience functions
for creating Arrow vectors and tables natively in JavaScript, an IPC writer,
and a row proxy interface that will make integrating with existing JS
libraries much simpler.

I think it's time we cut 0.4.0, so I spent some time closing out or
postponing the last few JIRAs in JS-0.4.0. I got it down to just one
which involves documenting the release process - hopefully we can close
that out as we go through it again.

Please let me know if you think it makes sense to cut JS-0.4.0 now, or
you have any concerns.