Re: [ANNOUNCE] Apache Beam 2.4.0 released

Woohoo, thanks Robert!

A few more notable changes that I'm aware of:

- BigQueryIO.write() now supports column-based partitioning, which makes it dramatically cheaper and faster to load a bunch of historical data into a time-partitioned table (1 load job total, instead of 1 load job per partition).
- Introduces the Wait.on() transform, which allows general-purpose sequencing of transforms, agnostic to batch vs. streaming. SpannerIO unit tests are [the pioneering user](https://github.com/apache/beam/pull/4264/files#diff-f74bb28a007fa6d75d065f14a7c76b38R150).
- Increases scalability of watching for new files: successfully tested for reading up to 1 million files, in 1 or 10,000 filepatterns.
- Introduces the TDigest sketch for estimating quantiles.
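For anyone unfamiliar with the idea behind the quantile sketch in the last bullet, here is a rough illustration in plain Python. It is a heavily simplified t-digest-style structure, not Beam's implementation or API: the names `TinyDigest` and `estimate_quantiles`, the `compression` and `buffer_size` parameters, and the exact merge rule are all assumptions made for this sketch. The point is only that values are summarized as weighted centroids whose allowed size shrinks toward the tails, so extreme quantiles stay fairly accurate in bounded memory.

```python
import random


class TinyDigest:
    """Simplified t-digest-style sketch (illustrative only, not Beam's code)."""

    def __init__(self, compression=100, buffer_size=500):
        self.compression = compression   # higher = more centroids, better accuracy
        self.buffer_size = buffer_size   # raw points held before compressing
        self.centroids = []              # sorted (mean, weight) pairs
        self.buffer = []                 # unmerged (value, 1.0) pairs

    def add(self, value):
        self.buffer.append((float(value), 1.0))
        if len(self.buffer) >= self.buffer_size:
            self._compress()

    def _compress(self):
        # Merge buffered points and existing centroids in one sorted pass.
        points = sorted(self.centroids + self.buffer)
        self.buffer = []
        if not points:
            return
        total = sum(w for _, w in points)
        merged = []
        seen = 0.0                       # weight of centroids already emitted
        cur_mean, cur_w = points[0]
        for mean, w in points[1:]:
            new_w = cur_w + w
            q = (seen + new_w / 2.0) / total
            # Allowed centroid size is largest near the median, smallest at the tails.
            limit = max(1.0, 4.0 * total * q * (1.0 - q) / self.compression)
            if new_w <= limit:
                cur_mean = (cur_mean * cur_w + mean * w) / new_w
                cur_w = new_w
            else:
                merged.append((cur_mean, cur_w))
                seen += cur_w
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged

    def quantile(self, q):
        """Return the mean of the centroid where cumulative weight reaches q."""
        self._compress()
        if not self.centroids:
            raise ValueError("no data")
        target = q * sum(w for _, w in self.centroids)
        seen = 0.0
        for mean, w in self.centroids:
            seen += w
            if seen >= target:
                return mean
        return self.centroids[-1][0]


def estimate_quantiles(values, qs):
    """Feed values one at a time, then read approximate quantiles."""
    digest = TinyDigest()
    for v in values:
        digest.add(v)
    return [digest.quantile(q) for q in qs]
```

Feeding it the shuffled integers 1 to 10,000 and asking for quantiles 0.5 and 0.99 should give estimates close to 5000 and 9900 while keeping far fewer than 10,000 centroids. A real t-digest also interpolates between centroids and uses a more careful scale function.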

On Thu, Mar 22, 2018 at 5:23 AM Alexey Romanenko <aromanenko.dev@xxxxxxxxx> wrote:
Great news! Congrats!


On 22 Mar 2018, at 10:10, Romain Manni-Bucau <rmannibucau@xxxxxxxxx> wrote:

congrats guys

Romain Manni-Bucau

2018-03-22 9:50 GMT+01:00 Etienne Chauchot <echauchot@xxxxxxxxxx>:
Great!
On Thursday, 22 March 2018 at 08:24 +0000, Robert Bradshaw wrote:
We are pleased to announce the release of Apache Beam 2.4.0. Thanks go to
the many people who made this possible.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:


As well as many bugfixes, some notable changes in this release are:
- A new Python Direct runner, up to 15x faster than the old one.
- Kinesis support for reading and writing in Java
- Several refactorings to enable portability (Go/Python on Flink/Spark)

Full release notes can be found at