@Lukasz: just a small clarification on the bench I shared earlier: the overhead of CompletionStage (implemented with a "fast" flavor) is under 7% if you ignore the lambda allocation, i.e. you pass a reusable function instance instead of a lambda reference. Not sure why the JVM doesn't handle that directly, but since a JVM upgrade from u40 to u144 gave a 75% boost thanks to lambda + GC optimizations, I don't worry much about that part. Here are the raw results I get (sharing the beam one too since I used another computer):
Comparison.beam                               thrpt  5  184033706,109 ± 31943851,553  ops/s
Comparison.fastCompletionStageWithoutLambda   thrpt  5  171628984,800 ±  2063217,863  ops/s
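To make the "without lambda" point concrete, here is a minimal sketch of the two call-site flavors I mean (made-up names, not the actual bench code): one allocates a lambda at the call site, the other reuses a pre-allocated Function instance.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class LambdaVsInstance {
    // Pre-allocated function instance, shared across all calls
    static final Function<Integer, Integer> INCREMENT = new Function<Integer, Integer>() {
        @Override
        public Integer apply(Integer i) {
            return i + 1;
        }
    };

    static CompletableFuture<Integer> withLambda(int value) {
        // Lambda reference at the call site
        return CompletableFuture.completedFuture(value).thenApply(i -> i + 1);
    }

    static CompletableFuture<Integer> withoutLambda(int value) {
        // Reuses the shared Function instance instead
        return CompletableFuture.completedFuture(value).thenApply(INCREMENT);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withLambda(41).get());     // 42
        System.out.println(withoutLambda(41).get());  // 42
    }
}
```

Both variants compute the same thing; the difference the bench measures is only the cost of the lambda at the call site.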
I insist on CompletionStage being a good fit (or maybe any reactive-compatible API closer to Java 9). I had to migrate some synchronous code to async on Friday, and the migration was not technically hard and brought a lot of benefit: the code now works in any environment, synchronously using toCompletableFuture().get(), or asynchronously, e.g. with Akka actors bridging Scala Future and CompletionStage. To be clear, I'm speaking of the API any runner must integrate with, not of the beam language-portable API, which sits on top of the runners from a design point of view. Integrated with IO (which is the only part giving sense to any pipeline when you think about it), you can scale way more reliably and efficiently, optimizing your resources, so it would be an awesome fit for a solution like beam IMHO.
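To illustrate the portability point with a hand-written sketch (hypothetical names, not Beam code): the same CompletionStage-returning API can be consumed synchronously by blocking on it, or asynchronously by chaining a callback, without the API itself changing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;

public class PortableApi {
    // A single async-first entry point; callers pick their execution style
    static CompletionStage<String> fetchGreeting(String name) {
        return CompletableFuture.supplyAsync(() -> "hello " + name);
    }

    public static void main(String[] args) throws Exception {
        // Synchronous environment: block on the stage
        String sync = fetchGreeting("beam").toCompletableFuture().get();
        System.out.println(sync);

        // Asynchronous environment: chain a callback, no blocking inside the pipeline
        CountDownLatch latch = new CountDownLatch(1);
        fetchGreeting("runner").thenAccept(greeting -> {
            System.out.println(greeting);
            latch.countDown();
        });
        latch.await(); // only to keep this demo's main alive until the callback fires
    }
}
```

The same shape works for bridging: a Scala Future can be converted to a CompletionStage (and back), so an actor-based runner and a plain blocking caller can share one API.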