Re: How to implement repartition.(Internet mail)

Reshuffle is deprecated as a way of getting stable input (that is currently being added as a separate transform), but it is perfectly fine for just "reshuffling." There is currently no way to set the number of partitions, though. How important is that?
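
To make the request concrete, here is a minimal sketch of what a fixed partition count amounts to: every element is assigned to one of N buckets. It is plain Java with no Beam or Spark APIs; the class name RepartitionSketch, the method repartition, and the round-robin assignment are all assumptions chosen for illustration, not how Spark or any Beam runner does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Beam or Spark: spreads the elements of a
// list round-robin over a fixed number of partitions.
public class RepartitionSketch {

    static <T> List<List<T>> repartition(List<T> input, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Create the requested number of empty partitions up front.
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Hand out elements in turn: 0, 1, ..., numPartitions - 1, 0, 1, ...
        int next = 0;
        for (T element : input) {
            partitions.get(next).add(element);
            next = (next + 1) % numPartitions;
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Three partitions, analogous in spirit to rdd.repartition(3).
        System.out.println(repartition(data, 3));
    }
}
```

In a pipeline, the analogous move would be to give each element a key in the range [0, N) before a grouping step. How those keys end up mapped onto Spark partitions would still be up to the runner.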

On Wed, Sep 12, 2018 at 6:46 AM devinduan(段丁瑞) <devinduan@xxxxxxxxxxx> wrote:
  Thanks for your reply.
    But the Reshuffle class has no parameters to set.
    I looked at the code of Reshuffle: the constructor of this class is private, and the code comment says "For internal use only; no backwards compatibility guarantees".
    What I mean is that I want to set the number of RDD partitions, like rdd.repartition(3), or similar to Flink's DataStream.setParallelism(3).
    Could you help me?

Date: 2018-09-11 21:50
Subject: Re: How to implement repartition.(Internet mail)
Does Reshuffle do what you want?

On Tue, Sep 11, 2018, 3:46 PM devinduan(段丁瑞) <devinduan@xxxxxxxxxxx> wrote:
Hi all:
    I recently started studying Beam on the Spark runner.
    I want to implement a repartition method similar to Spark's rdd.repartition(), but I can't find a solution.
    Could anyone help me?
    Thanks for your reply.
