Re: Split one dataset into multiple


Hi Vino,

Thank you for the suggestions. In my case I am using the DataSet API since the data is limited, and split/select is not available on the DataSet API.
I doubt hash partitioning will work for me either. With hash partitioning, I do not know which partition holds which entity's data (Dept, Emp in my example), and two different entities can sometimes hash to the same partition. I also need to apply further transformations to each partition, depending on the data it holds, which is not possible using MapPartitionFunction.
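
To illustrate the concern about two entities sharing a partition, here is a minimal plain-Java sketch. It assumes a simplified assignment rule (hash of the key modulo the number of partitions); Flink's actual hashing differs in detail, and the names HashPartitionDemo, partitionFor and assign are hypothetical, used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HashPartitionDemo {

    // Simplified rule (an assumption, not Flink's real hash function):
    // partition index = hash(key) mod numPartitions.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups the entity keys by the partition each one is assigned to.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new LinkedHashMap<>();
        for (String key : keys) {
            byPartition
                .computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>())
                .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Three entity types but only two partitions: at least two
        // entity types must end up in the same partition.
        List<String> entities = Arrays.asList("Department", "Employee", "Address");
        assign(entities, 2).forEach((partition, keys) ->
            System.out.println("partition " + partition + " -> " + keys));
    }
}
```

Whenever there are more entity types than partitions, some partition necessarily holds several types, and even with enough partitions the mapping from type to partition index is not something the job chooses.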

Please tell me if my understanding is wrong and whether the use case is achievable (a small example would be a great help).

Thank you,
Madan

On Tue, Nov 6, 2018 at 12:03 PM vino yang <yanghua1127@xxxxxxxxx> wrote:
Hi madan,

I think you need to hash partition your records. 
Flink supports hash partitioning of data. 
The operator is keyBy. 
If the value of your tag field is enumerable, you can also use split/select to achieve your purpose.

Thanks, vino.

On Mon, Nov 5, 2018 at 6:37 PM madan <madan.yellanki@xxxxxxxxx> wrote:
Hi,

I have a custom iterator which returns data for multiple entities. For example, the iterator returns Department, Employee and Address data. A record's entity type is identified by a field value, and I need to apply a different set of operations to each dataset. For example, Department data may be aggregated, while Employee and Address data are simply joined together after some filtering.

If I had a separate dataset for each entity type, the job would be easy. So I am trying to split the incoming data into different datasets. What is the best way to achieve this?

*Iterator can be read only once.
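
One possible way to handle the read-once constraint, sketched in plain Java and assuming the data fits in memory: drain the iterator in a single pass and group the records by their entity-type field. The class EntitySplitter, the nested Rec type and its fields are hypothetical names used only for this illustration, not part of any Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EntitySplitter {

    // Hypothetical record: the entity is identified by the `type` field.
    static final class Rec {
        final String type;    // e.g. "Department", "Employee", "Address"
        final String payload; // stand-in for the rest of the record
        Rec(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // Consumes the iterator exactly once and returns one list per entity type.
    static Map<String, List<Rec>> splitByType(Iterator<Rec> source) {
        Map<String, List<Rec>> byType = new LinkedHashMap<>();
        while (source.hasNext()) {
            Rec r = source.next();
            byType.computeIfAbsent(r.type, t -> new ArrayList<>()).add(r);
        }
        return byType;
    }

    public static void main(String[] args) {
        Iterator<Rec> it = Arrays.asList(
            new Rec("Department", "D1"),
            new Rec("Employee", "E1"),
            new Rec("Address", "A1"),
            new Rec("Employee", "E2")).iterator();

        // Each entity type now has its own collection to process separately.
        splitByType(it).forEach((type, recs) ->
            System.out.println(type + " -> " + recs.size() + " record(s)"));
    }
}
```

Each per-type list could then be turned into its own DataSet, for example with ExecutionEnvironment.fromCollection. Alternatively, the whole input could be loaded as a single DataSet and filtered once per entity type with filter.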


--
Thank you,
Madan.

