[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Routing events by key

Hi Niels,

as you have an Unbounded PCollection, you need a Window to GroupByKey
and then "forward" the data.

Another option would be to use a DoFn working element per element and
eventually batching then. It's what most of the IOs are doing for the
Write part.


On 06/07/2018 17:01, Niels Basjes wrote:
> Hi,
> I have an unbounded stream of change events each of which has the id of
> the entity that is changed.
> To avoid the need for locking in the persistence layer that is needed in
> part of my processing I want to route all events based on this entity id.
> That way I know for sure that all events around a single entity go
> through the same instance of my processing sequentially, hence no need
> for locking or other synchronization regarding this persistence.
> At this point my best guess is that I need to use the GroupByKey but
> that seems to need a Window. 
> I think I don't want a window because the stream is unbounded and I want
> the lowest possible latency (i.e. a Window of 1 second would be ok for
> this usecase).
> Also I want to be 100% sure that all events for a specific id go to only
> a single instance because I do not want any race conditions.
> My simple question is: What does that look like in the Beam Java API?
> -- 
> Best regards / Met vriendelijke groeten,
> Niels Basjes

Jean-Baptiste Onofré
Talend - http://www.talend.com