osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Update state after firing


@Reuven: thanks for letting me know. I thought that's expected. We ran into this issue when we try to use the Stateful ParDo to process events from session-windowed inputs. As a walk-around, we ended up reassigning global window to these events and use our backend RocksDb state TTL to retire old data.

Thanks,
Xinyu

On Tue, Oct 9, 2018 at 11:54 AM Reuven Lax <relax@xxxxxxxxxx> wrote:
2) is simply a bug that nobody has ever gotten around to fixing. Stateful ParDo should support merging windows such as sessions.

On Tue, Oct 9, 2018 at 11:40 AM Xinyu Liu <xinyuliu.us@xxxxxxxxx> wrote:
We do use stateful ParDo in the same job for a different use case (and we did read through Kenn's blogs :) ). Here are the reasons why we prefer using aggregation:

1) It's much convenient for the user to define the window and trigger and have the Combine on top of it. It's not very clear how early firing works in Stateful Pardo, and it does seem to require more user effort to set up the states/timers.

2) It seems Stateful ParDo doesn't support non-emergent windows, e.g. session window. This is actually one of our use case.

3) It seems quite general and more flexible to our users to allow updating state after firing. I don't want to tell our further users to stay with from Combine for this and they have to handle the state explicitly.

Thanks,
Xinyu



On Tue, Oct 9, 2018 at 11:27 AM Rui Wang <ruwang@xxxxxxxxxx> wrote:
Hi Xinyu,

There are two nice articles on Beam website about stateful processing that you may want to check out:


-Rui

On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <relax@xxxxxxxxxx> wrote:
Have you considered using Beam's state API for this?

On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <xinyuliu.us@xxxxxxxxx> wrote:
Hi, guys,

Current triggering allows us to either discard the state or accumulate the state after a window pane is fired. We use the extractOutput() in CombinFn to return the output value after the firing. All these have been working well for us. We do have a use case which seems not handled here: we would like to update the state after the firing. Let me illustrate this use case by an example: we have a 10-min fixed window with repeatedly early trigger of 1 min over an input stream which contains events of user id and page id. The accumulator for the window has two parts: 1) set of page ids already seen; 2) set of user ids who first views a page in this window (this is done by looking up #1). For each early firing, we want to output #2, and clear the second part of the state. But we would like to keep the #1 around for later calculations in this window. This example might be too simple to make sense, but it comes from one of our real use cases which is needed for some anti-abuse scenarios.

To address this use case, is it OK to add a AccumT updateAfterFiring(AccumT accumulator) in current CombinFn? That way the user can choose to update the state partially if needed, e.g. for our use case. Any feedback is very welcome. 

Thanks,
Xinyu