[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Update state after firing

Hi Xinyu,

There are two nice articles on Beam website about stateful processing that you may want to check out:


On Tue, Oct 9, 2018 at 11:07 AM Reuven Lax <relax@xxxxxxxxxx> wrote:
Have you considered using Beam's state API for this?

On Tue, Oct 9, 2018 at 11:03 AM Xinyu Liu <xinyuliu.us@xxxxxxxxx> wrote:
Hi, guys,

Current triggering allows us to either discard the state or accumulate the state after a window pane is fired. We use the extractOutput() in CombinFn to return the output value after the firing. All these have been working well for us. We do have a use case which seems not handled here: we would like to update the state after the firing. Let me illustrate this use case by an example: we have a 10-min fixed window with repeatedly early trigger of 1 min over an input stream which contains events of user id and page id. The accumulator for the window has two parts: 1) set of page ids already seen; 2) set of user ids who first views a page in this window (this is done by looking up #1). For each early firing, we want to output #2, and clear the second part of the state. But we would like to keep the #1 around for later calculations in this window. This example might be too simple to make sense, but it comes from one of our real use cases which is needed for some anti-abuse scenarios.

To address this use case, is it OK to add a AccumT updateAfterFiring(AccumT accumulator) in current CombinFn? That way the user can choose to update the state partially if needed, e.g. for our use case. Any feedback is very welcome.