Perform aggregations across multiple windows
We are currently working on implementing a data pipeline using Beam on top of Flink. We have an unbounded data source that sends us some financial positions data. For each account, we perform certain aggregations (let’s assume it’s summation for simplicity) across all products owned by the account. I’m processing the data in windows of 30 seconds.
At any time when I am processing a window I want to be able to access the aggregation from the previous window and add it to the current aggregation. The stateful API in beam stores data by (key,window) and hence I can’t really store a global state that I could access with account being the key.
The only way I could think was to write the data of the previous window to some DB and then read it in my following window which I don’t think is efficient because I have to wait until the previous window is written to the DB. Do you have suggestions on how to approach this problem?