I am trying to implement a connected components algorithm using the DataStream API. I split the data into tumbling windows and want to compute the components of each window independently.
This algorithm is iterative because the labels (colors) of the vertices need to be propagated. Basically, I need to iterate over the following steps:
Input: vertices = DataStream of <VertexId, [list of neighbor vertices], label>
labels = vertices.flatMap(emitting a tuple <VertexId, label> for vertices.f0 and for every element of vertices.f1)
updatedVertices = vertices.join(labels).where(VertexId).equalTo(VertexId)
.apply(re-emit the original vertices stream tuples, but with the new labels)
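To make the steps above concrete, here is a minimal sketch of one propagation pass in plain Java, deliberately without any Flink API, operating on the contents of a single window held in memory. The names `LabelPropagationSketch`, `Vertex`, `step` and `converge` are hypothetical. It also rests on two assumptions not stated above: the "new label" of a vertex is the minimum label it receives, and edges are listed in both directions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabelPropagationSketch {

    /** Hypothetical stand-in for the <VertexId, [neighbors], label> tuple. */
    public static final class Vertex {
        public final long id;
        public final List<Long> neighbors;
        public final long label;

        public Vertex(long id, List<Long> neighbors, long label) {
            this.id = id;
            this.neighbors = neighbors;
            this.label = label;
        }
    }

    /** One pass over the window: the flatMap step followed by the join step. */
    public static List<Vertex> step(List<Vertex> vertices) {
        // "flatMap": every vertex sends its label to itself and to each neighbor.
        // Assumption: when several labels reach a vertex, the smallest one wins.
        Map<Long, Long> best = new HashMap<>();
        for (Vertex v : vertices) {
            best.merge(v.id, v.label, Long::min);
            for (long neighbor : v.neighbors) {
                best.merge(neighbor, v.label, Long::min);
            }
        }
        // "join": re-emit each original vertex, keeping its neighbors but taking the new label.
        List<Vertex> updated = new ArrayList<>();
        for (Vertex v : vertices) {
            updated.add(new Vertex(v.id, v.neighbors, best.get(v.id)));
        }
        return updated;
    }

    /** Repeats step() until no label changes; this is the loop the feedback edge should drive. */
    public static List<Vertex> converge(List<Vertex> vertices) {
        List<Vertex> current = vertices;
        boolean changed = true;
        while (changed) {
            List<Vertex> next = step(current);
            changed = false;
            for (int i = 0; i < current.size(); i++) {
                if (current.get(i).label != next.get(i).label) {
                    changed = true;
                }
            }
            current = next;
        }
        return current;
    }
}
```

In this sketch a vertex is "fed back" whenever its label changed in the last pass, which corresponds to the tuples I separate with filters before calling closeWith.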
I am trying to use an IterativeStream to do so. I can successfully separate the tuples that need to be fed back into the loop (using filters and closeWith), but the subsequent iterations never happen, so I only get the result of the first iteration.
I suppose this might come from the fact that I'm creating a new stream (labels) based on the original IterativeStream, joining it with the original one (vertices) and only then closing the loop with it.
Do you know whether Flink has a limitation in this respect? If so, would you have a hint about a different approach I could take for this algorithm to avoid it?
Thank you in advance,