Thanks for responding Ruoyun,
We are not sure yet who is causing the leak, but once we run out of the memory then sdk worker crashes and pipeline is forced to restart. Check the memory usage patterns in the attached image. Each line in that graph is representing one task manager host.
You are right we are running the models for predictions.
Here are few observations:
1. All the tasks manager memory usage climb over time but some of the task managers' memory climb really fast because they are running the ML models. These models are definitely using memory intensive data structure (pandas data frame etc) hence their memory usage climb really fast.
2. We had almost the same code running in different infrastructure (non-streaming) that doesn't cause any memory issue.
3. Even when the pipeline has restarted, the memory is not released. It is still hogged by something. You can notice in the attached image that pipeline restarted around 13:30. At that time it is definitely released some portion of the memory but didn't completely released all memory. Notice that, when the pipeline was originally started, it started with 30% of the memory but when got restarted by the job manager it started with 60% of the memory.