
Flink restart strategy on specific exception



The existing restart strategies in Flink are fairly generic. We have a requirement to restart a job only when specific exceptions or issues occur.

What would be the best way to implement a restart strategy that is driven by a few rules, such as checking for a particular type of exception or evaluating additional application-specific conditions?


For background, the specific issue that prompted this requirement is that slots are not always released when a job finishes. Our application keeps track of the jobs it has submitted and the parallelism allotted to each. Once a job finishes, we assume its slots are free and try to submit the next set of jobs, which at times fails with the error “not enough slots available”.


We think a job restart can solve this issue, but we want to restart only when this particular situation is encountered.
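
To make the requirement concrete, the kind of rule we have in mind looks roughly like the sketch below. It is plain Java and does not use any Flink API. The class name RestartRule, its method names, and the choice to match on the message text are our own assumptions for illustration only. How to plug such a rule into Flink's restart handling is exactly what we are asking about.

```java
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of Flink. */
final class RestartRule {

    // Assumed marker text, taken from the error message quoted above.
    static final String SLOT_ERROR_MARKER = "not enough slots available";

    // Guard against pathological or cyclic cause chains.
    private static final int MAX_CAUSE_DEPTH = 50;

    private RestartRule() {}

    /** Returns true if the failure or any of its causes satisfies the rule. */
    static boolean anyCauseMatches(Throwable failure, Predicate<Throwable> rule) {
        int depth = 0;
        for (Throwable t = failure; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause()) {
            if (rule.test(t)) {
                return true;
            }
            depth++;
        }
        return false;
    }

    /** Application-specific decision: restart only for the slot error. */
    static boolean shouldRestart(Throwable failure) {
        return anyCauseMatches(failure, t -> {
            String message = t.getMessage();
            return message != null
                    && message.toLowerCase(Locale.ROOT).contains(SLOT_ERROR_MARKER);
        });
    }
}
```

With this sketch, RestartRule.shouldRestart(e) would be true for a failure whose cause chain contains the slot error and false for anything else. Extra application-specific checks could be passed to anyCauseMatches as additional predicates.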


Please let us know if there are better ways to solve this problem than a restart strategy.




