
[jira] [Created] (FLINK-10298) Batch Job Failover Strategy

JIN SUN created FLINK-10298:

             Summary: Batch Job Failover Strategy
                 Key: FLINK-10298
                 URL: https://issues.apache.org/jira/browse/FLINK-10298
             Project: Flink
          Issue Type: Sub-task
          Components: JobManager
            Reporter: JIN SUN
            Assignee: JIN SUN

The new failover strategy needs to handle failures according to their failure type. It orchestrates all the logic described in this [document|https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit#]. We can put this logic in the onTaskFailure method of the FailoverStrategy interface, along these lines:
public void onTaskFailure(Execution taskExecution, Throwable cause) {

        // 1. Get the throwable type

        // 2. If the type is NonrecoverableType, fail the job

        // 3. If the type is PartitionDataMissingError, do revocation

        // 4. If the type is EnvironmentError, check the blacklist

        // 5. Other failure types are recoverable, but we need to remember the failure count;
        //    if it exceeds the threshold, fail the job

}
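
The five steps above could be sketched as a small, standalone decision routine. This is only an illustration and does not use Flink's classes: the exception types, the Action enum, the decide method, and the maxRecoverableFailures threshold are all hypothetical names chosen to mirror the wording of this issue, and the real onTaskFailure would trigger the corresponding actions rather than return a value.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Standalone sketch of the failure-type dispatch; every name here is hypothetical.
final class BatchFailoverSketch {

    // Hypothetical failure categories named after the types mentioned in the issue.
    static class NonrecoverableError extends RuntimeException {
        NonrecoverableError(String message) { super(message); }
    }

    static class PartitionDataMissingError extends RuntimeException {
        PartitionDataMissingError(String message) { super(message); }
    }

    static class EnvironmentError extends RuntimeException {
        EnvironmentError(String message) { super(message); }
    }

    // What the strategy decides to do for a given failure.
    enum Action { FAIL_JOB, REVOKE_PARTITION, CHECK_BLACKLIST, RESTART_TASK }

    private final int maxRecoverableFailures;
    private final AtomicInteger recoverableFailures = new AtomicInteger();

    BatchFailoverSketch(int maxRecoverableFailures) {
        this.maxRecoverableFailures = maxRecoverableFailures;
    }

    // Steps 1 to 5: inspect the throwable type and pick an action.
    Action decide(Throwable cause) {
        if (cause instanceof NonrecoverableError) {
            return Action.FAIL_JOB;            // step 2
        }
        if (cause instanceof PartitionDataMissingError) {
            return Action.REVOKE_PARTITION;    // step 3
        }
        if (cause instanceof EnvironmentError) {
            return Action.CHECK_BLACKLIST;     // step 4
        }
        // Step 5: recoverable, but count it and fail once the threshold is exceeded.
        int count = recoverableFailures.incrementAndGet();
        return count > maxRecoverableFailures ? Action.FAIL_JOB : Action.RESTART_TASK;
    }

    public static void main(String[] args) {
        BatchFailoverSketch strategy = new BatchFailoverSketch(1);
        System.out.println(strategy.decide(new EnvironmentError("bad disk")));
        System.out.println(strategy.decide(new IllegalStateException("transient")));
        System.out.println(strategy.decide(new IllegalStateException("transient again")));
    }
}
```

Returning an action instead of acting directly keeps the classification easy to test in isolation; in this sketch only failures that fall through to step 5 count toward the threshold.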


This message was sent by Atlassian JIRA