[jira] [Created] (FLINK-10212) REST API for listing all the available save points

Marc Rooding created FLINK-10212:

             Summary: REST API for listing all the available save points
                 Key: FLINK-10212
                 URL: https://issues.apache.org/jira/browse/FLINK-10212
             Project: Flink
          Issue Type: New Feature
            Reporter: Marc Rooding


I'm one of the authors of the open-source Flink job deployer (https://github.com/ing-bank/flink-deployer). Recently, I rewrote our implementation to use the Flink REST API instead of the native CLI.

In our use case, we store the job savepoints in a Kubernetes persistent volume. We mount that volume into the deployer container so that the deployer can find and use the savepoints.

While rewriting against the REST API, I saw that the endpoint for monitoring savepoint creation returns the complete path of the created savepoint. The job deployer can use this path to start the new job from the latest savepoint.
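
For reference, a minimal Python sketch of how a client might pull that path out of the status response. It assumes the response of GET /jobs/:jobid/savepoints/:triggerid has the shape {"status": {"id": ...}, "operation": {"location": ...}} and that a failed savepoint is reported under a "failure-cause" key. The helper name savepoint_location is made up for illustration and is not part of Flink or the deployer.

```python
from typing import Optional


def savepoint_location(status_response: dict) -> Optional[str]:
    """Return the savepoint path once the trigger has completed, else None.

    Hypothetical helper; the response shape is an assumption (see above).
    """
    if status_response.get("status", {}).get("id") != "COMPLETED":
        return None  # savepoint is still being written

    operation = status_response.get("operation", {})
    if "failure-cause" in operation:
        # Assumed key for a savepoint that completed with an error.
        raise RuntimeError(f"savepoint failed: {operation['failure-cause']}")
    return operation.get("location")
```

A caller would poll the status endpoint until this returns a non-None path and then pass that path along when submitting the new job.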

However, we also allow users to deploy a job with a recovered state by specifying only the directory in which savepoints are stored. In this scenario, we look for the latest savepoint created for this job ourselves inside the given directory. To find this path, we still rely on the mounted volume and list the directory contents to discover savepoints.
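
To make the directory-based lookup concrete, here is a rough Python sketch of the idea, not the deployer's actual code. It assumes savepoint directories are named savepoint-<first 6 characters of the job ID>-<suffix>, that each contains a _metadata file, and that "latest" can be approximated by the directory's modification time. The function name latest_savepoint is hypothetical.

```python
import os
from typing import Optional


def latest_savepoint(directory: str, job_id: str) -> Optional[str]:
    """Return the most recently modified savepoint directory for a job.

    Assumptions (not confirmed by the issue text): directory naming
    'savepoint-<short job id>-<suffix>' and a '_metadata' file inside.
    """
    prefix = f"savepoint-{job_id[:6]}-"
    candidates = []
    for entry in os.scandir(directory):
        if not entry.is_dir() or not entry.name.startswith(prefix):
            continue
        # Skip directories that do not look like a finished savepoint.
        if os.path.isfile(os.path.join(entry.path, "_metadata")):
            candidates.append(entry)

    if not candidates:
        return None
    newest = max(candidates, key=lambda e: e.stat().st_mtime)
    return newest.path
```

This only works where the savepoint directory is reachable as a local path, which is exactly why the volume has to be mounted into the deployer container.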


I was thinking that it might be a good addition if the native Flink REST API offered the ability to list savepoints. Since the API doesn't inherently know where savepoints are stored, it could take a path as one of its arguments. It could also accept a job ID as an argument, so that the API could search the specified directory for savepoints belonging to a specific job.


As the API would require the path as an argument, and providing a path containing forward slashes in the URL isn't ideal, I'm eager to discuss what a proper solution would look like.

A POST request to /jobs/:jobid/savepoints with the path as a body parameter would make sense for listing all savepoints in a specific path, but this request is already used for creating new savepoints.

An alternative could be a POST to /savepoints with the path and job ID in the request body.

A POST request to retrieve data is obviously not the most straightforward approach, but in my opinion it is still preferable to a GET to, for example, /jobs/:jobid/savepoints/:targetDirectory.
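
To illustrate the trade-off, here is a small Python sketch that builds both request variants. Neither endpoint exists for listing savepoints today; the body field names "job-id" and "target-directory" and both function names are assumptions chosen only for this example.

```python
import json
from urllib.parse import quote


def get_style_url(job_id: str, target_directory: str) -> str:
    """GET variant: the directory must be percent-encoded into the URL path."""
    # safe="" forces '/' to be encoded as %2F so it is not read as a
    # path separator.
    return f"/jobs/{job_id}/savepoints/{quote(target_directory, safe='')}"


def post_style_request(job_id: str, target_directory: str) -> tuple:
    """POST variant: the directory travels unmodified in the JSON body."""
    body = json.dumps({"job-id": job_id, "target-directory": target_directory})
    return "/savepoints", body


print(get_style_url("abc", "/data/savepoints"))
# prints /jobs/abc/savepoints/%2Fdata%2Fsavepoints
```

With the GET variant every slash in the directory becomes %2F, and some servers and proxies reject or rewrite encoded slashes in a path. The POST variant avoids this at the cost of using POST for a read-only operation.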

I'm willing to help out on this one by submitting a pull request.

Looking forward to your thoughts! 
