I wanted to follow up on this thread one last time as we found a solution for the recovery time that worked well for us.
Originally, we were running the job from a single jar that shaded in all of our dependencies. We switched to a more lightweight jar for the job itself and moved the dependencies into a separate jar that is added to the class path as an extra element. That sped up recovery significantly, to around 1 minute for 250 jobs.
In case anyone else hits this again, this is something they can try.
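
For anyone who wants a picture of the layout, here is a minimal sketch in plain Java. It is not our actual setup: the class name, the buildLoader helper, and both jar paths are made up for illustration, and how the extra class path element is really supplied depends on your deployment. It only shows the idea of a small job jar plus a separate dependency jar on the same class path.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ThinJarClasspath {

    // Builds a class loader whose class path lists the lightweight job jar
    // first, followed by the dependency jar as an extra element.
    static URLClassLoader buildLoader(Path jobJar, Path dependencyJar)
            throws MalformedURLException {
        URL[] classPath = {
            jobJar.toUri().toURL(),
            dependencyJar.toUri().toURL()
        };
        return new URLClassLoader(classPath, ThinJarClasspath.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths, for illustration only.
        Path jobJar = Paths.get("/opt/jobs/my-job-thin.jar");
        Path dependencyJar = Paths.get("/opt/lib/job-dependencies.jar");

        try (URLClassLoader loader = buildLoader(jobJar, dependencyJar)) {
            // Print the class path elements in lookup order.
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

The dependency jar in this layout is a single shared file, so the per-job jar contains only the job's own classes.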