Flink InputFormats generate their InputSplits sequentially on the JobManager.
These splits are stored on the heap of the JM process and handed out lazily to SourceTasks as they request them.
Split assignment is done by an InputSplitAssigner, which can be customized. FileInputFormats typically use a LocatableInputSplitAssigner, which tries to assign splits based on locality.
I see three potential problems:
1) InputSplit generation might take a long time. The JM is blocked until all splits are generated.
2) All InputSplits need to be stored on the JM heap. You might need to assign more memory to the JM process.
3) Split assignment might take a while, depending on the complexity of the InputSplitAssigner. You can implement a custom assigner to make the assignment step itself more efficient.
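
To illustrate point 3, here is a minimal sketch of an assigner that keeps each request cheap. It is not Flink's implementation: Split and QueueSplitAssigner are hypothetical stand-ins so the example runs without Flink on the classpath. The idea is one queue per host plus a global fallback queue, so every split is polled at most once from each queue it sits in and a request is amortized constant time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a locatable split: an id plus the hosts that hold its data. */
final class Split {
    final int id;
    final List<String> hosts;

    Split(int id, List<String> hosts) {
        this.id = id;
        this.hosts = hosts;
    }
}

/** Sketch of a locality-aware assigner (assumed design, not Flink's own class). */
final class QueueSplitAssigner {
    private final Map<String, Deque<Split>> byHost = new HashMap<>();
    private final Deque<Split> all = new ArrayDeque<>();
    private final Set<Integer> assigned = new HashSet<>();

    QueueSplitAssigner(List<Split> splits) {
        for (Split s : splits) {
            all.add(s);
            // Index the split under every host that stores its data.
            for (String h : s.hosts) {
                byHost.computeIfAbsent(h, k -> new ArrayDeque<>()).add(s);
            }
        }
    }

    /** Returns a local split if one is left, else any remaining split, else null. */
    synchronized Split getNextInputSplit(String host) {
        Split s = pollUnassigned(byHost.get(host));
        return s != null ? s : pollUnassigned(all);
    }

    /** Polls the queue until it finds a split that has not been handed out yet. */
    private Split pollUnassigned(Deque<Split> queue) {
        if (queue == null) {
            return null;
        }
        Split s;
        while ((s = queue.poll()) != null) {
            if (assigned.add(s.id)) {
                return s;
            }
        }
        return null;
    }
}
```

In an actual job, the same logic would go into a class implementing Flink's InputSplitAssigner interface and be returned from the input format's getInputSplitAssigner method. This only speeds up assignment; the splits still have to be generated and held on the JM heap, so it does not help with points 1 and 2.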