Right now it is a Kafka source, but I had the same issue when reading data from local FS.
It looks like a common problem for many (all?) sources.
When the incoming records are very small (e.g. paths to large archives) but each one takes significant time to process (unpack, parse, etc.), Flink detects the back pressure only after a delay, and too much data becomes part of the first transaction.
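
To give a feel for the scale, here is a rough back-of-the-envelope sketch in plain Java (no Flink APIs involved). All numbers are made up for illustration: I assume 32 KiB buffers, 100-byte records, 10 buffers between the source and the slow operator, and 30 seconds of processing per record. The class and method names are hypothetical too. The point is only that byte-sized buffers hold a lot of tiny records, so the source can run far ahead before it is blocked.

```java
import java.time.Duration;

public class BackPressureEstimate {

    // How many records fit into the buffers between the source and the
    // slow operator before the source is blocked (assumed model: buffers
    // are sized in bytes and a record never spans two buffers).
    public static long recordsInFlight(long bufferBytes, long recordBytes, long bufferCount) {
        return (bufferBytes / recordBytes) * bufferCount;
    }

    // How long the slow operator needs to work through those records.
    public static Duration drainTime(long records, Duration perRecord) {
        return perRecord.multipliedBy(records);
    }

    public static void main(String[] args) {
        // Illustrative values only, not measured from a real job.
        long records = recordsInFlight(32 * 1024, 100, 10);
        Duration drain = drainTime(records, Duration.ofSeconds(30));
        System.out.println(records + " records buffered, roughly "
                + drain.toHours() + " hours to drain");
    }
}
```

With numbers like these the source has already handed over thousands of records before it is slowed down, and all of them have to be processed before anything behind them gets through.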