I’m currently running multiple jobs in the same YARN session, and I control how many jobs I submit based on their calculated parallelism. For example, with 4 TaskManagers and 2 slots each, I should be able to run up to 8 parallelism-1 jobs at the same time. However, what I’m seeing is that the implicit key extractor feeding the coGroup runs with parallelism 8 instead of inheriting the coGroup’s parallelism of 1. This eats up slots and causes the other jobs to fail with:
jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration.
This isn’t an issue when the key extractor chains with the coGroup, but it is when the extractor becomes its own operator. Is there a way I can force the key extractor to have the correct parallelism? What’s the best way around this?
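For reference, a minimal sketch of what I’m doing (Flink DataSet API; the class and field names here are hypothetical stand-ins for my actual job). One workaround I’m considering is setting the default parallelism on the ExecutionEnvironment itself, so that every operator — including the implicit key extractor that where()/equalTo() with a KeySelector generates — is capped at 1 rather than falling back to the session default of 8:

```java
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

public class CoGroupParallelismSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Cap the default parallelism for the whole job, so operators that
        // do not inherit the coGroup's explicit setting (e.g. the generated
        // key extractor) still run at parallelism 1.
        env.setParallelism(1);

        // Hypothetical inputs standing in for my real sources.
        DataSet<String> left = env.fromElements("a:1", "b:2");
        DataSet<String> right = env.fromElements("a:x", "c:y");

        DataSet<String> joined = left
                .coGroup(right)
                // KeySelector lambdas like these are what produce the
                // implicit key-extractor map operator.
                .where(v -> v.split(":")[0])
                .equalTo(v -> v.split(":")[0])
                .with((CoGroupFunction<String, String, String>) (l, r, out) -> {
                    for (String s : l) {
                        out.collect("L:" + s);
                    }
                    for (String s : r) {
                        out.collect("R:" + s);
                    }
                });

        joined.print();
    }
}
```

Setting parallelism per operator with `.setParallelism(1)` on the coGroup alone is what I tried first, but that doesn’t reach the generated key extractor, which is why I’m asking whether there’s a more targeted hook than the environment-wide default.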
Goldman Sachs – Enterprise Platforms, Data Architecture
30 Hudson Street, 37th floor | Jersey City, NY 07302 | (212) 902-5697