Guidelines for setting parallelism in operators/job?
I'm trying to get some simple rules or guidelines for what values to set for
operator or job parallelism. It would seem to me that it should be a number
<= the number of available task slots?
For example, suppose I have 2 task manager machines, each with 4 task slots.
Assuming no other jobs are running on the cluster, would I set the parallelism
of operators like filter and map to 8? If not, what would be a reasonable
number?
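
To make the arithmetic in my example explicit, here is a minimal sketch. The
class SlotMath and its methods are hypothetical helpers of my own, not part of
Flink, and the "fits" check only encodes my assumption that parallelism should
be <= the number of task slots.

```java
// Hypothetical helper, not a Flink API: it only does the slot arithmetic.
final class SlotMath {

    // Total task slots in the cluster, assuming every task manager
    // is configured with the same number of slots.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // My assumption: a parallelism "fits" if it does not exceed the slot count.
    static boolean fitsInSlots(int parallelism, int totalSlots) {
        return parallelism <= totalSlots;
    }

    public static void main(String[] args) {
        int slots = totalSlots(2, 4); // 2 task managers, 4 slots each
        System.out.println("total slots: " + slots);
        System.out.println("parallelism 8 fits: " + fitsInSlots(8, slots));
        System.out.println("parallelism 12 fits: " + fitsInSlots(12, slots));
    }
}
```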
What happens if you request more parallelism than there are task slots? For
instance, what happens if I set parallelism to 12 on the operators? I'm
assuming it would just use as many as are available?
Also, it would seem that you would not want to hardcode the parallelism into
your source code, since you would want to have a rough idea of the available
task slots when you submit the job.
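
As a sketch of what I mean by not hardcoding it: the job could read the
parallelism from its program arguments at submission time. The class
ParallelismArg and the "--parallelism" argument name are hypothetical, made up
for illustration. In a real job I would presumably pass the result to
StreamExecutionEnvironment.setParallelism(int), and I believe "flink run" also
accepts a -p option for the same purpose.

```java
// Hypothetical helper, not a Flink API: resolves the parallelism from
// program arguments so that it is not hardcoded in the job source.
final class ParallelismArg {

    // Returns the value following "--parallelism" (a made-up argument name),
    // or the fallback when the argument is absent.
    static int resolve(String[] args, int fallback) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("--parallelism".equals(args[i])) {
                int value = Integer.parseInt(args[i + 1]);
                if (value < 1) {
                    throw new IllegalArgumentException("parallelism must be >= 1");
                }
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        int parallelism = resolve(args, 1);
        // In an actual job this value would be handed to the execution
        // environment instead of being printed.
        System.out.println("parallelism = " + parallelism);
    }
}
```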
Should you set the parallelism of all operators to roughly the same value or
to different values, and what would guide that choice?
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/