
guidelines for setting parallelism in operations/job?

I'm trying to find some simple rules or guidelines for what values to set
for operator or job parallelism. It seems to me that it should be a number
<= the number of available task slots.

For example, suppose I have 2 task manager machines, each with 4 task slots.
Assuming no other jobs are running on the cluster, would I set the
parallelism for operations like filter and map to 8? If not, what would be
a reasonable number?
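
To make the arithmetic in my example concrete, here is a small sketch of how
I am counting slots. It is plain Java with no Flink dependency, the class and
method names are made up for illustration, and it assumes that each parallel
instance of an operator needs its own slot, which is only my understanding
and may be wrong.

```java
// Hypothetical helper, not part of Flink: models the slot arithmetic only.
class SlotMath {
    // Total slots = number of task managers * slots configured on each one.
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Assumption: a requested parallelism "fits" if it does not exceed the total.
    static boolean fitsInCluster(int requestedParallelism, int totalSlots) {
        return requestedParallelism <= totalSlots;
    }

    public static void main(String[] args) {
        int total = totalSlots(2, 4); // 2 task managers with 4 slots each
        System.out.println("total slots = " + total);
        System.out.println("parallelism 8 fits: " + fitsInCluster(8, total));
        System.out.println("parallelism 12 fits: " + fitsInCluster(12, total));
    }
}
```

By this count, 8 is the largest parallelism that fits on an otherwise idle
cluster, and 12 would not fit.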

What happens if you request more parallelism than there are task slots? In
the example above, what happens if I set the parallelism to 12 on the
operations? I'm assuming it would just use as many slots as are available?

Also, it seems that you would not want to hardcode the parallelism in your
source code, since you would want a rough idea of the available task slots
when you submit the job? Should you set the parallelism of all operators to
roughly the same value or to different values, and what would guide that
decision?
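
As a sketch of what I mean by not hardcoding the value, the job could take
its parallelism as a program argument at submission time. This is plain Java
with hypothetical names (ParallelismArg, resolveParallelism) and a fallback
of 1 that I picked arbitrarily. It does not call Flink itself; in a real job
I assume the resolved value would be passed on to something like
env.setParallelism(...).

```java
// Hypothetical helper, not part of Flink: reads parallelism from the arguments.
class ParallelismArg {
    // Returns the first argument as a positive int, or the fallback if it is
    // missing, not a number, or not positive.
    static int resolveParallelism(String[] args, int fallback) {
        if (args == null || args.length == 0) {
            return fallback;
        }
        try {
            int parsed = Integer.parseInt(args[0].trim());
            return parsed > 0 ? parsed : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // The fallback of 1 is an arbitrary choice for this sketch.
        int parallelism = resolveParallelism(args, 1);
        // A real job would hand this value to the framework instead of printing it.
        System.out.println("parallelism = " + parallelism);
    }
}
```

That way the same jar could be submitted with 8 on my example cluster and
with a different value elsewhere, without recompiling.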


Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/