Thanks for the commit comment. I'll try to dump any (old) mental context I have left.
We were trying to find the right point in a space of:
* enough parallelism to speed things up
- more than 256 didn't seem to help with perf, and 256 was not close to looking like a DDoS to GCS, so we were not in danger of quota limits being imposed.
* large enough batches to speed things up
- batching -> fewer RPCs from the job to GCS, more RPCs inside GCS. A batch size of 100 was again good for perf and good for quotas.
* small enough batches to spread the load
- if you think about the multiple layers of fanout in RPC handling, sending a batch RPC cuts out the first layer of load balancing: the endpoint that receives the batch handles all of its individual requests itself, so if that endpoint is slow, or any single request in the batch is slow, the whole batch is slow. Prefer many batches in flight, so as not to be limited by the performance of any single endpoint (roughly the shape sketched below).
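For concreteness, a minimal sketch of that shape - not the actual code from this change; `GcsClient`, `batchDelete`, and `deleteAll` are invented names here - just bounded parallelism with many small batches in flight at once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class BatchedGcsCalls {
  static final int BATCH_SIZE = 100; // good for perf and good for quotas

  /** Stand-in for whatever client actually issues the batch RPC. */
  interface GcsClient {
    void batchDelete(List<String> objects);
  }

  // Split the work into many small batches and keep them all in flight,
  // so a slow endpoint only delays its own batch, not the whole job.
  static void deleteAll(List<String> objects, GcsClient gcs, ExecutorService pool)
      throws Exception {
    List<Future<?>> pending = new ArrayList<>();
    for (int i = 0; i < objects.size(); i += BATCH_SIZE) {
      List<String> batch = objects.subList(i, Math.min(i + BATCH_SIZE, objects.size()));
      pending.add(pool.submit(() -> gcs.batchDelete(batch)));
    }
    for (Future<?> f : pending) {
      f.get(); // surface any per-batch failure
    }
  }
}
```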
Coming to the question in the PR comment:
> Any reason not to use this.executorService?
I think the main reason not to use `this.executorService` is that we didn't have constraints on that executor in terms of either upper or lower bounds on its parallelism, so it seemed safer to use our own with known limits.
Thanks for catching the memory leak though - we didn't spot it at the time! I'll defer to you (and especially Luke ;) on a good fix.
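For what it's worth, if the leak is the dedicated pool's threads never being released (an assumption on my part - I don't know the actual cause), one common shape for the fix is to scope the pool and shut it down in a `finally`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ScopedPool {
  static final int MAX_PARALLELISM = 256; // more didn't seem to help perf

  static void runBatches() throws InterruptedException {
    // Our own pool with known limits, rather than the shared,
    // unconstrained this.executorService.
    ExecutorService pool = Executors.newFixedThreadPool(MAX_PARALLELISM);
    try {
      // submit the batch RPCs here, e.g. something like
      // BatchedGcsCalls.deleteAll(objects, gcs, pool) from the sketch above
    } finally {
      pool.shutdown(); // let in-flight batches drain...
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        pool.shutdownNow(); // ...but don't leak threads if they wedge
      }
    }
  }
}
```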