[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: PubSubIO throughput/parallelism options?

On Fri, Jun 22, 2018 at 3:06 PM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
Are you sure you have scaled your pipeline up based upon the amount of traffic your receiving?
What do you see your average CPU utilization?
Do you have any functions which perform blocking calls?

Agreed. Need to look into why the job is not processing as much input as it should.

There is no such setting in PubsubIO. In Dataflow, the messages are pulled by streaming worker ('shuffler') using multiple threads. These are not pulled by Java worker running user code.


On Fri, Jun 22, 2018 at 4:26 AM Hrish <talonx@xxxxxxxxx> wrote:
I am using PubSubIO to read from Google PubSub as an unbounded source. I've noticed that the rate at which is receives messages is pretty slow compared to the rate at which messages are being pushed.

In the Google PubSub APIs, I can tweak a pull subscriber's settings using Subscriber.defaultBuilder().setExecutorProvider, where I can set the core pool size for the thread pool that will pull messages.

Is there any equivalent way of tuning the PubSubIO class as well? Or any other way of achieving a higher pull rate?