[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Please review : Re: PubSub connector (FLINK-9311)

Hello everyone,

To improve performance we have changed some parts of the code:
* Previously the same threads used to poll PubSub we're used to actually
run the flink pipeline, this caused some issues in the PubSub SDK. We've
changed this so that the PubSub threads are only used to poll PubSub and
the flink pipeline runs in the same thread as the run() method.
* We now expose PubSub options so we are able to tune the PubSub source
functions when the defaults don't give the best performance.

I did run into one issue that we are currently unable to fix outside of
flink when rescaling the job to a lower parallelism:
We use the MessageAcknowledgingSourceBase and this had been designed for
RabbitMQ to run with a parallelism of 1. For PubSub we would like a higher
parallelism and fallback to ATLEAST_ONCE.
When rescaling back to a lower parallelism we need to combine state of 2
tasks but the MessageAcknowledingSourceBase explicitly does not support
this in the initializeState() function.
I've added this commit:
to fix this, this should not change anything for RabbitMQ but does allow
rescaling when needed.

That last issue is not possible for us to fix outside of flink, therefore
we would really like to see this added to 1.7 :)
If it remains difficult to find Google Cloud Platform users to review this
would it be an option to add the PubSub connector as a 'beta' feature?
Perhaps this would make it more visible to the community?



On Fri, Sep 14, 2018 at 4:01 PM Aljoscha Krettek <aljoscha@xxxxxxxxxx>

> Hi Niels and Richard,
> I would be very happy about having a PubSub connector in Flink. Having it
> in Flink means that you don't have manual effort for tracking API changes
> and I think having a production user is incentive enough for them (you) to
> maintain the connector.
> I'm afraid we don't have much PubSub knowledge in the Flink community but
> I will try and talk to some folks to have this reviewed.
> Best,
> Aljoscha
> > On 14. Sep 2018, at 15:40, Niels Basjes <Niels@xxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > We (Richard and I) would really appreciate it if you guys could review
> > the new feature we created (see below).
> > It is something we really need in production and thought it would be
> > best if it can be a native part of the Flink toolset.
> > Please indicate what we need to change/improve in order to get this
> committed.
> >
> > Thank you.
> >
> > Niels Basjes
> >
> > On Mon, Sep 10, 2018 at 10:12 AM Richard Deurwaarder <richard@xxxxxxx>
> wrote:
> >>
> >> Hello everyone,
> >>
> >> A while back I opened this Jira issue:
> >> https://issues.apache.org/jira/browse/FLINK-9311. It is regarding a new
> >> Flink connector for Google PubSub.
> >>
> >> At Bol.com, the company I work for, we are going to use this connector
> to
> >> do our financial processing. My colleague, Niels Basjes, and I have
> >> finished implementing this and we are about ready to start running it in
> >> production.
> >>
> >> We would like to donate this code and have opened a pull request (
> >> https://github.com/apache/flink/pull/6594). The pull request contains:
> >>
> >>   - a SourceFunction (with 2 test versions)
> >>   - a SinkFunction
> >>   - an example application
> >>   - and End-to-End tests using a docker container
> >>
> >> Yanghua has taken a glance at the code and already provided us with some
> >> feedback. We would like to invite others to provide additional feedback.
> >>
> >> I hope you find this useful and will consider merging the PR!
> >>
> >> Best,
> >>
> >> Richard Deurwaarder
> >
> >
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes