osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Please please please review : Re: PubSub connector (FLINK-9311)


Thank you.

On Fri, Nov 2, 2018, 08:37 Chesnay Schepler <chesnay@xxxxxxxxxx wrote:

> Hi!
>
> We're currently in the release process for 1.7.0 with a feature-freeze
> in effect, which locks down most (all?) committers to bug-fixes and
> testing.
>
> I apologize for this PR being ignored for so long, especially so since
> you started a discussion beforehand as to whether the Flink community
> would be interested in having such a connector.
>
> I will nag some people to review it once the release is done or have a
> crack at it myself.
>
> On 02.11.2018 08:23, Niels Basjes wrote:
> > Hi all
> > We created this feature almost 2 months ago.
> > Please review this so it becomes part ofthe release so that others can
> use
> > it too.
> >
> > So far we have not seen any feedback on what we should change/improve to
> > make this happen.
> >
> > What should we do?
> >
> > Niels & Richard
> >
> > On Wed, Oct 17, 2018, 10:52 Richard Deurwaarder <richard@xxxxxxx wrote:
> >
> >> Hello everyone,
> >>
> >> To improve performance we have changed some parts of the code:
> >> * Previously the same threads used to poll PubSub we're used to actually
> >> run the flink pipeline, this caused some issues in the PubSub SDK. We've
> >> changed this so that the PubSub threads are only used to poll PubSub and
> >> the flink pipeline runs in the same thread as the run() method.
> >> * We now expose PubSub options so we are able to tune the PubSub source
> >> functions when the defaults don't give the best performance.
> >>
> >> I did run into one issue that we are currently unable to fix outside of
> >> flink when rescaling the job to a lower parallelism:
> >> We use the MessageAcknowledgingSourceBase and this had been designed for
> >> RabbitMQ to run with a parallelism of 1. For PubSub we would like a
> higher
> >> parallelism and fallback to ATLEAST_ONCE.
> >> When rescaling back to a lower parallelism we need to combine state of 2
> >> tasks but the MessageAcknowledingSourceBase explicitly does not support
> >> this in the initializeState() function.
> >> I've added this commit:
> >>
> >>
> https://github.com/Xeli/flink/commit/698b6f1c802427f940f2f550796539fbfa4b5dfa
> >> to
> >> fix this, this should not change anything for RabbitMQ but does allow
> >> rescaling when needed.
> >>
> >>
> >> That last issue is not possible for us to fix outside of flink,
> therefore
> >> we would really like to see this added to 1.7 :)
> >> If it remains difficult to find Google Cloud Platform users to review
> this,
> >> would it be an option to add the PubSub connector as a 'beta' feature?
> >> Perhaps this would make it more visible to the community?
> >>
> >> Best,
> >>
> >> Richard
> >>
> >> On Fri, Sep 14, 2018 at 4:01 PM Aljoscha Krettek <aljoscha@xxxxxxxxxx>
> >> wrote:
> >>
> >>> Hi Niels and Richard,
> >>>
> >>> I would be very happy about having a PubSub connector in Flink. Having
> it
> >>> in Flink means that you don't have manual effort for tracking API
> changes
> >>> and I think having a production user is incentive enough for them (you)
> >> to
> >>> maintain the connector.
> >>>
> >>> I'm afraid we don't have much PubSub knowledge in the Flink community
> but
> >>> I will try and talk to some folks to have this reviewed.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>>> On 14. Sep 2018, at 15:40, Niels Basjes <Niels@xxxxxxxxx> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> We (Richard and I) would really appreciate it if you guys could review
> >>>> the new feature we created (see below).
> >>>> It is something we really need in production and thought it would be
> >>>> best if it can be a native part of the Flink toolset.
> >>>> Please indicate what we need to change/improve in order to get this
> >>> committed.
> >>>> Thank you.
> >>>>
> >>>> Niels Basjes
> >>>>
> >>>> On Mon, Sep 10, 2018 at 10:12 AM Richard Deurwaarder <richard@xxxxxxx
> >
> >>> wrote:
> >>>>> Hello everyone,
> >>>>>
> >>>>> A while back I opened this Jira issue:
> >>>>> https://issues.apache.org/jira/browse/FLINK-9311. It is regarding a
> >> new
> >>>>> Flink connector for Google PubSub.
> >>>>>
> >>>>> At Bol.com, the company I work for, we are going to use this
> connector
> >>> to
> >>>>> do our financial processing. My colleague, Niels Basjes, and I have
> >>>>> finished implementing this and we are about ready to start running it
> >> in
> >>>>> production.
> >>>>>
> >>>>> We would like to donate this code and have opened a pull request (
> >>>>> https://github.com/apache/flink/pull/6594). The pull request
> >> contains:
> >>>>>    - a SourceFunction (with 2 test versions)
> >>>>>    - a SinkFunction
> >>>>>    - an example application
> >>>>>    - and End-to-End tests using a docker container
> >>>>>
> >>>>> Yanghua has taken a glance at the code and already provided us with
> >> some
> >>>>> feedback. We would like to invite others to provide additional
> >> feedback.
> >>>>> I hope you find this useful and will consider merging the PR!
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Richard Deurwaarder
> >>>>
> >>>>
> >>>> --
> >>>> Best regards / Met vriendelijke groeten,
> >>>>
> >>>> Niels Basjes
> >>>
>
>