osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Consuming data from dynamoDB streams to flink


Done!

Thank you :-)

2018-07-31 6:41 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:

> Thanks Fabian and Thomas.
>
> Please assign FLINK-4582 to the following username:
>     *yxu-apache
> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache>*
>
> If needed I can get a ICLA or CCLA whichever is proper.
>
> *Ying Xu*
> Software Engineer
> 510.368.1252 <+15103681252>
> [image: Lyft] <http://www.lyft.com/>
>
> On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <thw@xxxxxxxxxx> wrote:
>
> > The user is yxu-lyft, Ying had commented on that JIRA as well.
> >
> > https://issues.apache.org/jira/browse/FLINK-4582
> >
> >
> > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <fhueske@xxxxxxxxx> wrote:
> >
> > > Hi Ying,
> > >
> > > Thanks for considering to contribute the connector!
> > >
> > > In general, you don't need special permissions to contribute to Flink.
> > > Anybody can open Jiras and PRs.
> > > You only need to be assigned to the Contributor role in Jira to be able
> > to
> > > assign an issue to you.
> > > I can give you these permissions if you tell me your Jira user.
> > >
> > > It would also be good if you could submit a CLA [1] if you plan to
> > > contribute a larger feature.
> > >
> > > Thanks, Fabian
> > >
> > > [1] https://www.apache.org/licenses/#clas
> > >
> > >
> > > 2018-07-30 10:07 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:
> > >
> > > > Hello Flink dev:
> > > >
> > > > We have implemented the prototype design and the initial PoC worked
> > > pretty
> > > > well.  Currently, we plan to move ahead with this design in our
> > internal
> > > > production system.
> > > >
> > > > We are thinking of contributing this connector back to the flink
> > > community
> > > > sometime soon.  May I request to be granted with a contributor role?
> > > >
> > > > Many thanks in advance.
> > > >
> > > > *Ying Xu*
> > > > Software Engineer
> > > > 510.368.1252 <+15103681252>
> > > > [image: Lyft] <http://www.lyft.com/>
> > > >
> > > > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <yxu@xxxxxxxx> wrote:
> > > >
> > > > > Hi Gordon:
> > > > >
> > > > > Cool. Thanks for the thumb-up!
> > > > >
> > > > > We will include some test cases around the behavior of re-sharding.
> > If
> > > > > needed we can double check the behavior with AWS, and see if
> > additional
> > > > > changes are needed.  Will keep you posted.
> > > > >
> > > > > -
> > > > > Ying
> > > > >
> > > > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <
> > > tzulitai@xxxxxxxxxx
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Hi Ying,
> > > > >>
> > > > >> Sorry for the late reply here.
> > > > >>
> > > > >> From the looks of the AmazonDynamoDBStreamsClient, yes it seems
> like
> > > > this
> > > > >> should simply work.
> > > > >>
> > > > >> Regarding the resharding behaviour I mentioned in the JIRA:
> > > > >> I'm not sure if this is really a difference in behaviour.
> > Internally,
> > > if
> > > > >> DynamoDB streams is actually just working on Kinesis Streams, then
> > the
> > > > >> resharding primitives should be similar.
> > > > >> The shard discovery logic of the Flink Kinesis Consumer assumes
> that
> > > > >> splitting / merging shards will result in new shards of
> increasing,
> > > > >> consecutive shard ids. As long as this is also the behaviour for
> > > > DynamoDB
> > > > >> resharding, then we should be fine.
> > > > >>
> > > > >> Feel free to start with the implementation for this, I think
> > > design-wise
> > > > >> we're good to go. And thanks for working on this!
> > > > >>
> > > > >> Cheers,
> > > > >> Gordon
> > > > >>
> > > > >> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <yxu@xxxxxxxx> wrote:
> > > > >>
> > > > >> > HI Gordon:
> > > > >> >
> > > > >> > We are starting to implement some of the primitives along this
> > path.
> > > > >> Please
> > > > >> > let us know if you have any suggestions.
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yxu@xxxxxxxx> wrote:
> > > > >> >
> > > > >> > > Hi Gordon:
> > > > >> > >
> > > > >> > > Really appreciate the reply.
> > > > >> > >
> > > > >> > > Yes our plan is to build the connector on top of the
> > > > >> > FlinkKinesisConsumer.
> > > > >> > > At the high level, FlinkKinesisConsumer mainly interacts with
> > > > Kinesis
> > > > >> > > through the AmazonKinesis client, more specifically through
> the
> > > > >> following
> > > > >> > > three function calls:
> > > > >> > >
> > > > >> > >    - describeStream
> > > > >> > >    - getRecords
> > > > >> > >    - getShardIterator
> > > > >> > >
> > > > >> > > Given that the low-level DynamoDB client
> > > > (AmazonDynamoDBStreamsClient)
> > > > >> > > has already implemented similar calls, it is possible to use
> > that
> > > > >> client
> > > > >> > to
> > > > >> > > interact with the dynamoDB streams, and adapt the results from
> > the
> > > > >> > dynamoDB
> > > > >> > > streams model to the kinesis model.
> > > > >> > >
> > > > >> > > It appears this is exactly what the
> > AmazonDynamoDBStreamsAdapterCl
> > > > >> ient
> > > > >> > > <
> > > > >> > https://github.com/awslabs/dynamodb-streams-kinesis-adapter/
> > > > >> blob/master/src/main/java/com/amazonaws/services/dynamodbv2/
> > > > >> streamsadapter/AmazonDynamoDBStreamsAdapterClient.java
> > > > >> > >
> > > > >> > > does. The adaptor client implements the AmazonKinesis client
> > > > >> interface,
> > > > >> > > and is officially supported by AWS.  Hence it is possible to
> > > replace
> > > > >> the
> > > > >> > > internal Kinesis client inside FlinkKinesisConsumer with this
> > > > adapter
> > > > >> > > client when interacting with dynamoDB streams.  The new object
> > can
> > > > be
> > > > >> a
> > > > >> > > subclass of FlinkKinesisConsumer with a new name e.g,
> > > > >> > FlinkDynamoStreamCon
> > > > >> > > sumer.
> > > > >> > >
> > > > >> > > At best this could simply work. But we would like to hear if
> > there
> > > > are
> > > > >> > > other situations to take care of.  In particular, I am
> wondering
> > > > >> what's
> > > > >> > the *"resharding
> > > > >> > > behavior"* mentioned in FLINK-4582.
> > > > >> > >
> > > > >> > > Thanks a lot!
> > > > >> > >
> > > > >> > > -
> > > > >> > > Ying
> > > > >> > >
> > > > >> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
> > > > >> > tzulitai@xxxxxxxxxx
> > > > >> > > > wrote:
> > > > >> > >
> > > > >> > >> Hi!
> > > > >> > >>
> > > > >> > >> I think it would be definitely nice to have this feature.
> > > > >> > >>
> > > > >> > >> No actual previous work has been made on this issue, but
> AFAIK,
> > > we
> > > > >> > should
> > > > >> > >> be able to build this on top of the FlinkKinesisConsumer.
> > > > >> > >> Whether this should live within the Kinesis connector module
> or
> > > an
> > > > >> > >> independent module of its own is still TBD.
> > > > >> > >> If you want, I would be happy to look at any concrete design
> > > > >> proposals
> > > > >> > you
> > > > >> > >> have for this before you start the actual development
> efforts.
> > > > >> > >>
> > > > >> > >> Cheers,
> > > > >> > >> Gordon
> > > > >> > >>
> > > > >> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yxu@xxxxxxxx>
> wrote:
> > > > >> > >>
> > > > >> > >> > Thanks Fabian for the suggestion.
> > > > >> > >> >
> > > > >> > >> > *Ying Xu*
> > > > >> > >> > Software Engineer
> > > > >> > >> > 510.368.1252 <+15103681252>
> > > > >> > >> > [image: Lyft] <http://www.lyft.com/>
> > > > >> > >> >
> > > > >> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <
> > > > fhueske@xxxxxxxxx>
> > > > >> > >> wrote:
> > > > >> > >> >
> > > > >> > >> > > Hi Ying,
> > > > >> > >> > >
> > > > >> > >> > > I'm not aware of any effort for this issue.
> > > > >> > >> > > You could check with the assigned contributor in Jira if
> > > there
> > > > is
> > > > >> > some
> > > > >> > >> > > previous work.
> > > > >> > >> > >
> > > > >> > >> > > Best, Fabian
> > > > >> > >> > >
> > > > >> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:
> > > > >> > >> > >
> > > > >> > >> > > > Hello Flink dev:
> > > > >> > >> > > >
> > > > >> > >> > > > We have a number of use cases which involves pulling
> data
> > > > from
> > > > >> > >> DynamoDB
> > > > >> > >> > > > streams into Flink.
> > > > >> > >> > > >
> > > > >> > >> > > > Given that this issue is tracked by Flink-4582
> > > > >> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>. we
> > > would
> > > > >> like
> > > > >> > >> to
> > > > >> > >> > > check
> > > > >> > >> > > > if any prior work has been completed by the community.
> >  We
> > > > are
> > > > >> > also
> > > > >> > >> > very
> > > > >> > >> > > > interested in contributing to this effort.  Currently,
> we
> > > > have
> > > > >> a
> > > > >> > >> > > high-level
> > > > >> > >> > > > proposal which is based on extending the existing
> > > > >> > >> FlinkKinesisConsumer
> > > > >> > >> > > and
> > > > >> > >> > > > making it work with DynamoDB streams (via integrating
> > with
> > > > the
> > > > >> > >> > > > AmazonDynamoDBStreams API).
> > > > >> > >> > > >
> > > > >> > >> > > > Any suggestion is welcome. Thank you very much.
> > > > >> > >> > > >
> > > > >> > >> > > >
> > > > >> > >> > > > -
> > > > >> > >> > > > Ying
> > > > >> > >> > > >
> > > > >> > >> > >
> > > > >> > >> >
> > > > >> > >>
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>