osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Consuming data from dynamoDB streams to flink


The user is yxu-lyft, Ying had commented on that JIRA as well.

https://issues.apache.org/jira/browse/FLINK-4582


On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <fhueske@xxxxxxxxx> wrote:

> Hi Ying,
>
> Thanks for considering to contribute the connector!
>
> In general, you don't need special permissions to contribute to Flink.
> Anybody can open Jiras and PRs.
> You only need to be assigned to the Contributor role in Jira to be able to
> assign an issue to you.
> I can give you these permissions if you tell me your Jira user.
>
> It would also be good if you could submit a CLA [1] if you plan to
> contribute a larger feature.
>
> Thanks, Fabian
>
> [1] https://www.apache.org/licenses/#clas
>
>
> 2018-07-30 10:07 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:
>
> > Hello Flink dev:
> >
> > We have implemented the prototype design and the initial PoC worked
> pretty
> > well.  Currently, we plan to move ahead with this design in our internal
> > production system.
> >
> > We are thinking of contributing this connector back to the flink
> community
> > sometime soon.  May I request to be granted with a contributor role?
> >
> > Many thanks in advance.
> >
> > *Ying Xu*
> > Software Engineer
> > 510.368.1252 <+15103681252>
> > [image: Lyft] <http://www.lyft.com/>
> >
> > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <yxu@xxxxxxxx> wrote:
> >
> > > Hi Gordon:
> > >
> > > Cool. Thanks for the thumb-up!
> > >
> > > We will include some test cases around the behavior of re-sharding. If
> > > needed we can double check the behavior with AWS, and see if additional
> > > changes are needed.  Will keep you posted.
> > >
> > > -
> > > Ying
> > >
> > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <
> tzulitai@xxxxxxxxxx
> > >
> > > wrote:
> > >
> > >> Hi Ying,
> > >>
> > >> Sorry for the late reply here.
> > >>
> > >> From the looks of the AmazonDynamoDBStreamsClient, yes it seems like
> > this
> > >> should simply work.
> > >>
> > >> Regarding the resharding behaviour I mentioned in the JIRA:
> > >> I'm not sure if this is really a difference in behaviour. Internally,
> if
> > >> DynamoDB streams is actually just working on Kinesis Streams, then the
> > >> resharding primitives should be similar.
> > >> The shard discovery logic of the Flink Kinesis Consumer assumes that
> > >> splitting / merging shards will result in new shards of increasing,
> > >> consecutive shard ids. As long as this is also the behaviour for
> > DynamoDB
> > >> resharding, then we should be fine.
> > >>
> > >> Feel free to start with the implementation for this, I think
> design-wise
> > >> we're good to go. And thanks for working on this!
> > >>
> > >> Cheers,
> > >> Gordon
> > >>
> > >> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <yxu@xxxxxxxx> wrote:
> > >>
> > >> > HI Gordon:
> > >> >
> > >> > We are starting to implement some of the primitives along this path.
> > >> Please
> > >> > let us know if you have any suggestions.
> > >> >
> > >> > Thanks!
> > >> >
> > >> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yxu@xxxxxxxx> wrote:
> > >> >
> > >> > > Hi Gordon:
> > >> > >
> > >> > > Really appreciate the reply.
> > >> > >
> > >> > > Yes our plan is to build the connector on top of the
> > >> > FlinkKinesisConsumer.
> > >> > > At the high level, FlinkKinesisConsumer mainly interacts with
> > Kinesis
> > >> > > through the AmazonKinesis client, more specifically through the
> > >> following
> > >> > > three function calls:
> > >> > >
> > >> > >    - describeStream
> > >> > >    - getRecords
> > >> > >    - getShardIterator
> > >> > >
> > >> > > Given that the low-level DynamoDB client
> > (AmazonDynamoDBStreamsClient)
> > >> > > has already implemented similar calls, it is possible to use that
> > >> client
> > >> > to
> > >> > > interact with the dynamoDB streams, and adapt the results from the
> > >> > dynamoDB
> > >> > > streams model to the kinesis model.
> > >> > >
> > >> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterCl
> > >> ient
> > >> > > <
> > >> > https://github.com/awslabs/dynamodb-streams-kinesis-adapter/
> > >> blob/master/src/main/java/com/amazonaws/services/dynamodbv2/
> > >> streamsadapter/AmazonDynamoDBStreamsAdapterClient.java
> > >> > >
> > >> > > does. The adaptor client implements the AmazonKinesis client
> > >> interface,
> > >> > > and is officially supported by AWS.  Hence it is possible to
> replace
> > >> the
> > >> > > internal Kinesis client inside FlinkKinesisConsumer with this
> > adapter
> > >> > > client when interacting with dynamoDB streams.  The new object can
> > be
> > >> a
> > >> > > subclass of FlinkKinesisConsumer with a new name e.g,
> > >> > FlinkDynamoStreamCon
> > >> > > sumer.
> > >> > >
> > >> > > At best this could simply work. But we would like to hear if there
> > are
> > >> > > other situations to take care of.  In particular, I am wondering
> > >> what's
> > >> > the *"resharding
> > >> > > behavior"* mentioned in FLINK-4582.
> > >> > >
> > >> > > Thanks a lot!
> > >> > >
> > >> > > -
> > >> > > Ying
> > >> > >
> > >> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
> > >> > tzulitai@xxxxxxxxxx
> > >> > > > wrote:
> > >> > >
> > >> > >> Hi!
> > >> > >>
> > >> > >> I think it would be definitely nice to have this feature.
> > >> > >>
> > >> > >> No actual previous work has been made on this issue, but AFAIK,
> we
> > >> > should
> > >> > >> be able to build this on top of the FlinkKinesisConsumer.
> > >> > >> Whether this should live within the Kinesis connector module or
> an
> > >> > >> independent module of its own is still TBD.
> > >> > >> If you want, I would be happy to look at any concrete design
> > >> proposals
> > >> > you
> > >> > >> have for this before you start the actual development efforts.
> > >> > >>
> > >> > >> Cheers,
> > >> > >> Gordon
> > >> > >>
> > >> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yxu@xxxxxxxx> wrote:
> > >> > >>
> > >> > >> > Thanks Fabian for the suggestion.
> > >> > >> >
> > >> > >> > *Ying Xu*
> > >> > >> > Software Engineer
> > >> > >> > 510.368.1252 <+15103681252>
> > >> > >> > [image: Lyft] <http://www.lyft.com/>
> > >> > >> >
> > >> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <
> > fhueske@xxxxxxxxx>
> > >> > >> wrote:
> > >> > >> >
> > >> > >> > > Hi Ying,
> > >> > >> > >
> > >> > >> > > I'm not aware of any effort for this issue.
> > >> > >> > > You could check with the assigned contributor in Jira if
> there
> > is
> > >> > some
> > >> > >> > > previous work.
> > >> > >> > >
> > >> > >> > > Best, Fabian
> > >> > >> > >
> > >> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:
> > >> > >> > >
> > >> > >> > > > Hello Flink dev:
> > >> > >> > > >
> > >> > >> > > > We have a number of use cases which involves pulling data
> > from
> > >> > >> DynamoDB
> > >> > >> > > > streams into Flink.
> > >> > >> > > >
> > >> > >> > > > Given that this issue is tracked by Flink-4582
> > >> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>. we
> would
> > >> like
> > >> > >> to
> > >> > >> > > check
> > >> > >> > > > if any prior work has been completed by the community.   We
> > are
> > >> > also
> > >> > >> > very
> > >> > >> > > > interested in contributing to this effort.  Currently, we
> > have
> > >> a
> > >> > >> > > high-level
> > >> > >> > > > proposal which is based on extending the existing
> > >> > >> FlinkKinesisConsumer
> > >> > >> > > and
> > >> > >> > > > making it work with DynamoDB streams (via integrating with
> > the
> > >> > >> > > > AmazonDynamoDBStreams API).
> > >> > >> > > >
> > >> > >> > > > Any suggestion is welcome. Thank you very much.
> > >> > >> > > >
> > >> > >> > > >
> > >> > >> > > > -
> > >> > >> > > > Ying
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>