osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Consuming data from dynamoDB streams to flink


Hi Gordon:

Cool. Thanks for the thumb-up!

We will include some test cases around the behavior of re-sharding. If
needed we can double check the behavior with AWS, and see if additional
changes are needed.  Will keep you posted.

-
Ying

On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <tzulitai@xxxxxxxxxx>
wrote:

> Hi Ying,
>
> Sorry for the late reply here.
>
> From the looks of the AmazonDynamoDBStreamsClient, yes it seems like this
> should simply work.
>
> Regarding the resharding behaviour I mentioned in the JIRA:
> I'm not sure if this is really a difference in behaviour. Internally, if
> DynamoDB streams is actually just working on Kinesis Streams, then the
> resharding primitives should be similar.
> The shard discovery logic of the Flink Kinesis Consumer assumes that
> splitting / merging shards will result in new shards of increasing,
> consecutive shard ids. As long as this is also the behaviour for DynamoDB
> resharding, then we should be fine.
>
> Feel free to start with the implementation for this, I think design-wise
> we're good to go. And thanks for working on this!
>
> Cheers,
> Gordon
>
> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <yxu@xxxxxxxx> wrote:
>
> > HI Gordon:
> >
> > We are starting to implement some of the primitives along this path.
> Please
> > let us know if you have any suggestions.
> >
> > Thanks!
> >
> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yxu@xxxxxxxx> wrote:
> >
> > > Hi Gordon:
> > >
> > > Really appreciate the reply.
> > >
> > > Yes our plan is to build the connector on top of the
> > FlinkKinesisConsumer.
> > > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
> > > through the AmazonKinesis client, more specifically through the
> following
> > > three function calls:
> > >
> > >    - describeStream
> > >    - getRecords
> > >    - getShardIterator
> > >
> > > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient)
> > > has already implemented similar calls, it is possible to use that
> client
> > to
> > > interact with the dynamoDB streams, and adapt the results from the
> > dynamoDB
> > > streams model to the kinesis model.
> > >
> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient
> > > <
> > https://github.com/awslabs/dynamodb-streams-kinesis-
> adapter/blob/master/src/main/java/com/amazonaws/services/
> dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java
> > >
> > > does. The adaptor client implements the AmazonKinesis client interface,
> > > and is officially supported by AWS.  Hence it is possible to replace
> the
> > > internal Kinesis client inside FlinkKinesisConsumer with this adapter
> > > client when interacting with dynamoDB streams.  The new object can be a
> > > subclass of FlinkKinesisConsumer with a new name e.g,
> > FlinkDynamoStreamCon
> > > sumer.
> > >
> > > At best this could simply work. But we would like to hear if there are
> > > other situations to take care of.  In particular, I am wondering what's
> > the *"resharding
> > > behavior"* mentioned in FLINK-4582.
> > >
> > > Thanks a lot!
> > >
> > > -
> > > Ying
> > >
> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
> > tzulitai@xxxxxxxxxx
> > > > wrote:
> > >
> > >> Hi!
> > >>
> > >> I think it would be definitely nice to have this feature.
> > >>
> > >> No actual previous work has been made on this issue, but AFAIK, we
> > should
> > >> be able to build this on top of the FlinkKinesisConsumer.
> > >> Whether this should live within the Kinesis connector module or an
> > >> independent module of its own is still TBD.
> > >> If you want, I would be happy to look at any concrete design proposals
> > you
> > >> have for this before you start the actual development efforts.
> > >>
> > >> Cheers,
> > >> Gordon
> > >>
> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yxu@xxxxxxxx> wrote:
> > >>
> > >> > Thanks Fabian for the suggestion.
> > >> >
> > >> > *Ying Xu*
> > >> > Software Engineer
> > >> > 510.368.1252 <+15103681252>
> > >> > [image: Lyft] <http://www.lyft.com/>
> > >> >
> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <fhueske@xxxxxxxxx>
> > >> wrote:
> > >> >
> > >> > > Hi Ying,
> > >> > >
> > >> > > I'm not aware of any effort for this issue.
> > >> > > You could check with the assigned contributor in Jira if there is
> > some
> > >> > > previous work.
> > >> > >
> > >> > > Best, Fabian
> > >> > >
> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:
> > >> > >
> > >> > > > Hello Flink dev:
> > >> > > >
> > >> > > > We have a number of use cases which involves pulling data from
> > >> DynamoDB
> > >> > > > streams into Flink.
> > >> > > >
> > >> > > > Given that this issue is tracked by Flink-4582
> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>. we would
> like
> > >> to
> > >> > > check
> > >> > > > if any prior work has been completed by the community.   We are
> > also
> > >> > very
> > >> > > > interested in contributing to this effort.  Currently, we have a
> > >> > > high-level
> > >> > > > proposal which is based on extending the existing
> > >> FlinkKinesisConsumer
> > >> > > and
> > >> > > > making it work with DynamoDB streams (via integrating with the
> > >> > > > AmazonDynamoDBStreams API).
> > >> > > >
> > >> > > > Any suggestion is welcome. Thank you very much.
> > >> > > >
> > >> > > >
> > >> > > > -
> > >> > > > Ying
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>