osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Consuming data from dynamoDB streams to flink


Hi Gordon:

Really appreciate the reply.

Yes our plan is to build the connector on top of the FlinkKinesisConsumer.
At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
through the AmazonKinesis client, more specifically through the following
three function calls:

   - describeStream
   - getRecords
   - getShardIterator

Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient) has
already implemented similar calls, it is possible to use that client to
interact with the dynamoDB streams, and adapt the results from the dynamoDB
streams model to the kinesis model.

It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient
<https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java>
does. The adaptor client implements the AmazonKinesis client interface, and
is officially supported by AWS.  Hence it is possible to replace the
internal Kinesis client inside FlinkKinesisConsumer with this adapter
client when interacting with dynamoDB streams.  The new object can be a
subclass of FlinkKinesisConsumer with a new name e.g,
FlinkDynamoStreamConsumer.

At best this could simply work. But we would like to hear if there are
other situations to take care of.  In particular, I am wondering
what's the *"resharding
behavior"* mentioned in FLINK-4582.

Thanks a lot!

-
Ying

On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <tzulitai@xxxxxxxxxx>
wrote:

> Hi!
>
> I think it would be definitely nice to have this feature.
>
> No actual previous work has been made on this issue, but AFAIK, we should
> be able to build this on top of the FlinkKinesisConsumer.
> Whether this should live within the Kinesis connector module or an
> independent module of its own is still TBD.
> If you want, I would be happy to look at any concrete design proposals you
> have for this before you start the actual development efforts.
>
> Cheers,
> Gordon
>
> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yxu@xxxxxxxx> wrote:
>
> > Thanks Fabian for the suggestion.
> >
> > *Ying Xu*
> > Software Engineer
> > 510.368.1252 <+15103681252>
> > [image: Lyft] <http://www.lyft.com/>
> >
> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <fhueske@xxxxxxxxx>
> wrote:
> >
> > > Hi Ying,
> > >
> > > I'm not aware of any effort for this issue.
> > > You could check with the assigned contributor in Jira if there is some
> > > previous work.
> > >
> > > Best, Fabian
> > >
> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yxu@xxxxxxxx>:
> > >
> > > > Hello Flink dev:
> > > >
> > > > We have a number of use cases which involves pulling data from
> DynamoDB
> > > > streams into Flink.
> > > >
> > > > Given that this issue is tracked by Flink-4582
> > > > <https://issues.apache.org/jira/browse/FLINK-4582>. we would like to
> > > check
> > > > if any prior work has been completed by the community.   We are also
> > very
> > > > interested in contributing to this effort.  Currently, we have a
> > > high-level
> > > > proposal which is based on extending the existing
> FlinkKinesisConsumer
> > > and
> > > > making it work with DynamoDB streams (via integrating with the
> > > > AmazonDynamoDBStreams API).
> > > >
> > > > Any suggestion is welcome. Thank you very much.
> > > >
> > > >
> > > > -
> > > > Ying
> > > >
> > >
> >
>