Thanks Piotr for your response.
I've further investigated the issue and found the root cause.
There are 2 possible ways to produce/consume records to/from Kinesis:
- Using the Kinesis Data Streams service API directly
- Using the KCL & KPL.
The FlinkKinesisProducer uses the AWS KPL to push records into Kinesis, for optimized performance. One of the features of the KPL is Aggregation, meaning that it batches many UserRecords into one Kinesis Record to increase producer throughput.
The thing is, that consumers of that stream needs to be aware that the records being consumed are aggregated and handle it accordingly .
In my case, the output stream is being consumed by Druid
. So the consumer code is not in my control...
So my choices are to disable the Aggregation feature by passing aggregationEnable: false in the kinesis configuration or writing my own custom consumer for Druid.
I think that we should state this as part of the documentation for Flink Kinesis Connector.