Re: Hbase state backend in Flink
Very cool! I might already be out of date on what's new in Flink…
Just wondering: are there any efforts to support seconds-level barrier alignment?
> On Dec 27, 2018, at 23:26, Yu Li <carp84@xxxxxxxxx> wrote:
> FWIW, one major advantage of adopting HBase as a Flink state backend is to
> support direct read/write on DFS, so as to disaggregate storage and compute
> (DisAgg). DisAgg has several benefits, such as supporting elastic
> computing in the cloud and much better (an order of magnitude) recovery speed
> when rescaling up/down (as Gyula also mentioned). We could also eliminate
> the performance regression compared to local RW through techniques like
> adding a local L2 cache. For more information, please refer to our talk
> at this year's Flink Forward China; we could discuss more in another
> thread if you are interested.
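To make the local-cache idea above concrete, here is a minimal sketch of a small LRU cache sitting in front of a slower remote store. It is only an illustration: `RemoteStore` and `CachedStore` are hypothetical names, the "remote" side is an in-memory dict standing in for DFS, and nothing here reflects how HBase or Flink actually implement it.

```python
from collections import OrderedDict


class RemoteStore:
    """Hypothetical stand-in for state kept on a DFS; counts reads so cache hits are visible."""

    def __init__(self):
        self._data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class CachedStore:
    """Write-through, read-through LRU cache in front of a RemoteStore."""

    def __init__(self, remote, capacity):
        self._remote = remote
        self._capacity = capacity
        self._cache = OrderedDict()

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry

    def put(self, key, value):
        self._remote.put(key, value)  # the remote copy stays authoritative
        self._remember(key, value)

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        value = self._remote.get(key)  # cache miss: pay the remote read once
        if value is not None:
            self._remember(key, value)
        return value


remote = RemoteStore()
store = CachedStore(remote, capacity=2)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)      # capacity is 2, so "a" is evicted locally
hit = store.get("c")   # served from the local cache
miss = store.get("a")  # falls through to the remote store
```

Because the remote copy is always written first, losing the local cache (for example when a task is rescheduled onto another machine) costs read latency only, not data.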
> Back to @Naveen's question: we need to make HBase support an embedded
> mode first before adopting it as a Flink state backend. We have done some
> initial work; please refer to HBASE-17743
> <https://issues.apache.org/jira/browse/HBASE-17743> and the design doc
> there for more details. And for sure we will upstream our work when it is
> ready (smile).
> Best Regards,
> On Fri, 28 Dec 2018 at 13:12, Chen Qin <qinnchen@xxxxxxxxx> wrote:
>> Hi Naveen,
>> AFAIK, there are two levels of storage in a typical state backend
>> (local/remote). I think it is kind of similar to main memory and disk in a PC.
>> Take the RocksDB state backend as an example: window state (typically a very
>> large ListState) is persisted in partitioned local RocksDB files, so adding an
>> element to a window is local and cheap. When a checkpoint starts, each of those
>> RocksDB instances uploads to its corresponding HDFS directory separately. This
>> is good in the sense that any intermediate state between two successful
>> checkpoints can be overwritten, and local snapshots can be taken cheaply and
>> asynchronously.
>> I have heard of folks trying to build a MySQL backend (deprecated), a remote
>> RocksDB-as-a-service backend (hard to scale, and a performance bottleneck), and
>> a Cassandra backend (hard to snapshot). All of these share the same trait: they
>> lack local, parallelizable snapshot semantics.
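As a rough illustration of the local/remote split described above, here is a toy sketch: writes touch only local state, and a checkpoint takes a point-in-time copy and ships it out in the background. `LocalStateBackend` is a hypothetical name, the deep copy stands in for whatever cheap local snapshot the real backend provides, and the `durable` dict stands in for HDFS. It is not Flink's or RocksDB's actual implementation.

```python
import copy
import threading


class LocalStateBackend:
    """Toy model: writes go to local state; checkpoints copy it out asynchronously."""

    def __init__(self):
        self._windows = {}   # local state: window key -> list of elements
        self._lock = threading.Lock()
        self.durable = {}    # stand-in for remote storage: checkpoint id -> snapshot

    def add(self, window_key, element):
        # Appending to window state touches only local data, so it is cheap.
        with self._lock:
            self._windows.setdefault(window_key, []).append(element)

    def get(self, window_key):
        with self._lock:
            return list(self._windows.get(window_key, []))

    def checkpoint(self, checkpoint_id):
        # Synchronous part: take a point-in-time copy of the local state.
        with self._lock:
            snapshot = copy.deepcopy(self._windows)

        # Asynchronous part: "upload" the copy while processing continues.
        def upload():
            self.durable[checkpoint_id] = snapshot

        thread = threading.Thread(target=upload)
        thread.start()
        return thread

    def restore(self, checkpoint_id):
        # Recovery: replace local state with the last durable snapshot.
        with self._lock:
            self._windows = copy.deepcopy(self.durable[checkpoint_id])


backend = LocalStateBackend()
backend.add("w1", 1)
backend.add("w1", 2)
upload = backend.checkpoint(1)
backend.add("w1", 3)   # processing continues; this write is not part of checkpoint 1
upload.join()
backend.restore(1)     # simulate a failure: state written after checkpoint 1 is dropped
```

Each parallel task would own one such instance and snapshot it independently, which is the "local, parallelizable" property that a single shared remote store does not give you for free.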
>> Hope this helps!
>> On Thu, Dec 27, 2018 at 8:27 AM miki haiat <miko5054@xxxxxxxxx> wrote:
>>> Did you try to use RocksDB as the state backend?
>>> On Thu, 27 Dec 2018, 18:17 Naveen Kumar <naveenkumar.g@xxxxxxxxxxxx
>>>> I am exploring whether we can plug in HBase as a state backend in Flink. We
>>>> need it for streaming jobs with large window states, high throughput and
>>>> I wanted to know if implementing a Flink state backend on HBase or another
>>>> distributed KV store is possible. Any documentation or pointers will be