[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Hbase state backend in Flink

Hi Yu,

Very cool! I might be out of dated of what’s new in Flink already… 
Just wonder If there are efforts to support seconds level barrier alignment?


> On Dec 27, 2018, at 23:26, Yu Li <carp84@xxxxxxxxx> wrote:
> FWIW, one major advantage of adopting HBase as Flink statebackend is to
> support direct read/write on DFS, so as to disaggregate storage and compute
> (DisAgg).  DisAgg has several benefits, such as supporting elastic
> computing in cloud, much better (order of magnitude) recovery speed when
> rescaling up/down (as Gyula also mentioned), etc. and we could eliminate
> the performance regression compared to local RW through techniques like
> adding a local L2 cache. More information please refer to our talk
> <https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf>
> at this year's Flink Forward China, and we could discuss more in another
> thread if interested.
> Back to @Naveen's question here, we need to make HBase supporting embedded
> mode first before adopting it as Flink statebackend. We have done some
> initial work and please refer to HBASE-17743
> <https://issues.apache.org/jira/browse/HBASE-17743> and the design doc
> there for more details. And for sure we will upstream our work when ready
> to (smile).
> Best Regards,
> Yu
> On Fri, 28 Dec 2018 at 13:12, Chen Qin <qinnchen@xxxxxxxxx> wrote:
>> Hi Naveen,
>> AFAIK, there are two level of storage in typical statebackend
>> (local/remote). I think it kinda similar to what PC main memory and disk
>> analogy.
>> Take RocksDB Statebackend as example, window state (typical very large
>> ListState) persisted in partitioned local rocksdb files, adding element to
>> window is localized and cheap.When checkpoint starts, each of those rocksdb
>> do upload to corresponding HDFS directories separately.This is good in a
>> sense when any intermediate states between two successful checkpoints can
>> be overwritten and local snapshots can be done cheaply and asynchronously.
>> I heard folks tried to build mysqlbackend(deprecated), remote rocksdb as
>> service backend(hard to scale and performance bottleneck) , Cassandra(hard
>> to snapshot). All of which shares same trait on lack of local
>> parallelizable snapshot semantic.
>> Hope this helps!
>> Chen
>> On Thu, Dec 27, 2018 at 8:27 AM miki haiat <miko5054@xxxxxxxxx> wrote:
>>> Did try to use rocksdb[1] as state backend?
>>> 1.
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
>>> On Thu, 27 Dec 2018, 18:17 Naveen Kumar <naveenkumar.g@xxxxxxxxxxxx
>>> .invalid
>>> wrote:
>>>> Hi,
>>>> I am exploring if we can plugin hbase as state backend in Flink. We
>> have
>>>> need for streaming jobs with large window states, high throughput and
>>>> reliability.
>>>> I wanted to know if implementing Flink backend in Hbase or other
>>>> distributed KV store is possible. Any documentation or pointers will be
>>>> helpful.
>>>> Thanks,
>>>> Naveen