

Re: Support close of the iterator/iterable created from MapState/SetState


I don't agree. I believe you can track the iterators/iterables that are created and later become unreachable by using weak references and reference queues (or other methods). Having a few people work 10x as hard to provide a good implementation is much better than having 100s or 1000s of users suffer through a more complicated API.
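A minimal sketch of the weak-reference idea, assuming a hypothetical `NativeCursor` that stands in for a store iterator such as a RocksDB iterator. None of these class names come from Beam or Samza. The user only ever holds a `StateIterator`; a `WeakReference` subclass remembers the cursor, and once the GC finds the iterator unreachable the reference is enqueued so the runner can close the cursor.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical stand-in for a native store cursor (e.g. a RocksDB iterator). */
final class NativeCursor implements AutoCloseable {
  private final int[] data;
  private int pos = 0;
  boolean closed = false;

  NativeCursor(int[] data) { this.data = data; }
  boolean hasNext() { return !closed && pos < data.length; }
  int next() { return data[pos++]; }
  @Override public void close() { closed = true; } // a real backend would free native memory here
}

/** Iterator handed to users; it owns the cursor but never exposes it. */
final class StateIterator implements Iterator<Integer> {
  private final NativeCursor cursor;
  StateIterator(NativeCursor cursor) { this.cursor = cursor; }
  @Override public boolean hasNext() { return cursor.hasNext(); }
  @Override public Integer next() {
    if (!hasNext()) throw new NoSuchElementException();
    return cursor.next();
  }
}

/** Weak reference to the user-facing iterator that strongly remembers the cursor to close. */
final class CursorRef extends WeakReference<StateIterator> {
  final NativeCursor cursor;
  CursorRef(StateIterator it, NativeCursor cursor, ReferenceQueue<StateIterator> queue) {
    super(it, queue);
    this.cursor = cursor;
  }
}

final class CursorTracker {
  private final ReferenceQueue<StateIterator> queue = new ReferenceQueue<>();
  // Keeps the CursorRef objects themselves reachable until they are processed.
  private final Set<CursorRef> live = new HashSet<>();

  StateIterator open(int[] data) {
    cleanUp(); // opportunistically close cursors of iterators that were already collected
    NativeCursor cursor = new NativeCursor(data);
    StateIterator it = new StateIterator(cursor);
    live.add(new CursorRef(it, cursor, queue));
    return it;
  }

  /** Closes every cursor whose iterator the GC has found unreachable; returns how many. */
  int cleanUp() {
    int closed = 0;
    CursorRef ref;
    while ((ref = (CursorRef) queue.poll()) != null) {
      ref.cursor.close();
      live.remove(ref);
      closed++;
    }
    return closed;
  }

  int liveCount() { return live.size(); }
}
```

The key design point is that the cursor must not hold a reference back to the iterator, otherwise the iterator never becomes unreachable. On Java 9+, `java.lang.ref.Cleaner` offers a packaged version of the same pattern. Cleanup timing still depends on when the GC runs, which is the accuracy concern raised in the reply below.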

On Thu, May 10, 2018 at 3:44 PM Xinyu Liu <xinyuliu.us@xxxxxxxxx> wrote:
Loading/evicting blocks will help reduce the cache memory footprint, but we still won't be able to release the underlying resources. We can definitely add heuristics to help release the resources as you mentioned, but there is no accurate way to track all the iterators/iterables created and free them once they are no longer needed. I think that while the API is aimed at a nice user experience, we should give users the option to optimize performance if they choose to. Do you agree?
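One way to make explicit release opt-in is an iterator that also implements `AutoCloseable`, so users who care can use try-with-resources while everyone else can keep treating it as a plain `Iterator`. The sketch below assumes a hypothetical `CloseableIterator` interface; it is not an existing Beam state API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical opt-in interface: an iterator whose resources can be released early. */
interface CloseableIterator<T> extends Iterator<T>, AutoCloseable {
  @Override void close(); // narrowed so callers need not handle a checked exception
}

/** Toy implementation over a List that records whether close() was called. */
final class ListBackedIterator<T> implements CloseableIterator<T> {
  private final Iterator<T> delegate;
  private boolean closed = false;

  ListBackedIterator(List<T> items) { this.delegate = items.iterator(); }

  @Override public boolean hasNext() { return !closed && delegate.hasNext(); }
  @Override public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return delegate.next();
  }
  // A store-backed implementation would close its native cursor here.
  @Override public void close() { closed = true; }
  boolean isClosed() { return closed; }
}
```

A user who reads only the first entry can release the cursor immediately, for example `try (it) { first = it.next(); }` on Java 9+, instead of waiting for the runner to notice the iterator is unused.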

Thanks,
Xinyu

On Thu, May 10, 2018 at 3:25 PM, Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
Users won't reliably close/release the resources, and forcing them to will make the user experience worse.
It would make a lot more sense to use a file format that allows random access and use a cache to load/evict blocks of the state from memory.
If that is not possible, use an iterable which frees the resource after a certain amount of inactivity or uses weak references.
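The block-cache suggestion can be sketched with `LinkedHashMap` in access order, which evicts the least recently used block once a size limit is exceeded. The block ids and the loader function are hypothetical; a real loader would read one block from a random-access state file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of an LRU cache of state blocks; block ids and the loader are hypothetical. */
final class BlockCache<V> {
  private final Function<Long, V> loader; // e.g. reads one block from a random-access file
  private final LinkedHashMap<Long, V> blocks;
  private int loads = 0;

  BlockCache(int maxBlocks, Function<Long, V> loader) {
    this.loader = loader;
    // accessOrder = true: iteration order runs from least to most recently accessed.
    this.blocks = new LinkedHashMap<Long, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
        return size() > maxBlocks; // evict the least recently used block
      }
    };
  }

  /** Returns the block, loading it (and possibly evicting another) on a miss. */
  V get(long blockId) {
    V block = blocks.get(blockId);
    if (block == null) {
      block = loader.apply(blockId);
      loads++;
      blocks.put(blockId, block);
    }
    return block;
  }

  int loads() { return loads; }
  boolean isCached(long blockId) { return blocks.containsKey(blockId); }
}
```

With `maxBlocks = 2`, reading blocks 1, 2, 1, 3 loads three blocks and evicts block 2, since block 1 was touched more recently. This bounds memory, but as noted in the reply above, it does not by itself close any open store cursor.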

On Thu, May 10, 2018 at 3:07 PM Xinyu Liu <xinyuliu.us@xxxxxxxxx> wrote:
Hi, folks,

I'm in the middle of implementing MapState and SetState in our Samza runner. We noticed that the state returns a Java Iterable for reading entries, keys, etc. For state backed by a file-based KV store such as RocksDB, we need to let users explicitly close the iterator/iterable to release its resources. Otherwise we have to load the iterable into memory so we can safely close the underlying RocksDB iterator, similar to Flink's implementation. But this won't work for states that don't fit into memory. I chatted with Kenn and he also agrees we need this capability to avoid bulk reads/writes. This seems to be a general use case, and I'm wondering whether we can add support for it. I am happy to contribute to this if needed. Any feedback is highly appreciated.
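For contrast, the load-into-memory workaround described above amounts to draining the store cursor into a heap buffer and closing it right away. A minimal sketch, assuming a hypothetical `StateCursor` interface rather than the actual Samza or Flink classes:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical store cursor that must be closed to free native resources. */
interface StateCursor<T> extends Iterator<T>, AutoCloseable {
  @Override void close();
}

final class Materializer {
  /** Copies every remaining element onto the heap, then closes the cursor. */
  static <T> List<T> readAll(StateCursor<T> cursor) {
    List<T> buffer = new ArrayList<>();
    try (cursor) { // the cursor is closed even if iteration throws
      while (cursor.hasNext()) {
        buffer.add(cursor.next());
      }
    }
    return buffer; // safe to hand out, but the whole state now lives in memory
  }
}
```

The cursor's lifetime becomes trivial to manage, but memory grows with the size of the state, which is exactly the limitation for states that don't fit in memory.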

Thanks,
Xinyu