Re: Question about sketches aggregation in druid
Yes, SynchronizedUnion was added on seeing concurrency error on the
processes doing realtime ingestion. On other processes, it is not called
concurrently and hope is that synchronization does not cause overhead when
that piece of code is only called in one thread because it gets optimized.
On Tue, Jul 10, 2018 at 10:13 AM, Gian Merlino <gian@xxxxxxxxxx> wrote:
> Hi Eshcar,
> To my knowledge, in the Druid Aggregator and BufferAggregator interfaces,
> the main place where concurrency happens is that "aggregate" and "get" may
> be called simultaneously during realtime ingestion. So if there would be a
> benefit from improving concurrency it would probably end up in that area.
> On Tue, Jul 10, 2018 at 2:10 AM Eshcar Hillel <firstname.lastname@example.org>
> > Hi All,
> > My name is Eshcar Hillel from Oath research. I'm currently working with
> > Lee Rhodes on committing a new concurrent implementation of the theta
> > sketch to the sketches-core library.I was wondering whether this
> > implementation can help boost the union operation that is applied to
> > multiple sketches at query time in druid.From what I see in the code the
> > sketch aggregator uses the SynchronizedUnion implementation, which
> > basically uses a lock at every single access (update/read) of the union
> > operation. We believe a thread-safe implementation of the union operation
> > can help decrease the inherent overhead of the lock.
> > I will be happy to join the meeting today and briefly discuss this
> > Thanks,Eshcar