
Re: Two questions about HLL


Not with the HLL aggregator, which only supports merging (union) of
sketches. The Theta Sketch
<http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html>
aggregator does support set operations, though, and the examples in its
documentation include one use case for intersection operations.
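
A minimal sketch of what such an intersection query could look like, built
here as a Python dict and serialized to JSON. The datasource, interval,
column, and dimension names are hypothetical, and it assumes the
datasketches extension is loaded and the column was ingested as a theta
sketch (an existing hyperUnique column cannot be reused for this):

```python
import json


def theta_intersection_query(datasource, interval, sketch_column,
                             dimension, value_a, value_b):
    """Build a timeseries query that estimates |A AND B| with theta sketches.

    All argument values are placeholders chosen by the caller; nothing here
    talks to a Druid cluster.
    """
    def filtered_sketch(name, value):
        # One theta sketch per subset of rows, selected by a dimension filter.
        return {
            "type": "filtered",
            "filter": {"type": "selector", "dimension": dimension, "value": value},
            "aggregator": {
                "type": "thetaSketch",
                "name": name,
                "fieldName": sketch_column,
            },
        }

    def access(name):
        # Refer to an aggregator result from inside a post aggregator.
        return {"type": "fieldAccess", "fieldName": name}

    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            filtered_sketch("set_a", value_a),
            filtered_sketch("set_b", value_b),
        ],
        "postAggregations": [
            {
                # Turn the intersected sketch into a number.
                "type": "thetaSketchEstimate",
                "name": "a_and_b",
                "field": {
                    "type": "thetaSketchSetOp",
                    "name": "a_and_b_sketch",
                    "func": "INTERSECT",
                    "fields": [access("set_a"), access("set_b")],
                },
            }
        ],
    }


# Hypothetical names: "events" datasource, "user_theta" sketch column, "day" dimension.
query = theta_intersection_query(
    "events", "2018-07-01/2018-08-01", "user_theta",
    "day", "2018-07-29", "2018-07-30",
)
print(json.dumps(query, indent=2))
```

The printed JSON is the body you would POST to the broker. The estimate
comes back under the name "a_and_b".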

On Mon, Jul 30, 2018 at 5:46 PM ? ? <BioLearning@xxxxxxxxxxx> wrote:

> Thanks, Charles, for your reply. It helped me better understand "finalize"
> and "post-aggregate".
>
>
> Back to my question: I understand that the HyperUnique type stores the
> unique entities, even though it is Base64-encoded. As a specific example,
> suppose I have one HyperUnique that represents {'a', 'b', 'c'} and another
> for {'b', 'c', 'd'}. I would like to compute their intersection, i.e., a
> logical AND between them. Is this supported by Druid?
>
>
>
>
> Thx
>
> Lei Wang
>
> ________________________________
> From: Charles Allen <charles.allen@xxxxxxxx.INVALID>
> Sent: Monday, July 30, 2018 22:47
> To: dev@xxxxxxxxxxxxxxxx
> Subject: Re: Two questions about HLL
>
> Thank you for bringing this up.
>
> The query process has multiple stages. The final stage is a "finalize"
> stage. During this stage the query converts the binary form (e.g.,
> long, float, stats, hyperunique) into something that can be sent over the
> wire as JSON for consumption by the end user in a meaningful way. As such,
> you should be able to use a HyperUnique aggregation just fine, as its
> finalization should yield a human-readable number.
>
> The HyperUniqueCardinality post aggregator exists because post
> aggregator computations occur before finalization. At that point the
> HyperUnique is still a HyperUnique sketch and not a finalized number.
> Making the HyperUnique aggregator work dynamically in every post
> aggregator while still yielding the expected behavior is hard.
> Explicitly declaring a HyperUniqueCardinality post aggregator makes it very
> clear how you want the results of the HLL calculation handled for the
> purposes of post aggregation.
>
> Long story short, you should be able to use HyperUnique if you want the
> sketch estimate directly in the query result body.
>
> Cheers,
> Charles Allen
>
>
>
> On Sun, Jul 29, 2018 at 11:15 PM ? ? <BioLearning@xxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > I am a newbie to Druid, and I would like to aggregate the hyperUnique of
> > daily users to get the distinct count over a period of days. Around this:
> >
> >
> >   1.  I was surprised that I could not find a Druid page listing all of
> > the column types supported by Druid. So far I have come across String,
> > HyperUnique, LongSum, time, etc.
> >   2.  Is there a post aggregation function that aggregates hyperUnique
> > further, to count the distinct values on top of it? I did not find
> > one; it seems there is only HyperUniqueCardinality, which gives the count
> > from a HyperUnique and can be used for arithmetic calculations only.
> >
> >
> > Thanks in advance for your clarification.
> >
> >
>