
Re: Two questions about HLL

Thanks, Charles, for your reply. It helped me better understand "finalize" and "post-aggregate".

Back to my question: as I understand it, the HyperUnique type stores the unique entities, even though it is Base64-encoded. To take a specific example, suppose I have one HyperUnique that represents {'a', 'b', 'c'} and another for {'b', 'c', 'd'}. I would like to get their intersection, i.e., a logical AND operation between them. Is this supported by Druid?


Lei Wang

From: Charles Allen <charles.allen@xxxxxxxx.INVALID>
Sent: Monday, July 30, 2018 22:47
To: dev@xxxxxxxxxxxxxxxx
Subject: Re: Two questions about HLL

Thank you for bringing this up.

The query process has multiple stages. The final stage is a "finalize"
stage. During this stage the query converts the binary form (e.g., long,
float, stats, hyperunique) into something that can be sent over the wire
as JSON for consumption by the end user in a meaningful way. As such, you
should be able to use a HyperUnique aggregation just fine, as its
finalization should yield a human-readable number.
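
As a rough sketch of what that could look like, the Python snippet below
builds a native timeseries query that merges a hyperUnique column over a
multi-day interval. The data source name "daily_users", the metric column
"unique_users", and the helper name are made up for illustration; they are
not taken from this thread.

```python
import json


def build_distinct_count_query(data_source, metric, interval):
    """Build a native timeseries query that merges hyperUnique values.

    With granularity "all", the whole interval collapses into one bucket,
    so the finalized result is a single estimated distinct count.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,  # hypothetical data source name
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            # hyperUnique merges the stored sketches; finalization turns
            # the merged sketch into a number in the JSON response.
            {"type": "hyperUnique", "name": "distinct_users", "fieldName": metric}
        ],
    }


query = build_distinct_count_query(
    "daily_users", "unique_users", "2018-07-01/2018-07-30"
)
print(json.dumps(query, indent=2))
```

The printed JSON is what would be POSTed to the broker as a native query.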

The HyperUniqueCardinality post aggregator exists because the post
aggregator computations occur before finalization. So at that point the
HyperUnique is still a HyperUnique aggregator and not a finalized number.
Making the HyperUnique aggregator work dynamically in every post
aggregator while still yielding the expected behavior is hard.
Explicitly declaring a HyperUniqueCardinality post aggregator makes it very
clear how you want the results of the HLL calculation handled for the
purposes of post aggregation.
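
To illustrate, here is a minimal sketch in Python of a query that uses a
hyperUniqueCardinality post aggregator inside an arithmetic post
aggregator. The data source "daily_users", the column "unique_users", the
output names, and the helper name are assumptions for the example only.

```python
import json


def build_users_per_row_query(data_source, metric, interval):
    """Build a timeseries query dividing estimated uniques by row count."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,  # hypothetical data source name
        "granularity": "all",
        "intervals": [interval],
        "aggregations": [
            {"type": "count", "name": "rows"},
            {"type": "hyperUnique", "name": "distinct_users", "fieldName": metric},
        ],
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "users_per_row",
                "fn": "/",
                "fields": [
                    # Refers to the hyperUnique aggregator by its output
                    # name and exposes its estimate as a number here.
                    {"type": "hyperUniqueCardinality", "fieldName": "distinct_users"},
                    {"type": "fieldAccess", "fieldName": "rows"},
                ],
            }
        ],
    }


ratio_query = build_users_per_row_query(
    "daily_users", "unique_users", "2018-07-01/2018-07-30"
)
print(json.dumps(ratio_query, indent=2))
```

Note that the fieldName of the hyperUniqueCardinality entry is the name
given to the hyperUnique aggregator, not the underlying column.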

Long story short, you should be able to use HyperUnique if you want the
sketch estimate directly in the query result body.

Charles Allen

On Sun, Jul 29, 2018 at 11:15 PM ? ? <BioLearning@xxxxxxxxxxx> wrote:

> Hi,
> I am a newbie to Druid, and I would like to aggregate the hyperUnique of
> daily users to get the distinct count over a period of days. Around this:
>   1.  I am surprised that I could not find a Druid page listing all of
> the column types supported by Druid; so far I have met String,
> HyperUnique, LongSum, time, etc.
>   2.  Is there a post-aggregation function that aggregates hyperUnique
> further, to count the distinct values on top of it? I did not find one;
> it seems there is only HyperUniqueCardinality, which returns the count
> from a HyperUnique and can be used for arithmetic calculations only.
> Thanks in advance for your clarification.