osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.


Thanks, Stamatis.

I agree that sentinels are not the best approach but without pushing
aggregations to ES they're not very usable to elastic users.

So there are two bad choices:
1) (Low?) Probability of collisions triggering invalid results (surprises)
but efficient query.
2)  Inefficient query (fetching whole index in memory)

I hope we can drop support for ES 2.x soon and enforce ES >= 6.1. If so,
one can use  Composite aggregations
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html>
which don't have these shortcomings.
What we can do now is have separate aggregation query for ES2 and ES6.

What do you think?

Regards,
Andrei.



On Thu, Dec 13, 2018 at 6:16 AM Stamatis Zampetakis <zabetak@xxxxxxxxx>
wrote:

> Hi Andrei,
>
> I haven't gone entirely over the new PR, but I think there are cases where
> the result of the queries are going to be wrong (when values collide with
> sentinels).
> Another approach would be to introduce a rule in order to push the
> aggregation in ElasticSearch *only* if the field in questions is *not*
> nullable. This can make
> queries a bit more verbose but it guarantees that there will be no suprises
> to the end-user.
>
> select max(amount), date from orders group by amount, date -> Aggregation
> not pushed on ES
> select max(amount), date from orders where amount is not null and date is
> not null group by amount, date -> Aggregation pushed on ES
>
> Sorry for the late reply!
>
> Best,
> Stamatis
>
>
> Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde <jhyde@xxxxxxxxxx>
> έγραψε:
>
> > Interesting.
> >
> > One of the benefits that a SQL layer such as Calcite can bring is that it
> > hides the details necessary to make operations like this work.
> >
> > Julian
> >
> >
> > > On Dec 1, 2018, at 5:22 AM, Kevin Risden <krisden@xxxxxxxxxx> wrote:
> > >
> > > I haven't had a chance to review but saw that Elastic has the same
> issue
> > > with aggregations.
> > >
> > > https://github.com/elastic/elasticsearch/issues/35745
> > >
> > > Kevin Risden
> > >
> > > On Wed, Nov 28, 2018, 20:46 Andrei Sereda <andrei@xxxxxxxxx wrote:
> > >
> > >> Greetings ,
> > >> We have discovered an issue with ES aggregations when grouping on
> > >> non-textual fields (date, long). Currently the following sql fails
> > because
> > >> for missing value
> > >> <
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> > >>>
> > >> we inject __MISSING__ sentinel which is not date / number parseable
> (it
> > >> can’t be null either) :
> > >>
> > >> select max(amount), date from orders group by date -- special ES type
> > >>
> > >> The solution is to make sentinel type specific :
> > >>
> > >>   1. Integer.MIN_VALUE for integers
> > >>   2. 9999-12-31 for dates etc.
> > >>
> > >> For low cardinality types like boolean, byte, short I’m not sure what
> > to do
> > >> since there is high probability of collision between missing field and
> > >> actual value (eg. what value to choose for missing boolean?) :
> > >>
> > >> select max(amount), isActive from orders group by isActive -- boolean
> > type
> > >>
> > >> Let me know if you solved this problem differently before. Composite
> > >> aggregations
> > >> <
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> > >>>
> > >> (available since 6.1) should help in future.
> > >>
> > >> PR: https://github.com/apache/calcite/pull/946
> > >> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
> > >>
> > >> Many Thanks in Advance,
> > >> Andrei.
> > >>
> >
> >
>