osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.


Hi Andrei,

I haven't gone entirely over the new PR, but I think there are cases where
the result of the queries are going to be wrong (when values collide with
sentinels).
Another approach would be to introduce a rule in order to push the
aggregation in ElasticSearch *only* if the field in questions is *not*
nullable. This can make
queries a bit more verbose but it guarantees that there will be no suprises
to the end-user.

select max(amount), date from orders group by amount, date -> Aggregation
not pushed on ES
select max(amount), date from orders where amount is not null and date is
not null group by amount, date -> Aggregation pushed on ES

Sorry for the late reply!

Best,
Stamatis


Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde <jhyde@xxxxxxxxxx>
έγραψε:

> Interesting.
>
> One of the benefits that a SQL layer such as Calcite can bring is that it
> hides the details necessary to make operations like this work.
>
> Julian
>
>
> > On Dec 1, 2018, at 5:22 AM, Kevin Risden <krisden@xxxxxxxxxx> wrote:
> >
> > I haven't had a chance to review but saw that Elastic has the same issue
> > with aggregations.
> >
> > https://github.com/elastic/elasticsearch/issues/35745
> >
> > Kevin Risden
> >
> > On Wed, Nov 28, 2018, 20:46 Andrei Sereda <andrei@xxxxxxxxx wrote:
> >
> >> Greetings ,
> >> We have discovered an issue with ES aggregations when grouping on
> >> non-textual fields (date, long). Currently the following sql fails
> because
> >> for missing value
> >> <
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
> >>>
> >> we inject __MISSING__ sentinel which is not date / number parseable (it
> >> can’t be null either) :
> >>
> >> select max(amount), date from orders group by date -- special ES type
> >>
> >> The solution is to make sentinel type specific :
> >>
> >>   1. Integer.MIN_VALUE for integers
> >>   2. 9999-12-31 for dates etc.
> >>
> >> For low cardinality types like boolean, byte, short I’m not sure what
> to do
> >> since there is high probability of collision between missing field and
> >> actual value (eg. what value to choose for missing boolean?) :
> >>
> >> select max(amount), isActive from orders group by isActive -- boolean
> type
> >>
> >> Let me know if you solved this problem differently before. Composite
> >> aggregations
> >> <
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
> >>>
> >> (available since 6.1) should help in future.
> >>
> >> PR: https://github.com/apache/calcite/pull/946
> >> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
> >>
> >> Many Thanks in Advance,
> >> Andrei.
> >>
>
>