osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Contribution of IN operator handling


Hi Julian,

Thanks for your reply. I have a question

… where t is a table based on a JDBC data source …
Do we agree on that goal?

Does that mean that each of adapters (Elastic, Geode, Mongo etc.) have to
implement their own version of converters (similar to RexToSqlConverter) ?
If it is a flat RexCall of ORs this should be doable, I guess.

On Tue, Nov 20, 2018 at 2:14 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:

> Thanks Andrei, that’s the discussion I was thinking of.
>
> In Mykola’s case, I think it would be useful to solve the end-to-end
> problem. Given a Calcite query “select … from t where x in (c1, c2, …,
> cN)”, where t is a table based on a JDBC data source, ci are constants and
> N is large, we want Calcite’s JDBC adapter to send a query similar to
> “select … from t where x in (c1, c2, …, cN)” to the JDBC source.
>
> Do we agree on that goal?
>
> If we agree on the goal, the next question is how that query should be
> represented in the RelNode/RexNode intermediate representation. The choice
> of that representation has implications: performance (e.g. whether we hit a
> stack-overflow exception or quadratic algorithm), quality (whether we are
> forging a new code path that is untested), surface area (are we going to
> need to write a lot of new code, for example new planner rules, in order to
> achieve parity with existing cases).
>
> The approach I advocate in
> https://issues.apache.org/jira/browse/CALCITE-2630 <
> https://issues.apache.org/jira/browse/CALCITE-2630> - representing the IN
> clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and
> having RexToSqlConverter translate that OR into an IN SqlNode - meets those
> criteria. (We may need to fix some bugs relating to quadratic performance
> or stack depth, but those are worth doing anyway.)
>
> Are there other approaches that meet the same criteria? The original
> proposal - adding IN as a Rex operator - is a significant increase in
> surface area, so we would either lose functionality (e.g. not be able to
> push filters into the IN list) or find ourselves having to write a lot of
> new code and have to fix a lot of new bugs.
>
> Julian
>
>
> > On Nov 20, 2018, at 10:06 AM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> >
> > Convert SqlInOperator to In-Expression :
> > https://issues.apache.org/jira/browse/CALCITE-2630
> >
> > Related. full table scans and subQueryThreshold.
> >
> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
> >
> >
> >
> > On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> >
> >> I recall contributing to some other conversations about large IN lists
> >> over the past few months. Before we jump into a discussion, can you
> locate
> >> those threads? Also, if there is not a JIRA case, can you please create
> one?
> >>
> >> Julian
> >>
> >>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk <mykola.zerniuk@xxxxxxxx
> .INVALID>
> >> wrote:
> >>>
> >>> Dear Calcite Administrators,
> >>>
> >>> my name is Mykola, software engineer from Ukraine.
> >>>
> >>> I had an issue with Calcite IN operator handling.
> >>>
> >>> Here is my previous email to you:
> >>>
> >>
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
> >>>
> >>> It is really important to us to have an option to left IN operator "as
> >>> is" and do not do any conversions. I implemented it a while ago at my
> >>> local, and it successfully works in our project.
> >>>
> >>> Our team would be happy to have your review and contribute it to
> Calcite.
> >>>
> >>> If you have no objections may i create a work item in Jira? I am
> >>> following these steps:
> >>> https://calcite.apache.org/develop/#contributing
> >>>
> >>> Thanks a lot,
> >>> Mykola
> >>
> >>
>
>