
Re: Contribution of IN operator handling


Thanks Andrei, that’s the discussion I was thinking of.

In Mykola’s case, I think it would be useful to solve the end-to-end problem. Given a Calcite query “select … from t where x in (c1, c2, …, cN)”, where t is a table based on a JDBC data source, ci are constants and N is large, we want Calcite’s JDBC adapter to send a query similar to “select … from t where x in (c1, c2, …, cN)” to the JDBC source.

Do we agree on that goal?

If we agree on the goal, the next question is how that query should be represented in the RelNode/RexNode intermediate representation. The choice of representation has implications for performance (e.g. whether we hit a stack overflow or a quadratic algorithm), quality (whether we are forging a new, untested code path), and surface area (whether we will need to write a lot of new code, for example new planner rules, to achieve parity with existing cases).
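
As a rough illustration of the stack-depth concern, here is a minimal Python sketch. The `Eq` and `Or` classes and the helper functions are hypothetical stand-ins, not Calcite's RexNode types; the sketch only compares how deep a left-nested binary OR tree gets against a single flat OR with N operands.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Eq:
    """Hypothetical stand-in for a `column = constant` comparison."""
    column: str
    value: object


@dataclass(frozen=True)
class Or:
    """Hypothetical stand-in for an OR call with any number of operands."""
    operands: Tuple["Expr", ...]


Expr = Union[Eq, Or]


def nested_or(column, values):
    """Build a left-deep binary tree: ((x = c1 OR x = c2) OR x = c3) ..."""
    expr = Eq(column, values[0])
    for v in values[1:]:
        expr = Or((expr, Eq(column, v)))
    return expr


def flat_or(column, values):
    """Build one OR node holding every comparison as a direct operand."""
    return Or(tuple(Eq(column, v) for v in values))


def depth(expr):
    """Measure tree depth iteratively, so the measurement cannot overflow."""
    max_depth = 0
    stack = [(expr, 1)]
    while stack:
        node, d = stack.pop()
        max_depth = max(max_depth, d)
        if isinstance(node, Or):
            stack.extend((op, d + 1) for op in node.operands)
    return max_depth


values = list(range(1000))
print(depth(nested_or("x", values)))  # grows linearly with N
print(depth(flat_or("x", values)))    # constant, regardless of N
```

A recursive visitor walking the nested form needs one stack frame per constant, whereas the flat form keeps the depth constant no matter how long the list is.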

The approach I advocate in https://issues.apache.org/jira/browse/CALCITE-2630 - representing the IN clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and having RexToSqlConverter translate that OR into an IN SqlNode - meets those criteria. (We may need to fix some bugs relating to quadratic performance or stack depth, but those are worth doing anyway.)
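
The OR-to-IN translation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Calcite code: `Eq`, `Or`, `or_to_in`, `sql_literal` and `to_sql` are hypothetical names, and the real change would operate on RexCall and SqlNode objects inside the JDBC adapter.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Eq:
    """Hypothetical stand-in for a `column = constant` comparison."""
    column: str
    value: object


@dataclass(frozen=True)
class Or:
    """Hypothetical stand-in for a flat OR call with N operands."""
    operands: Tuple["Expr", ...]


Expr = Union[Eq, Or]


def or_to_in(expr):
    """Return (column, values) if expr is a flat OR of equality
    comparisons on a single column; otherwise return None."""
    if not isinstance(expr, Or) or len(expr.operands) < 2:
        return None
    if not all(isinstance(op, Eq) for op in expr.operands):
        return None
    columns = {op.column for op in expr.operands}
    if len(columns) != 1:
        return None
    return columns.pop(), [op.value for op in expr.operands]


def sql_literal(value):
    """Render a constant; strings are quoted with '' escaping."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_sql(expr):
    """Generate SQL text, collapsing a qualifying OR into an IN list."""
    if isinstance(expr, Eq):
        return f"{expr.column} = {sql_literal(expr.value)}"
    collapsed = or_to_in(expr)
    if collapsed is not None:
        column, values = collapsed
        return f"{column} IN ({', '.join(sql_literal(v) for v in values)})"
    # Fall back to a plain OR when the operands do not fit the pattern.
    return " OR ".join(f"({to_sql(op)})" for op in expr.operands)


print(to_sql(Or((Eq("x", 1), Eq("x", 2), Eq("x", 3)))))
print(to_sql(Or((Eq("x", 1), Eq("y", 2)))))
```

The point of the design is that the OR stays the only internal representation, so existing rules and simplifications keep applying to it; the IN form appears only at the last step, when SQL is generated for the JDBC source.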

Are there other approaches that meet the same criteria? The original proposal - adding IN as a Rex operator - is a significant increase in surface area, so we would either lose functionality (e.g. not be able to push filters into the IN list) or have to write a lot of new code and fix a lot of new bugs.

Julian


> On Nov 20, 2018, at 10:06 AM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> 
> Convert SqlInOperator to In-Expression:
> https://issues.apache.org/jira/browse/CALCITE-2630
> 
> Related: full table scans and subQueryThreshold.
> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
> 
> 
> 
> On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> 
>> I recall contributing to some other conversations about large IN lists
>> over the past few months. Before we jump into a discussion, can you locate
>> those threads? Also, if there is not a JIRA case, can you please create one?
>> 
>> Julian
>> 
>>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk <mykola.zerniuk@xxxxxxxx.INVALID> wrote:
>>> 
>>> Dear Calcite Administrators,
>>> 
>>> my name is Mykola, software engineer from Ukraine.
>>> 
>>> I had an issue with Calcite IN operator handling.
>>> 
>>> Here is my previous email to you:
>>> 
>>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
>>> 
>>> It is really important to us to have an option to leave the IN operator
>>> "as is", without any conversion. I implemented this locally a while ago,
>>> and it works successfully in our project.
>>> 
>>> Our team would be happy to have your review and contribute it to Calcite.
>>> 
>>> If you have no objections, may I create a work item in Jira? I am
>>> following these steps:
>>> https://calcite.apache.org/develop/#contributing
>>> 
>>> Thanks a lot,
>>> Mykola
>> 
>>