osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Contribution of IN operator handling


Yes, the other adapters need to write their own converters, specific to their target language. DB languages tend to be divergent in how they represent “column in (constant1, …, constantN)”[1][2], so this makes sense. Of course they can borrow/share code with the JDBC adapter.

Julian

[1] https://docs.mongodb.com/manual/tutorial/query-documents/#specify-conditions-using-query-operators <https://docs.mongodb.com/manual/tutorial/query-documents/#specify-conditions-using-query-operators>

[2] http://druid.io/docs/latest/querying/filters.html#in-filter <http://druid.io/docs/latest/querying/filters.html#in-filter>



> On Nov 20, 2018, at 12:14 PM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> 
> Hi Julian,
> 
> Thanks for your reply. I have a question
> 
> … where t is a table based on a JDBC data source …
> Do we agree on that goal?
> 
> Does that mean that each of adapters (Elastic, Geode, Mongo etc.) have to
> implement their own version of converters (similar to RexToSqlConverter) ?
> If it is a flat RexCall of ORs this should be doable, I guess.
> 
> On Tue, Nov 20, 2018 at 2:14 PM Julian Hyde <jhyde@xxxxxxxxxx <mailto:jhyde@xxxxxxxxxx>> wrote:
> 
>> Thanks Andrei, that’s the discussion I was thinking of.
>> 
>> In Mykola’s case, I think it would be useful to solve the end-to-end
>> problem. Given a Calcite query “select … from t where x in (c1, c2, …,
>> cN)”, where t is a table based on a JDBC data source, ci are constants and
>> N is large, we want Calcite’s JDBC adapter to send a query similar to
>> “select … from t where x in (c1, c2, …, cN)” to the JDBC source.
>> 
>> Do we agree on that goal?
>> 
>> If we agree on the goal, the next question is how that query should be
>> represented in the RelNode/RexNode intermediate representation. The choice
>> of that representation has implications: performance (e.g. whether we hit a
>> stack-overflow exception or quadratic algorithm), quality (whether we are
>> forging a new code path that is untested), surface area (are we going to
>> need to write a lot of new code, for example new planner rules, in order to
>> achieve parity with existing cases).
>> 
>> The approach I advocate in
>> https://issues.apache.org/jira/browse/CALCITE-2630 <
>> https://issues.apache.org/jira/browse/CALCITE-2630 <https://issues.apache.org/jira/browse/CALCITE-2630>> - representing the IN
>> clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and
>> having RexToSqlConverter translate that OR into an IN SqlNode - meets those
>> criteria. (We may need to fix some bugs relating to quadratic performance
>> or stack depth, but those are worth doing anyway.)
>> 
>> Are there other approaches that meet the same criteria? The original
>> proposal - adding IN as a Rex operator - is a significant increase in
>> surface area, so we would either lose functionality (e.g. not be able to
>> push filters into the IN list) or find ourselves having to write a lot of
>> new code and have to fix a lot of new bugs.
>> 
>> Julian
>> 
>> 
>>> On Nov 20, 2018, at 10:06 AM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
>>> 
>>> Convert SqlInOperator to In-Expression :
>>> https://issues.apache.org/jira/browse/CALCITE-2630
>>> 
>>> Related. full table scans and subQueryThreshold.
>>> 
>> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
>>> 
>>> 
>>> 
>>> On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>>> 
>>>> I recall contributing to some other conversations about large IN lists
>>>> over the past few months. Before we jump into a discussion, can you
>> locate
>>>> those threads? Also, if there is not a JIRA case, can you please create
>> one?
>>>> 
>>>> Julian
>>>> 
>>>>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk <mykola.zerniuk@xxxxxxxx
>> .INVALID>
>>>> wrote:
>>>>> 
>>>>> Dear Calcite Administrators,
>>>>> 
>>>>> my name is Mykola, software engineer from Ukraine.
>>>>> 
>>>>> I had an issue with Calcite IN operator handling.
>>>>> 
>>>>> Here is my previous email to you:
>>>>> 
>>>> 
>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
>>>>> 
>>>>> It is really important to us to have an option to left IN operator "as
>>>>> is" and do not do any conversions. I implemented it a while ago at my
>>>>> local, and it successfully works in our project.
>>>>> 
>>>>> Our team would be happy to have your review and contribute it to
>> Calcite.
>>>>> 
>>>>> If you have no objections may i create a work item in Jira? I am
>>>>> following these steps:
>>>>> https://calcite.apache.org/develop/#contributing
>>>>> 
>>>>> Thanks a lot,
>>>>> Mykola