osdir.com

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once


Repeating the comments I made in the JIRA case [1].

I do find your argument compelling, that if the rewritten version contains the same number of calls to the UDF, it should be OK.

But there are other possible semantics. For instance, a “strict” semantic could allow rewrite only if the calls to the UDF are guaranteed to be the same number, and the same order. A “relaxed” semantic would allow non-deterministic functions (and dynamic functions, see [2]) to be rewritten any time.

Perhaps there could be variants of this rule, one for each semantic, and the semantics could be chosen via a connection- or statement-level property. To enforce a particular semantic, several rules will need to modify their behavior (e.g. FilterProjectTransposeRule), so those rules would be parameterized on semantic also.

Julian

[2] https://issues.apache.org/jira/browse/CALCITE-2638 <https://issues.apache.org/jira/browse/CALCITE-2638> 

> On Nov 19, 2018, at 7:51 AM, Hequn Cheng <chenghequn@xxxxxxxxx> wrote:
> 
> Hi,
> 
> Currently, there are some merge rules for Project, such as CalcMergeRule,
> ProjectMergeRule, and ProjectCalcMergeRule. I found that these merge rules
> should not be performed when Nondeterministic expression of the
> bottom(inner) project has been referenced more than once by the top(outer)
> project. Take the following test as an example:
> 
>  @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws
> Exception {
>    HepProgram program = new HepProgramBuilder()
>            .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
>            .addRuleInstance(ProjectMergeRule.INSTANCE)
>            .build();
> 
>    checkPlanning(program,
>            "select name, a as a1, a as a2 from (\n"
>                    + "  select *, rand() as a\n"
>                    + "  from dept)\n"
>                    + "where deptno = 10\n");
>  }
> 
> The first select generates `a` from `rand()` and the second select generate
> `a1` and `a2` from `a`. From the SQL, `a1` should equal to `a2`.
> Let's take a look at the result plan:
> 
> LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
>  LogicalFilter(condition=[=($0, 10)])
>    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> 
> In the plan, a1 may not equal to a2 due to the projects merge which is
> against the SQL(a1 equals to a2).
> In order to let a1 equal to a2, one option to solve the problem is to
> disable these merge rules in such cases, so that the result plan will be:
> 
> LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
>  LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
>    LogicalFilter(condition=[=($0, 10)])
>      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> 
> Do you guys have any good ideas or encountered similar problems? Any
> suggestions are greatly appreciated.
> 
> Best,
> Hequn
> 
> [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683