Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once


Hi Hequn,
Thanks for reporting this issue. I think the premise of any rule optimization
is that it preserves the semantics of the query. So when the rule is matched,
the isDeterministic attribute of the UDF must be considered.
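
As a rough illustration of the kind of check this implies, here is a minimal
sketch in plain Java. It is a toy model, not Calcite's API: the class
`MergeGuard`, the method `canMerge`, and the representation of projects as
lists of field indexes are all hypothetical, and the actual fix may look
quite different. The idea is simply to count how often the top project
references each bottom expression and to refuse the merge if a
non-deterministic one is referenced more than once.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical, not Calcite code) of a merge guard for stacked projects. */
final class MergeGuard {
    private MergeGuard() {}

    /**
     * @param topRefs             for each top (outer) project expression, the indexes
     *                            of the bottom (inner) project fields it references
     * @param bottomDeterministic for each bottom project expression, whether it is
     *                            deterministic
     * @return false if merging would duplicate a non-deterministic expression
     */
    static boolean canMerge(List<List<Integer>> topRefs,
                            List<Boolean> bottomDeterministic) {
        // Count how many times each bottom field is referenced from the top project.
        int[] refCounts = new int[bottomDeterministic.size()];
        for (List<Integer> refs : topRefs) {
            for (int ref : refs) {
                refCounts[ref]++;
            }
        }
        // Inlining a non-deterministic expression more than once changes semantics.
        for (int i = 0; i < refCounts.length; i++) {
            if (!bottomDeterministic.get(i) && refCounts[i] > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Bottom project from the example: DEPTNO=$0, NAME=$1, A=RAND().
        List<Boolean> bottom = Arrays.asList(true, true, false);
        // Top project from the example: NAME=$1, A1=$2, A2=$2.
        List<List<Integer>> top =
            Arrays.asList(Arrays.asList(1), Arrays.asList(2), Arrays.asList(2));
        System.out.println("can merge: " + canMerge(top, bottom));
    }
}
```

With the projects from the example below, `RAND()` is referenced twice, so
the sketch rejects the merge; if the top project selected `a` only once, the
merge would still be allowed.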

+1 to be fixed.

Thanks,
Jincheng

On Mon, Nov 19, 2018 at 11:52 PM, Hequn Cheng <chenghequn@xxxxxxxxx> wrote:

> Hi,
>
> Currently, there are several merge rules for Project, such as CalcMergeRule,
> ProjectMergeRule, and ProjectCalcMergeRule. I found that these merge rules
> should not be applied when a non-deterministic expression of the
> bottom (inner) project is referenced more than once by the top (outer)
> project. Take the following test as an example:
>
>   @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws Exception {
>     HepProgram program = new HepProgramBuilder()
>             .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
>             .addRuleInstance(ProjectMergeRule.INSTANCE)
>             .build();
>
>     checkPlanning(program,
>             "select name, a as a1, a as a2 from (\n"
>                     + "  select *, rand() as a\n"
>                     + "  from dept)\n"
>                     + "where deptno = 10\n");
>   }
>
> The inner select generates `a` from `rand()`, and the outer select generates
> `a1` and `a2` from `a`. According to the SQL, `a1` should equal `a2`.
> Let's take a look at the resulting plan:
>
> LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
>   LogicalFilter(condition=[=($0, 10)])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> In this plan, a1 may not equal a2, because merging the projects duplicated
> the RAND() call. This contradicts the SQL, in which a1 equals a2.
> To keep a1 equal to a2, one option is to disable these merge rules in such
> cases, so that the resulting plan will be:
>
> LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
>   LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
>     LogicalFilter(condition=[=($0, 10)])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> Do you have any good ideas, or have you encountered similar problems? Any
> suggestions are greatly appreciated.
>
> Best,
> Hequn
>
> [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683
>