Re: Best practice for exhaustive planning


Thanks Michael,

I don't think that applies in our case - we aren't doing a table scan and
having Calcite implement the rest, but are translating the whole plan to a
Beam pipeline to run on e.g. Flink, Spark, Dataflow.

Here's an example:

    SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])

With logical plan:

    LogicalProject(EXPR$0=[$0])
      Uncollect
        LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
          LogicalValues(tuples=[[{ 0 }]])

The planner dumps "could not be implemented" when converting this to Beam's
calling convention, so I implemented a rel and a rule for it.

Then there's the correlated version, exploding an array field from a table:

    SELECT f_int, arrElems.f_string
    FROM main CROSS JOIN UNNEST (main.f_stringArr) AS arrElems(f_string)

With logical plan:

    LogicalProject(f_int=[$0], f_string=[$2])
      LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
        BeamIOSourceRel(table=[[beam, main]])
        Uncollect
          LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
            LogicalValues(tuples=[[{ 0 }]])
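
To make the intended result of that plan concrete, here is a minimal plain-Java
sketch of what the correlated UNNEST computes: for each row of main, one output
row per array element. It does not use Calcite or Beam. The class name, the
method name, and the use of Map.Entry as a stand-in row type are all made up
for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CorrelatedUnnestSketch {
    /**
     * Each input entry stands in for a row of `main`:
     * key = f_int, value = f_stringArr (hypothetical row encoding).
     * Each output entry is one (f_int, f_string) result row.
     */
    static List<Map.Entry<Integer, String>> crossJoinUnnest(
            List<Map.Entry<Integer, List<String>>> mainRows) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> row : mainRows) { // left input of the Correlate
            for (String elem : row.getValue()) {                // Uncollect over the row's array
                out.add(Map.entry(row.getKey(), elem));         // final Project
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, List<String>>> mainTable = new ArrayList<>();
        mainTable.add(Map.entry(1, List.of("a", "b")));
        mainTable.add(Map.entry(2, List.<String>of())); // empty array: contributes no rows
        for (Map.Entry<Integer, String> row : crossJoinUnnest(mainTable)) {
            System.out.println(row.getKey() + ", " + row.getValue());
        }
    }
}
```

Because the join type is inner, a row whose array is empty produces no output
rows in this sketch.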

I hacked something together to support this, too. I did not fully implement
Correlate, and I would love to reject the unsupported cases in a meaningful
way. I would also like to be confident that there are no other permutations
of logical plans that we have missed. For joins, for example, we match all
joins and translate them, then throw an error at a later stage.
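
One way to picture "rejecting unsupported things in a meaningful way" is a
pre-pass that walks the logical plan before conversion and reports every
operator that has no translation. The following is only a sketch under stated
assumptions: Node is a toy stand-in for a plan node (not Calcite's RelNode),
and the SUPPORTED allow-list and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class PlanSupportCheckSketch {
    /** Toy stand-in for a logical plan node; NOT Calcite's RelNode. */
    static final class Node {
        final String kind;
        final List<Node> inputs;

        Node(String kind, Node... inputs) {
            this.kind = kind;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Assumed allow-list of operators that have a translation. */
    static final Set<String> SUPPORTED = Set.of(
            "LogicalProject", "LogicalValues", "LogicalCorrelate",
            "Uncollect", "BeamIOSourceRel");

    /** Returns one readable message per unsupported operator, in pre-order. */
    static List<String> findUnsupported(Node root) {
        List<String> problems = new ArrayList<>();
        collect(root, "", problems);
        return problems;
    }

    private static void collect(Node node, String path, List<String> problems) {
        String here = path.isEmpty() ? node.kind : path + " > " + node.kind;
        if (!SUPPORTED.contains(node.kind)) {
            problems.add("Unsupported operator " + node.kind + " at " + here);
        }
        for (Node input : node.inputs) {
            collect(input, here, problems);
        }
    }

    /** Fails fast with all problems at once, instead of a planner dump. */
    static void checkSupported(Node root) {
        List<String> problems = findUnsupported(root);
        if (!problems.isEmpty()) {
            throw new UnsupportedOperationException(String.join("; ", problems));
        }
    }

    public static void main(String[] args) {
        Node plan = new Node("LogicalProject",
                new Node("LogicalSort", new Node("LogicalValues")));
        findUnsupported(plan).forEach(System.out::println);
    }
}
```

An allow-list keyed only on operator names would not catch the join case
mentioned above, where the operator matches but a particular variant is
unsupported; a real check would also need to inspect operator attributes.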

Incidentally, when I ran the decorrelation [1] it appeared to have no
effect. We probably want to implement it directly in Beam anyhow in this
case.

Kenn

[1]
https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-

On Tue, May 22, 2018 at 6:39 PM Michael Mior <mmior@xxxxxxxxxxxx> wrote:

> For most queries, the only thing you should need to implement is a scan and
> the rest can usually be implemented by Calcite. It would be good if you
> have a specific example of a query that fails.
>
> --
> Michael Mior
> mmior@xxxxxxxxxxxx
>
>
> On Tue, May 22, 2018 at 7:01 PM, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
> wrote:
>
> > Bumping this, as it ended up in spam for some people.
> >
> > Kenn
> >
> > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <klk@xxxxxxxxxx> wrote:
> >
> > > Hi all,
> > >
> > > Beam SQL uses Calcite for parsing and (naive) planning. Currently it is
> > > pretty easy to write a SQL query that parses and causes a "could not
> > > plan" dump when we ask the planner to convert things to the Beam calling
> > > convention. One current example is using UNNEST on a column to yield a
> > > LogicalCorrelate + Uncollect.
> > >
> > > There may obviously always be bits we don't support, but we'd like to
> > > ensure that the user encounters a careful error message rather than a
> > > planner dump. Is there a best practice for ensuring that we have covered
> > > all the cases? Is it just "everything named Logical*" or is there
> > > something more clever?
> > >
> > > And if this question demonstrates that we are using Calcite entirely
> > > wrong, let us know :-)
> > >
> > > Kenn
> > >
> >
>