Re: Best practice for exhaustive planning


Unfortunately, I'm not sure of the best way to proceed from here, but
it seems like you're making progress :)
--
Michael Mior
mmior@xxxxxxxxxx



On Tue, May 29, 2018 at 6:29 PM, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
wrote:

> Thanks Michael,
>
> I don't think that applies in our case - we aren't doing a table scan and
> having Calcite implement the rest, but are translating the whole plan to a
> Beam pipeline to run on e.g. Flink, Spark, Dataflow.
>
> Here's an example:
>
>     SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])
>
> With logical plan:
>
>     LogicalProject(EXPR$0=[$0])
>       Uncollect
>         LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
>           LogicalValues(tuples=[[{ 0 }]])
>
> And the planner dumps "could not be implemented" when converting to Beam's
> calling convention. So I implemented a rel and a rule.
>
> Then there's the correlated version, exploding an array field from a table:
>
>     SELECT f_int, arrElems.f_string
>     FROM main CROSS JOIN UNNEST (main.f_stringArr) AS arrElems(f_string)
>
> With logical plan:
>
>     LogicalProject(f_int=[$0], f_string=[$2])
>       LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
>         BeamIOSourceRel(table=[[beam, main]])
>         Uncollect
>           LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
>             LogicalValues(tuples=[[{ 0 }]])
>
> I hacked something together to support this, too. I did not fully implement
> Correlate; I would love to reject unsupported things in a meaningful way. I
> would like to have confidence that there are not other permutations of
> logical plans that we missed. For example, for joins we match all joins and
> translate them, then throw an error at a later stage.
>
> Incidentally, when I ran the decorrelation [1] it appeared to have no
> effect. We probably want to implement it directly in Beam anyhow in this
> case.
>
> Kenn
>
> [1]
>
> https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-
>
> On Tue, May 22, 2018 at 6:39 PM Michael Mior <mmior@xxxxxxxxxxxx> wrote:
>
> > For most queries, the only thing you should need to implement is a scan
> > and the rest can usually be implemented by Calcite. It would be good if
> > you have a specific example of a query that fails.
> >
> > --
> > Michael Mior
> > mmior@xxxxxxxxxxxx
> >
> >
> > On Tue, May 22, 2018 at 7:01 PM, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
> > wrote:
> >
> > > Bumping this, as it ended up in spam for some people.
> > >
> > > Kenn
> > >
> > > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <klk@xxxxxxxxxx> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Beam SQL uses Calcite for parsing and (naive) planning. Currently it is
> > > > pretty easy to write a SQL query that parses and causes a "could not
> > > > plan" dump when we ask the planner to convert things to the Beam calling
> > > > convention. One current example is using UNNEST on a column to yield a
> > > > LogicalCorrelate + Uncollect.
> > > >
> > > > There may obviously always be bits we don't support, but we'd like to
> > > > ensure that the user encounters a careful error message rather than a
> > > > planner dump. Is there a best practice for ensuring that we have covered
> > > > all the cases? Is it just "everything named Logical*" or is there
> > > > something more clever?
> > > >
> > > > And if this question demonstrates that we are using Calcite entirely
> > > > wrong, let us know :-)
> > > >
> > > > Kenn
> > > >
> > >
> >
>
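
One way to picture the "careful error message rather than a planner dump" idea from the thread is a pre-check that walks the logical plan before conversion and lists every operator with no known translation. The sketch below is only an illustration under stated assumptions: `PlanNode`, `PlanSupportCheck`, and the `SUPPORTED` set are hypothetical stand-ins, not Calcite or Beam classes, and a real check would inspect actual `RelNode` types and their attributes (for example join type) rather than plain names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a logical plan node; not a Calcite class. */
final class PlanNode {
    final String type;
    final List<PlanNode> inputs;

    PlanNode(String type, PlanNode... inputs) {
        this.type = type;
        this.inputs = Arrays.asList(inputs);
    }
}

final class PlanSupportCheck {
    /** Operator names this hypothetical backend can translate (an assumption). */
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
        "LogicalProject", "LogicalValues", "Uncollect", "BeamIOSourceRel"));

    /** Walks the plan depth-first and returns each unsupported operator name once. */
    static List<String> findUnsupported(PlanNode root) {
        List<String> unsupported = new ArrayList<>();
        collect(root, unsupported);
        return unsupported;
    }

    private static void collect(PlanNode node, List<String> out) {
        if (!SUPPORTED.contains(node.type) && !out.contains(node.type)) {
            out.add(node.type);
        }
        for (PlanNode input : node.inputs) {
            collect(input, out);
        }
    }

    /** Fails early with a readable message instead of a later planner dump. */
    static void validate(PlanNode root) {
        List<String> unsupported = findUnsupported(root);
        if (!unsupported.isEmpty()) {
            throw new UnsupportedOperationException(
                "Query uses operators that are not supported yet: "
                    + String.join(", ", unsupported));
        }
    }
}
```

Applied to the correlated UNNEST plan quoted above, such a check would name `LogicalCorrelate` as the unsupported operator. It does not by itself guarantee that every plan shape is covered; it only turns the known gaps into a readable message.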