osdir.com


Re: Help with EnumerableMergeJoinRule, which is losing a RelCollation trait


Hi again,

thanks for the clarifications!

RelTraitSet#replace is meant to replace some traits *IF* there are some
already present.
If your planner does not have the RelCollationTraitDef activated, replace
will have no effect, since there are not going to be any traits of this
definition present.
In other words, I think your planner is missing the RelCollationTraitDef.
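A minimal toy model in plain Java may make this concrete. The class and method names below are hypothetical and are not Calcite's API; the sketch only assumes what is described above, namely that a trait set has one slot per registered trait definition, and that replace overwrites an existing slot and otherwise returns the set unchanged. In a real Calcite setup, the analogous step would presumably be registering RelCollationTraitDef.INSTANCE with the planner (for example via RelOptPlanner#addRelTraitDef) before planning.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, NOT Calcite's RelTraitSet): one trait slot per
 * registered trait definition; replace() only overwrites an existing slot.
 */
public class ToyTraitSet {
    /** A trait, identified by the name of its definition, e.g. "collation". */
    static final class Trait {
        final String def;
        final String value;

        Trait(String def, String value) {
            this.def = def;
            this.value = value;
        }
    }

    private final List<Trait> traits;

    private ToyTraitSet(List<Trait> traits) {
        this.traits = traits;
    }

    /** Builds the "empty" trait set: one default trait per registered definition. */
    static ToyTraitSet emptyFor(List<String> registeredDefs) {
        List<Trait> defaults = new ArrayList<>();
        for (String def : registeredDefs) {
            defaults.add(new Trait(def, "default"));
        }
        return new ToyTraitSet(defaults);
    }

    /** Returns a copy with the same-definition slot replaced, or this set if no such slot exists. */
    ToyTraitSet replace(Trait trait) {
        for (int i = 0; i < traits.size(); i++) {
            if (traits.get(i).def.equals(trait.def)) {
                List<Trait> copy = new ArrayList<>(traits);
                copy.set(i, trait);
                return new ToyTraitSet(copy);
            }
        }
        return this; // definition not registered: the trait is silently ignored
    }

    /** Returns the value stored for a definition, or null if the set has no such slot. */
    String get(String def) {
        for (Trait t : traits) {
            if (t.def.equals(def)) {
                return t.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Trait sortedByCustomerId = new Trait("collation", "[7]");

        // Planner without a collation definition: replace() has no effect.
        ToyTraitSet without = emptyFor(List.of("convention")).replace(sortedByCustomerId);
        System.out.println(without.get("collation")); // prints null

        // Planner with the collation definition registered: replace() takes effect.
        ToyTraitSet with = emptyFor(List.of("convention", "collation")).replace(sortedByCustomerId);
        System.out.println(with.get("collation")); // prints [7]
    }
}
```

The second case is presumably what EnumerableMergeJoinRule counts on: the collation it requests through replace only sticks if the collation trait definition is part of the planner's trait sets.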

Best,
Stamatis

On Sun, 23 Sep 2018 at 8:24 a.m., Enrico Olivelli <eolivelli@xxxxxxxxx> wrote:

> On Sun, 23 Sep 2018 at 00:11, Stamatis Zampetakis
> <zabetak@xxxxxxxxx> wrote:
> >
> > Hi Enrico,
> >
> > I think a bit more context would help.
> >
> > What kind of org.apache.calcite.schema.Table are you using in your
> > Schema?
>
> Hi Stamatis,
>
> This is my implementation
>
> https://github.com/diennea/herddb/blob/0c7c01584350d57d8102511b987e5f880f3f65bd/herddb-core/src/main/java/herddb/sql/CalcitePlanner.java#L1237
>
> Essentially:
> @Override
> public Statistic getStatistic() {
>    return Statistics.of(tableManager.getStats().getTablesize(), keys);
> }
>
> where "keys" is the primary key.
>
> So I assume this means that there is no intrinsic "collation" for the
> table. In my understanding, the collations in Statistic define a natural
> order in the table, as in a table clustered by an index, and the system
> may then assume that an EnumerableTableScan is naturally sorted by that
> collation.
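The idea of a declared collation can be sketched the same way. This is a simplified model, not Calcite's RelCollation class: sort keys are plain column indexes (ascending only), and a provided order satisfies a required one when the required keys are a prefix of it, which is, as far as I know, roughly how Calcite compares collations. Column 7 is license.customer_id from the issue below; the clustered-index case is hypothetical, since HerdDB has none.

```java
import java.util.List;

/**
 * Simplified model (hypothetical, not Calcite's classes): a collation is a list
 * of ascending sort keys given as column indexes.
 */
public class ToyCollation {
    /** Returns true if rows ordered by {@code provided} are also ordered by {@code required}. */
    static boolean satisfies(List<Integer> provided, List<Integer> required) {
        if (required.size() > provided.size()) {
            return false;
        }
        // The required keys must be a prefix of the provided keys.
        return provided.subList(0, required.size()).equals(required);
    }

    public static void main(String[] args) {
        List<Integer> noDeclaredOrder = List.of();        // Statistic with no collations
        List<Integer> clusteredOnCustomerId = List.of(7); // hypothetical clustered index on column 7
        List<Integer> mergeKey = List.of(7);              // license.customer_id

        System.out.println(satisfies(noDeclaredOrder, mergeKey));       // false: a sort is needed
        System.out.println(satisfies(clusteredOnCustomerId, mergeKey)); // true: scan usable as-is
        System.out.println(satisfies(List.of(7, 0), mergeKey));         // true: prefix match
    }
}
```

With no collation declared, nothing satisfies the merge key, so a correct plan would need to sort the input (or pick a different join) rather than use the scan directly.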
>
> Bonus question:
> can you explain a bit more about the meaning of "RelTraitSet#replace"?
> Should it add collations to the Rel, or is it a conversion, so that if it
> fails, EnumerableMergeJoinRule should not fire?
>
>
> Thank you very much
> Enrico
>
>
>
> > I suppose you have already looked, but just in case: are there any
> > statistics on these tables?
> > I've seen a similar case where the table statistics declared that some
> > columns were sorted when this was not true.
> > I just want to make sure that this is not the case here.
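One way to rule out the case described above is to compare the order the statistics claim with what the scan actually returns. The helper below is hypothetical (it is not part of Calcite or HerdDB) and the rows are made up; it simply checks that rows are in non-decreasing order of one column.

```java
import java.util.List;

/**
 * Hypothetical debugging helper: checks whether scanned rows really are
 * ordered by a column that the table statistics claim is sorted.
 */
public class CollationCheck {
    /** Returns true if {@code rows} are in non-decreasing order of {@code column}. */
    static boolean isSortedBy(List<int[]> rows, int column) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i - 1)[column] > rows.get(i)[column]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up rows: column 0 = license_id, column 1 = customer_id.
        List<int[]> licenseRows = List.of(
            new int[] {1, 30},
            new int[] {2, 10},
            new int[] {3, 20});
        System.out.println(isSortedBy(licenseRows, 0)); // true
        System.out.println(isSortedBy(licenseRows, 1)); // false
    }
}
```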
> >
> > Best,
> > Stamatis
> >
> >
> > On Sat, 22 Sep 2018 at 11:43 a.m., Enrico Olivelli <eolivelli@xxxxxxxxx> wrote:
> >
> > > Hi,
> > > We found a strange behaviour in an execution plan: basically, we have an
> > > EnumerableMergeJoin whose inputs are two unsorted EnumerableTableScans.
> > >
> > > All the details are in this issue on HerdDB:
> > >
> > > https://github.com/diennea/herddb/issues/262#issuecomment-423590573
> > >
> > > The issue is copied at the bottom of this email.
> > >
> > > Any help is much appreciated; maybe this rings some bells...
> > >
> > > [ISSUE]
> > >
> > > Query:
> > > SELECT * FROM license t0, customer c WHERE c.customer_id = t0.customer_id
> > >
> > > It seems that Calcite is planning a Merge Join, but the tables are not
> > > sorted according to the merge keys.
> > >
> > > "License" table:
> > > TABLE PK (non clustered): [license_id]
> > > COL: license_id serialPos: 0 (serialPos is the index of the column for
> > > Calcite)
> > > COL: application serialPos: 1
> > > COL: creation serialPos: 2
> > > COL: data serialPos: 3
> > > COL: deleted serialPos: 4
> > > COL: modification serialPos: 5
> > > COL: signature serialPos: 6
> > > COL: customer_id serialPos: 7
> > >
> > > "Customer" table:
> > > TABLE PK (non clustered): [customer_id]
> > > COL: customer_id serialPos: 0
> > > COL: contact_email serialPos: 1
> > > COL: contact_person serialPos: 2
> > > COL: creation serialPos: 3
> > > COL: deleted serialPos: 4
> > > COL: modification serialPos: 5
> > > COL: name serialPos: 6
> > > COL: vetting serialPos: 7
> > >
> > > The join is on the PK (non-clustered) column of table customer and the
> > > customer_id column of table license; license is not naturally sorted by
> > > customer_id (we do not have clustered indexes!).
> > >
> > > This is the plan:
> > >
> > > EnumerableMergeJoin(condition=[=($7, $9)], joinType=[inner]): rowcount = 15.75, cumulative cost = {59.75 rows, 24.0 cpu, 0.0 io}, id = 114
> > >   EnumerableTableScan(table=[[herd, license]]): rowcount = 15.0, cumulative cost = {15.0 rows, 16.0 cpu, 0.0 io}, id = 28
> > >   EnumerableTableScan(table=[[herd, customer]]): rowcount = 7.0, cumulative cost = {7.0 rows, 8.0 cpu, 0.0 io}, id = 29
> > >
> > > EnumerableTableScan does not carry any information saying that the
> > > scan MUST be sorted according to the join keys (field 7 in "license" and
> > > field 0 in "customer").
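Why the missing sort matters can be shown with a simplified sort-merge join over int keys. This is a sketch, not Calcite's EnumerableMergeJoin: it makes a single merge pass and assumes both inputs are sorted ascending by the join key. The key values are made up; with an unsorted left input, the same pass silently drops matches instead of failing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sort-merge equi-join on int keys (a sketch, not Calcite's code).
 * Correct only when both inputs are sorted ascending by key.
 */
public class MergeJoinSketch {
    /** Returns the matching (leftKey, rightKey) pairs found by one merge pass. */
    static List<int[]> mergeJoin(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;
            } else if (left[i] > right[j]) {
                j++;
            } else {
                // Emit every right row with this key; j stays put so that
                // duplicate keys on the left side match the same run again.
                for (int k = j; k < right.length && right[k] == left[i]; k++) {
                    out.add(new int[] {left[i], right[k]});
                }
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] customerIds = {10, 20, 30};               // sorted
        int[] licenseCustomerIds = {10, 20, 20, 30};    // sorted
        System.out.println(mergeJoin(licenseCustomerIds, customerIds).size()); // 4

        int[] unsortedLicenseCustomerIds = {30, 10, 20, 20}; // scan order, not sorted
        System.out.println(mergeJoin(unsortedLicenseCustomerIds, customerIds).size()); // 1
    }
}
```

The unsorted run finds 1 match instead of 4, which is the kind of silently wrong result one would expect from an EnumerableMergeJoin over unsorted EnumerableTableScans.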
> > >
> > > Here, in the Calcite code, the additional collation is lost because the
> > > trait set produced by "replace" does not contain any RelCollation, so the
> > > inputs of the join are not converted:
> > >
> > > https://github.com/apache/calcite/blob/2ab83e468d282a9428e533853aea5253816889fb/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L78
> > >
> > > Is it a bug in Calcite, or in how we are passing data to Calcite?
> > > Tables do not have any implicit "collation" in HerdDB, so we are not
> > > passing any RelCollation.
> > >
> > >
> > > Thank you
> > >
> > > Enrico
> > >
>