OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Help with EnumerableMergeJoinRule which is losing a RelCollection trait


Hi,
We found a strange behaviour in an execution plan, basially we have an
EnumerableMergeJoin which has as input two non-sorted
EnumerableTableScan

all the details are in this issue on HerdDB

https://github.com/diennea/herddb/issues/262#issuecomment-423590573

Cut and paste from the issue in the bottom of this email

Any help is very appreciated, maybe some ring bells ....

[ISSUE]

query:
SELECT * FROM license t0, customer c WHERE c.customer_id = t0.customer_id

It seems that Calcite is planning a Merge Join, but the tables are not
sorted according to the merge keys.

"License" table:
TABLE PK (non clustered): [license_id]
COL: license_id serialPos: 0 (serialPos is the index of the colum for Calcite)
COL: application serialPos: 1
COL: creation serialPos: 2
COL: data serialPos: 3
COL: deleted serialPos: 4
COL: modification serialPos: 5
COL: signature serialPos: 6
COL: customer_id serialPos: 7

"Customer" table:
TABLE PK (non clustered): [customer_id]
COL: customer_id serialPos: 0
COL: contact_email serialPos: 1
COL: contact_person serialPos: 2
COL: creation serialPos: 3
COL: deleted serialPos: 4
COL: modification serialPos: 5
COL: name serialPos: 6
COL: vetting serialPos: 7

the join is on PK (non clustered) column of table customer,
and the "customer_id" column of table 'license' which is not sorted
naturally by 'customerid' (we do not have clustered indexes !!)

This is the plan:

EnumerableMergeJoin(condition=[=($7, $9)], joinType=[inner]): rowcount
= 15.75, cumulative cost = {59.75 rows, 24.0 cpu, 0.0 io}, id = 114
EnumerableTableScan(table=[[herd, license]]): rowcount = 15.0,
cumulative cost = {15.0 rows, 16.0 cpu, 0.0 io}, id = 28
EnumerableTableScan(table=[[herd, customer]]): rowcount = 7.0,
cumulative cost = {7.0 rows, 8.0 cpu, 0.0 io}, id = 29

EnumerableTableScan does not contain any information which tells that
the Scan MUST be sorted according to the join keys (field 7 in
"licence", and field 0 in "customer")

Here in Calcite code the additional 'Collation' is lost as the
"replace" does not contain any 'RelCollation', so the inputs of the
join are not transformed

https://github.com/apache/calcite/blob/2ab83e468d282a9428e533853aea5253816889fb/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L78

is it a bug in Calcite or in how we are passing data to Calcite ?
Tables do not have any impliticit "collation" in HerdDB so we are not
passing any 'RelCollation'


Thank you

Enrico