OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Help with EnumerableMergeJoinRule which is losing a RelCollection trait


Hi Enrico,

I think a bit more context would help.

What kind of org.apache.calcite.schema.Table are you using in your Schema?
I suppose you already looked but just in case are there any kind of
statistics on these tables.
I've seen a similar case where the statistics of the table were declaring
that some columns were sorted without this being true.
I just want to make sure that this is not the case here.

Best,
Stamatis


Στις Σάβ, 22 Σεπ 2018 στις 11:43 π.μ., ο/η Enrico Olivelli <
eolivelli@xxxxxxxxx> έγραψε:

> Hi,
> We found a strange behaviour in an execution plan, basially we have an
> EnumerableMergeJoin which has as input two non-sorted
> EnumerableTableScan
>
> all the details are in this issue on HerdDB
>
> https://github.com/diennea/herddb/issues/262#issuecomment-423590573
>
> Cut and paste from the issue in the bottom of this email
>
> Any help is very appreciated, maybe some ring bells ....
>
> [ISSUE]
>
> query:
> SELECT * FROM license t0, customer c WHERE c.customer_id = t0.customer_id
>
> It seems that Calcite is planning a Merge Join, but the tables are not
> sorted according to the merge keys.
>
> "License" table:
> TABLE PK (non clustered): [license_id]
> COL: license_id serialPos: 0 (serialPos is the index of the colum for
> Calcite)
> COL: application serialPos: 1
> COL: creation serialPos: 2
> COL: data serialPos: 3
> COL: deleted serialPos: 4
> COL: modification serialPos: 5
> COL: signature serialPos: 6
> COL: customer_id serialPos: 7
>
> "Customer" table:
> TABLE PK (non clustered): [customer_id]
> COL: customer_id serialPos: 0
> COL: contact_email serialPos: 1
> COL: contact_person serialPos: 2
> COL: creation serialPos: 3
> COL: deleted serialPos: 4
> COL: modification serialPos: 5
> COL: name serialPos: 6
> COL: vetting serialPos: 7
>
> the join is on PK (non clustered) column of table customer,
> and the "customer_id" column of table 'license' which is not sorted
> naturally by 'customerid' (we do not have clustered indexes !!)
>
> This is the plan:
>
> EnumerableMergeJoin(condition=[=($7, $9)], joinType=[inner]): rowcount
> = 15.75, cumulative cost = {59.75 rows, 24.0 cpu, 0.0 io}, id = 114
> EnumerableTableScan(table=[[herd, license]]): rowcount = 15.0,
> cumulative cost = {15.0 rows, 16.0 cpu, 0.0 io}, id = 28
> EnumerableTableScan(table=[[herd, customer]]): rowcount = 7.0,
> cumulative cost = {7.0 rows, 8.0 cpu, 0.0 io}, id = 29
>
> EnumerableTableScan does not contain any information which tells that
> the Scan MUST be sorted according to the join keys (field 7 in
> "licence", and field 0 in "customer")
>
> Here in Calcite code the additional 'Collation' is lost as the
> "replace" does not contain any 'RelCollation', so the inputs of the
> join are not transformed
>
>
> https://github.com/apache/calcite/blob/2ab83e468d282a9428e533853aea5253816889fb/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L78
>
> is it a bug in Calcite or in how we are passing data to Calcite ?
> Tables do not have any impliticit "collation" in HerdDB so we are not
> passing any 'RelCollation'
>
>
> Thank you
>
> Enrico
>