
Re: [SQL] Cross Join Operation

Calcite does not have the concept of a "CROSS JOIN"; one shows up in the plan as a LogicalJoin with condition=[true]. We could try rejecting cross joins at the planning stage by returning null for them in BeamJoinRule.convert(), which might lead the planner to a different plan. But looking at your query, you do have a cross join unless the WHERE clause on the inner select refers to a row from the outer select.
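
To illustrate why a query of this shape ends up with a join, here is a minimal sketch using Python's built-in sqlite3 module rather than Beam SQL or Calcite. The table and column names follow TPC-DS, but the data, the threshold, and the 'many'/'few' labels are made up for the example. It only shows that an uncorrelated scalar subquery returns the same rows as an explicit cross join against a one-row aggregate, which is roughly the rewrite a planner performs; it is not what Calcite literally emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales (ss_quantity INTEGER);
    CREATE TABLE reason (r_reason_sk INTEGER);
    INSERT INTO store_sales VALUES (5), (15), (25);
    INSERT INTO reason VALUES (1), (2);
""")

# Scalar-subquery form, shaped like the query discussed in this thread.
# The subquery does not reference the outer table (it is uncorrelated).
subquery_form = conn.execute("""
    SELECT r_reason_sk,
           CASE WHEN (SELECT count(*) FROM store_sales
                      WHERE ss_quantity BETWEEN 1 AND 20) > 1
                THEN 'many' ELSE 'few' END
    FROM reason
    ORDER BY r_reason_sk
""").fetchall()

# Join form: the aggregate becomes a one-row relation that is paired with
# every row of the outer table, i.e. a join whose condition is always true.
join_form = conn.execute("""
    SELECT r.r_reason_sk,
           CASE WHEN c.cnt > 1 THEN 'many' ELSE 'few' END
    FROM reason AS r
    CROSS JOIN (SELECT count(*) AS cnt FROM store_sales
                WHERE ss_quantity BETWEEN 1 AND 20) AS c
    ORDER BY r.r_reason_sk
""").fetchall()

print(subquery_form == join_form)
```

Because the aggregate side has exactly one row, this particular cross join does not multiply the outer table's row count.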


On Tue, May 15, 2018 at 9:15 AM Kenneth Knowles <klk@xxxxxxxxxx> wrote:
The logical plan should show you where the cross join is needed. Here is where it is logged: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L150

(It should probably be logged at DEBUG level.)

If I look at the original template, such as https://github.com/gregrahn/tpcds-kit/blob/master/query_templates/query9.tpl, I see conditions like "[RC.1]". Are those templates expected to be filled with references to the `reason` table, perhaps? How does that change things?

I still think it would be good to support CROSS JOIN if we can - the problem of course is huge data size, but when one side is small it would be good for it to work simply.
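
As a rough sketch of the "one side is small" case (plain Python, not Beam's API and not a proposed implementation; the function name is hypothetical), the small side can be held in memory once while the large side is streamed past it:

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def broadcast_cross_join(large: Iterable[L], small: Iterable[R]) -> Iterator[Tuple[L, R]]:
    """Pair every row of `large` with every row of `small`.

    Only `small` is materialized; `large` is consumed one row at a time,
    so memory use depends on the small side alone.
    """
    small_rows: List[R] = list(small)
    for left in large:
        for right in small_rows:
            yield (left, right)


# The large side can be any iterable, including a generator.
pairs = list(broadcast_cross_join((n for n in range(3)), ["a", "b"]))
print(len(pairs))
```

The output still has len(large) * len(small) rows, so this only helps when the small side really is small.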


On Tue, May 15, 2018 at 7:41 AM Kai Jiang <jiangkai@xxxxxxxxx> wrote:
Hi everyone,

As a proof of concept for my GSoC project, I have been running some simple TPC-DS queries against generated data on the direct runner (query example).

The example runs TPC-DS query 9. Briefly, query 9 uses CASE WHEN clauses to select five count-based values from store_sales (table 1). To return those values, the CASE WHEN clauses all sit inside a single SELECT clause. In short, it looks like:
SELECT CASE WHEN ( SELECT count(*) FROM table 1 WHERE ..... )
            THEN condition 1
            ELSE condition 2
       END,
       .....
FROM table 2

IIUC, this query doesn't need a join between table 1 and table 2, since the outer SELECT clause doesn't interact with table 1.
But the program plans one anyway and throws an error saying "java.lang.UnsupportedOperationException: CROSS JOIN is not supported" (error message detail).

To make the query work, I am wondering where I can start:
1. Look at the logical plan?
Will the logical plan explain why the query needs a CROSS JOIN?

2. Cross join support?
I checked all the queries in the TPC-DS benchmark, and almost every one uses a cross join. It is an important feature that needs to be implemented. Unlike other joins, it consumes a lot of computing resources, but I think we will need cross join in the future. Should it be supported in the join-library as well? I noticed James has opened BEAM-2194 for supporting cross join.

Looking forward to comments!