|
|
Sponsor |
INTERSECT and EXCEPT Design: msg#00320apache.db.derby.devel
Attached is an HTML document describing the design of the INTERSECT and EXCEPT operators, which I contributed to Derby. Perhaps it can be posted on the Derby web site in the Papers/Derby Engine section. Jack Klebanoff Intersect & Except DesignJack KlebanoffFeb. 22 2005 IntroductionThis paper describes the implementation of the INTERSECT and EXCEPT operators. This paper assumes basic familiarity with SQL and the language (compiler) portion of Derby. The INTERSECT and EXCEPT operators operate on table expressions producing the intersection and difference, respectively. The syntax is (roughly):
queryExpression INTERSECT [ALL] queryExpression By default these operators remove duplicates, which can occur if there are duplicates in the inputs. If ALL is specified then duplicates are not removed. If t1 has m copies of row R and t2 has n copies then t1 INTERSECT ALL t2 returns min(m,n) copies of R, and t1 EXCEPT ALL t2 returns max( 0, m-n) copies of R. The EXCEPT operator has the same precedence as UNION. INTERSECT has higher precedence. The implementation is spread across several classes, primarily
ExecutionThe INTERSECT and EXCEPT operations are executed in a similar fashion, by class SetOpResultSet. The two inputs are sorted separately. The sort key consists of all the columns. Then SetOpResultSet scans the two inputs simultaneously. The INTERSECT operation outputs approximately any row from its left input that is also found in its right output. The EXCEPT operation outputs approximately any row from its left input that is not found in its right output. Handling of duplicates complicates the picture a little, which is the reason for the "approximately" caveat. However the scans proceed in a strictly forward direction; there is no need for backtracking.If the left and right inputs have N and M rows respectively the sorts take time O(N*log(N) + M*log(M)). The final scan takes time O(N + M). So the time for the whole operation is O(N*log(N) + M*log(M)). Alternative Execution PlansOther implementations are possible.
The current implementation was chosen because it always provides at least decent speed and memory utilization, and in many, though certainly not all cases, it is the best implementation. We could have provided several implementations and let the optimizer choose the best, but this does not seem worthwhile for operations that are seldom used. Binding, Optimization, and Code GenerationThe IntersectOrExceptNode class handles compilation of both INTERSECT and EXCEPT operations, because they are implemented similarly. We do very little in the way of optimization because we have only implemented one execution plan; the optimizer has nothing to choose from.The INTERSECT and EXCEPT operators are bound much like the UNION operator. The bind methods are all found in super class SetOperatorNode, which is shared with UnionNode. IntersectOrExceptNode generates OrderByNodes for its inputs at the start of optimization, in the preprocess phase. Any column ordering can be used for the sorts, as long as the same one is used for both the left and right inputs. IntersectOrExceptNode tries to be a little clever about picking the column ordering for the sorts. If the INTERSECT or EXCEPT output must be ordered then IntersectOrExceptNode uses the ORDER BY columns as the most significant part of the sort key for its inputs. Any columns not in the ORDER BY list are tacked on to the least significant part of the sort keys of the inputs. This ensures that the output of the INTERSECT or EXCEPT will be properly ordered without an additional sort step. The architecture of the Derby optimizer makes it difficult to do further optimizations. SelectNode processing requires that order by lists be pushed down to them at the start of preprocessing. If an input to INTERSECT or EXCEPT is a SELECT (a common case) then IntersectOrExceptNode has to decide whether it needs its inputs ordered before it calls the preprocess method of its inputs. That means that it must chose its execution plan at the start of the optimization process, not as the result of the optimization process. Code generation is straighforward. It generates code that invokes the ResultSetFactory.getSetOpResultSet method. ParserThe parser, in java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj, implements the EXCEPT, INTERSECT, and UNION operators in the queryExpression method.The UNION and EXCEPT operators have the same precedence. The INTERSECT operator has higher precedence, so t1 UNION ALL t2 UNION t3is equivalent to (t1 UNION ALL t2)UNION t3and t1 EXCEPT t2 INTERSECT t3is equivalent to t1 EXCEPT (t2 INTERSECT t3) This precedence and associativity is implemented in the structure of queryExpression. The higher precedence of the INTERSECT operator is implemented by building queryExpression out of nonJoinQueryTerm. A nonJoinQueryTerm consists of one or more nonJoinQueryPrimaries connected by INTERSECT operators. A queryExpression consists of one or more nonJoinQueryTerms connected by EXCEPT or UNION operators.
Our parser is a recursive descent parser. Recursive descent parsers want recursion on the
right. Therefore they do not handle SQL's operator associativity naturally. The natural grammar for
queryExpression is queryExpression ::= nonJoinQueryTerm | queryExpression UNION [ALL] nonJoinQueryTerm | queryExpression EXCEPT [ALL] nonJoinQueryTermThis captures the correct associativity of the UNION and EXCEPT operators, but we cannot use it because it has recursion on the left. Therefore our parser uses the following grammar:
We have right recursion, but each queryExpression has to know whether it is a simple _expression_ or
the right hand side of a UNION or INTERSECT operation. The nonJoinQueryTerm method uses a similar
trick to implement SQL standard associativity while using right recursion. INTERSECT associativity matters
when INTERSECT ALL is mixed with plain INTERSECT.
The creation of union, intersect, or except query tree nodes is done in the nonJoinQueryTerm method.
|
|
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| Previous by Date: | [PATCH] BackingStoreHashtable, Jack Klebanoff |
|---|---|
| Next by Date: | Re: [PATCH] Derby-107, Phase II, Satheesh Bandaram |
| Previous by Thread: | [PATCH] BackingStoreHashtable, Jack Klebanoff |
| Next by Thread: | Re: INTERSECT and EXCEPT Design, RPost |
| Indexes: | [Date] [Thread] [Top] [All Lists] |
Free MagazinesCisco NewsReceive a free quarterly e-newsletter with exclusive articles on how Cisco IT uses its own products and solutions to enable the business. subscribe Systems Management News, the newspaper for IT systems administration and data center managers! Each issue of Systems Management News is chock-full of news and analysis to help you understand what's happening in your field. subscribe The Enterprise Newsweekly eWeek is the essential technology information source for builders of e-business. subscribe Oracle Magazine Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world's largest enterprise software company. subscribe Total Telecom Total Telecom is "The Economist of the communications industry". subscribe |
Home | sitemap
| advertise | OSDir is
an inevitable website.
|