[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Migrating Beam SQL to Calcite's code generation

Raise this thread.
Seems there're more changes in the backend on how a FUNCTION is executed in the backend, as noticed in #6996:
1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
2. BeamSqlExpressionEnvironment are removed;

1. for Calcite defined FUNCTIONS, it uses Calcite generated code (which is great and duplicate work is worthless);
2. no way to access Beam context now;

For #2, I think we need to find a way to expose it, at least our UDF/UDAF should be able to access it to leverage the advantages of Beam module.

Any comments?

On Wed, Sep 19, 2018 at 2:55 PM Rui Wang <ruwang@xxxxxxxxxx> wrote:
This is a so exciting change!

Since we cannot mix current implementation with Calcite code generation, is there any case that Calcite code generation does not support but our current implementation supports, so switching to Calcite code generation will have some impact to existing usage?


On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
To follow up on this, the PR is now in a reviewable state and I've added more tests for FLOOR and CEIL. Both work with a more extensive set of arguments after this change. There are now 4 outstanding calcite PRs that get all the tests passing.

Unfortunately there is no easy way to mix our current implementation and using Calcite's code generator.


On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu <mingmxus@xxxxxxxxx> wrote:
Awesome work, we should call Calcite operator functions if available.

I haven't get time to read the PR yet, for those impacted would keep existing implementation. One example is, I notice FLOOR/CEIL only supports months/years recently which is quite a surprise to me.


On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin <kedin@xxxxxxxxxx> wrote:
This is pretty amazing! Thank you for doing this!


On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud <apilloud@xxxxxxxxxx> wrote:
I've adapted Calcite's EnumerableCalc code generation to generate the BeamCalc DoFn. The primary purpose behind this change is so we can take advantage of Calcite's extensive SQL operator implementation. This deletes ~11000 lines of code from Beam (with ~350 added), significantly increases the set of supported SQL operators, and improves performance and correctness of currently supported operators. Here is my work in progress: https://github.com/apache/beam/pull/6417

There are a few bugs in Calcite that this has exposed:

Fixed in Calcite master:
  • CALCITE-2321 - The type of a union of CHAR columns of different lengths should be VARCHAR
  • CALCITE-2447 - Some POWER, ATAN2 functions fail with NoSuchMethodException
Pending PRs:
  • CALCITE-2529 - linq4j should promote integer to floating point when generating function calls
  • CALCITE-2530 - TRIM function does not throw exception when the length of trim character is not 1(one)
More work:
  • CALCITE-2404 - Accessing structured-types is not implemented by the runtime
  • (none yet) - Support multi character TRIM extension in Calcite
I would like to push these changes in with these minor regressions. Do any of these Calcite bugs block this functionality being adding to Beam?