Re: Re: Exception-handling in built-in functions
First of all, thanks Hongze for working on this!
CALCITE-525 asks for throwing exceptions in certain circumstances and I
think the fix should focus exclusively on this.
Regarding the discussion about additional actions in case of an error
(omitting rows or logging) I believe it is useful but I think it could be
achieved by other means such as overloading the operators (using
SqlOperatorTable). Omitting rows or logging is not a very common policy to
become part of the core project. I think that projects relying on Calcite
and need this kind of functionality could provide their own operator
overloads relatively easy.
Providing additional built in functions, such as CATCH_ERROR, might become
handy in some situations but again it is something that individual projects
could easily plugin as a user defined function.
Note that, I am not against the existing PR but I would like to highlight
an alternative option.
Στις Πέμ, 18 Οκτ 2018 στις 1:36 μ.μ., ο/η notifyzhz@xxxxxxx <
> Thanks for opening this discussion and thank you all for your replies.
> The reason why I would like to help solve CALCITE-525 is because I agree
> that an function error should not easily break a large job, this "job"
> could indicate a large ad-hoc query or a query on a stream, or like what
> Julian said, a ETL job.
> According to some tests I had run about Calcite streaming, a simple
> division by zero error could break the whole query job. Say, If Calcite
> does not provide a option to silence the error, users should spend extract
> daily time on checking whether their streaming job is alive if their SQL
> includes a "/" operator.
> The solutions of B and C are different: B is to ignore and drop a whole
> row when error happens, and C is to make the failed call return null value.
> In C, the major reason to introduce "CATCH_ERROR" function is to represent
> the error handling on Rel, whether to enable the function in Parser is not
> At first I personally prefer C, but as C does need more code change than
> B, I don't have a very strong inclination between B and C now.
> At last, I agree that making enough discussion on introducing new feature
> is very important. If we don't really need such a feature now, I think I
> can help whenever it is considered helpful in future.
> From: Jesus Camacho Rodriguez
> Date: 2018-10-18 10:26
> To: dev@xxxxxxxxxxxxxxxxxx
> Subject: Re: Exception-handling in built-in functions
> I do not believe there is enough reason to block CALCITE-525. IMO,
> CALCITE-525 describes a problem that some Calcite users are facing and a
> reasonable plugable solution. We should not be vetoing such a feature
> without providing viable alternatives. (Without having checked the specific
> implementation details, I prefer approach B described below as it is less
> intrusive. And A should be fixed in a different issue.)
> I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or
> Postgres, and it has always tried to provide flexibility to underlying
> engines, one of the reasons for its wide adoption. In addition, systems are
> not forced to use this feature, it is tagged as experimental and by default
> we are still running in same mode. I believe that is sufficient.
> Personally, I will not be happy if a developer feels compelled to fork
> Calcite or stop contributing code because we do not accept features such as
> the one described there.
> On 10/17/18, 5:17 PM, "Michael Mior" <mmior@xxxxxxxxxx> wrote:
> My apologies for missing this thread a couple days ago. (Thanks for
> it.) Here's my two cents: taking care of contributors to the project is
> just as important (if not more important) than taking care of the
> code. I'm
> not saying we should merge terrible code just to keep each other
> happy, but
> I don't think that's the case here. If anyone writes some code which
> disagree with, you should be free to voice your disagreement. However,
> especially when the code is from a core contributor and the argument
> focuses on potential future problems, I think it's important to
> that people who have shown dedication to the project over the years are
> very likely to be around and willing to fix these problems as they
> Code which turns out to cause problems can always be deleted, reverted,
> refactored, etc. It's much harder to back out when a contributor is
> out or interpersonal conflicts get heated.
> Michael Mior
> Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jhyde@xxxxxxxxxx> a écrit :
> > Vladimir,
> > You’ve made your points. And I hear them.
> > However I get the impression that you are not open to persuasion.
> > means that I am wasting my time trying to reach consensus with you.
> > means that people win arguments not on merit, but based upon who is
> > persistent.
> > Here is my point. Calcite's goal is not to re-create what Oracle or
> > PostgreSQL did ten years later. It is a platform that allows people
> > write their own data engine. If they want to redefine the “+”
> operator such
> > that 2 + 2 returns 5, the platform should allow it.
> > Certainly if they want to engineer their own error-handling
> strategy, we
> > should let them do it. I didn’t have the energy to find an example
> of a SQL
> > engine that discards rows with divide-by-zero errors, but I believe
> > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
> > engines that I have worked on that performed ETL-like tasks, all had
> > capability. And all ETL tools have very flexible error-handling
> > They are not SQL-based, but Calcite is not exclusively for SQL
> > I have been designing and building world-class data engines for 30
> > Please take me on good faith that a flexible error-handing strategy
> is a
> > good idea. Don’t force me to bicker over email for hours and hours.
> When a
> > long discussion leads to the rejection of a contribution, I get
> > considerably closer to burning out.
> > Julian
> > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
> > sitnikov.vladimir@xxxxxxxxx> wrote:
> > >
> > > Juilian>Hey, folks. We need your input here.
> > >
> > > Here are my thoughts:
> > > 1) I think the features we add should have at least some level of
> > > consistency
> > > 2) It is much safer to adopt well-known features rather than be
> > in
> > > the field. I do not mean we must wait for someone else implement
> and try
> > > out a feature, however I would not rush for implementing a feature
> > > no-one else explored.
> > >
> > > CALCITE-525 has two key points:
> > > A) Current implementation of enumerable factors code like 0/0 to a
> > > field of a generated code. It causes the generated code to fail at
> > > time even before the query is executed.
> > > Of course that is a bug, and I'm even inclined to remove that
> > > fields"
> > >
> > > B) Someone (Hongze? Juilan?) suggest to implement a mode to
> > ignore
> > > the error (e.g. by ignoring the row or by returning default value).
> > > First of all, I don't think "ignore the row" kind of processing
> would do
> > > any good to the user since it would not be possible to predict the
> > output.
> > > "ignore the row" is very tricky when join/semijoin/antijoin is
> > >
> > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I
> > we
> > > should not rush for it.
> > >
> > > C) Hongze suggests CATCH_ERROR(1 / 0 EMPTY ON ERROR) or
> CATCH_ERROR(1 /
> > > 0) EMPTY ON ERROR kind of functions.
> > > That enables to confine the scope of the error, however I don't
> think it
> > > would be used often (does that mean one would have to wrap each
> > > expression?), and this "catch error" would be non-trivial to
> propagate to
> > > the downstream executors.
> > > On top of that, we might end up inventing full-blown
> > try-catch-catch-catch
> > > syntax.
> > >
> > > I truly see no business value in implementing B/C, however I do
> see the
> > > pain it would introduce. It would complicate Calcite maintenance.
> > could
> > > silently produce wrong results, and I'm sure we don't want get
> > out
> > > of thin air.
> > >
> > > Vladimir