osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Sessionizing raw events / does Calcite support ARRAY_AGG?


Super! I was not aware of COLLECT. That sounds like an even better fit.
Great timing on the release, for my sake :-)

Kenn

On Sat, Dec 15, 2018, 13:51 Julian Hyde <jhyde@xxxxxxxxxx wrote:

> As of https://issues.apache.org/jira/browse/CALCITE-2224 (to be
> released shortly in 1.18) Calcite supports the WITHIN GROUP clause
> (that allows you to specify the order in which values are supplied to
> an aggregate function) and the COLLECT aggregate function (similar to
> ARRAY_AGG but returns a nested relation rather than an array).
>
> LISTAGG (an aggregate function that concatenates its arguments into a
> string) and ARRAY_AGG are not implemented but would be straightforward
> follow-ups to that work.
>
> As I noted in CALCITE-2224, the SQL standard says that ARRAY_AGG may
> optionally have an ORDER BY clause inside its parentheses, e.g.
>
>   SELECT deptno, ARRAY_AGG(empno ORDER BY sal DESC) AS emps
>   FROM emp
>   GROUP BY deptno
>
> I think WITHIN GROUP could and should be used instead, viz
>
>   SELECT deptno, ARRAY_AGG(empno) WITHIN GROUP (ORDER BY sal DESC) AS emps
>   FROM emp
>   GROUP BY deptno
>
> because that is more consistent with other aggregate functions, and
> would allow us to supply ARRAY_AGG without extra parser work.
>
> Julian
>
>
> On Fri, Dec 14, 2018 at 5:58 PM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
> >
> > Hello!
> >
> > My use case is sessionizing raw events without an aggregation function.
> > Approximate code that I tried out:
> >
> > SELECT ARRAY_AGG(ROW(...))
> > FROM ...
> > GROUP BY SESSION(...)
> >
> > (followed by UNNEST to get the raw events, tagged with session info, back
> > out into a stream)
> >
> > I get a parser error on the paren after ARRAY_AGG, presumably because it
> is
> > an identifier treated as a column name?
> >
> > So I was digging through Calcite's code and my conclusion is that there
> is
> > no implementation of ARRAY_AGG. Is there an alternative? Is there another
> > way to use Calcite's streaming extensions to do sessionization of raw
> > events?
> >
> > Kenn
>