[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Aggregation of Set Data Type

I was able to make this work using the following UDF/UDA:

CREATE OR REPLACE FUNCTION agg_set_func(state tuple<int, set<bigint>>, val set<bigint>) CALLED ON NULL INPUT RETURNS tuple<int, set<bigint>> LANGUAGE java AS 
    if (val == null) {
    return state;
    Set<Long> s = state.getSet(1, Long.class);
    state.setSet(1, s);
return state;

CREATE OR REPLACE FUNCTION agg_set_func_final(state tuple<int, set<bigint>>) CALLED ON NULL INPUT RETURNS set<bigint> LANGUAGE java AS 
    return state.getSet(1, Long.class);

CREATE AGGREGATE agg_set(set<bigint>) 
SFUNC agg_set_func
STYPE tuple<int, set<bigint>>
FINALFUNC agg_set_func_final
INITCOND (0,{});

On Thu, Oct 25, 2018 at 3:55 PM Joseph Wonesh <joseph.wonesh@xxxxxxxxxxxxxx> wrote:
Thank you for your reply. I actually found your blog post regarding this topic and browsed through it, but it did not yield the answer I was looking for. In fact, it seems impossible to do what I wish to do without defining a UDA for this specific use case -- something that is not practical to do when all of my queries use 'group by'.

For example, I ave a query like this:

select sum(a), avg(a), min(a), max(a), MY_UDF(my_set_column) from my_table group by a;

I would hope that using a UDF for my_set_column would allow me to combine all of the my_set_columns passed in via group by, but I cannot pass state to the UDF. A UDA can accept state, but that would require me rewriting the whole query to be:

select MY_UDA(a, my_set_column) from my_table;

Additionally, I would need a separate UDA for each of the different group by clauses. Is there no way around this? I would really like to be able to simply add a data column of type set<bigint> and then get all of the unique members in this set across an aggregation. 

On Tue, Oct 23, 2018 at 1:44 PM DuyHai Doan <doanduyhai@xxxxxxxxx> wrote:
You will need to use user defined aggregates for this 

Le 23 oct. 2018 16:46, "Joseph Wonesh" <joseph.wonesh@xxxxxxxxxxxxxx> a écrit :
Hello all,

 I am trying to aggregate rows which each contain a column of Set<BIGINT>. I would like the result to contain the sum of all sets, where null would be equivalent to the empty set. I expected a query like: "select sum(my_set_column) from my_table group by my_key_column" to do this, but the set type is not supported by this aggregate. Does anyone know of a way to aggregate this using existing cassandra built-ins? Thanks!

This message is private and confidential. If you have received message in error, please notify us and remove from your system. 

This message is private and confidential. If you have received message in error, please notify us and remove from your system.