Thank you for your reply. I actually found your blog post regarding this topic and browsed through it, but it did not yield the answer I was looking for. In fact, it seems impossible to do what I wish to do without defining a UDA for this specific use case -- something that is not practical to do when all of my queries use 'group by'.
For example, I have a query like this:
select sum(a), avg(a), min(a), max(a), MY_UDF(my_set_column) from my_table group by a;
I had hoped that applying a UDF to my_set_column would let me combine all of the my_set_column values in each group produced by the group by, but I cannot pass state to a UDF. A UDA can accept state, but that would require me to rewrite the whole query as:
select MY_UDA(a, my_set_column) from my_table;
Additionally, I would need a separate UDA for each of the different group by clauses. Is there no way around this? I would really like to be able to simply add a data column of type set<bigint> and then get all of the unique members in this set across an aggregation.
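To make the desired result concrete, here is a rough sketch in plain Python (outside the database) of the semantics I am after: a state function that folds each row's set into a running union, applied once per group. The names union_state and aggregate_sets and the sample rows are hypothetical and exist only for illustration; this is not how any particular database implements aggregates.

```python
from collections import defaultdict


def union_state(state, value):
    """State function: fold one row's set into the running union.

    A missing (None) set is treated as empty, which is an assumption
    about how null columns should behave.
    """
    return state | set(value or ())


def aggregate_sets(rows, key="a", column="my_set_column"):
    """Group rows by `key` and union `column` within each group."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]] = union_state(groups[row[key]], row[column])
    return dict(groups)


# Hypothetical sample data standing in for my_table.
rows = [
    {"a": 1, "my_set_column": {10, 20}},
    {"a": 1, "my_set_column": {20, 30}},
    {"a": 2, "my_set_column": {40}},
    {"a": 2, "my_set_column": None},
]

result = aggregate_sets(rows)
```

In other words, for each value of a, I want back a single set containing every distinct bigint that appeared in my_set_column for that group, alongside the usual sum, avg, min and max.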