Re: Relational algebra and signal processing


Perhaps you've thought of this already, but it sounds like streaming
relational algebra could be a good fit here.

https://calcite.apache.org/docs/stream.html
--
Michael Mior
mmior@xxxxxxxxxx


On Sun, Dec 16, 2018 at 18:39, Julian Feinauer <j.feinauer@xxxxxxxxxxxxxxxxx>
wrote:

> Hi Calcite-devs,
>
> I just had a very interesting mail exchange with Julian (Hyde) on the
> incubator list [1]. It was about our project CRUNCH (which is mostly about
> time series analyses and signal processing) and its relation to relational
> algebra and I wanted to bring the discussion to this list to continue here.
> We already had some discussion about how time series would work in Calcite
> [2], and it’s closely related to MATCH_RECOGNIZE.
>
> But, I have a more general question in mind, to ask the experts here on
> the list.
> I wonder whether signal processing and analysis tasks can be seen as a
> proper application of relational algebra.
> Disclaimer: I’m a mathematician, so I know the formalism of (relational)
> algebra pretty well, but I’m lacking a lot of experience and knowledge in
> database theory. Most of my knowledge there comes from Calcite’s source
> code and the book by Garcia-Molina and Ullman.
>
> So if we take, for example, a stream of signals from a sensor, then we can
> of course do filtering or smoothing on it and this can be seen as a mapping
> between the input relation and the output relation. But as we usually need
> more than just one tuple at a time we lose many of the advantages of the
> relational theory. And then, if we analyze the signal, we can again model
> it as a mapping between relations, but the input relation is a “time
> series” and the output relation consists of “events”, so these are in some
> way different dimensions. In this situation it becomes most obvious where
> the main differences between time series and relational algebra are. Think
> of something simple: an event should be registered whenever the signal
> switches from FALSE to TRUE (so not for every TRUE). This could also be
> modelled with MATCH_RECOGNIZE pretty easily. But, for me it feels
> “unnatural” because we cannot use any indices (we don’t care about the
> ratio of TRUE and FALSE in the DB, except for probably some very rough
> outer bounds). And we are lacking the “right” information for the optimizer
> like estimations on the number of analysis results.
> It gets even more complicated when moving to continuous-valued signals
> (INT, DOUBLE, …), e.g., temperature readings.
> If we want to analyze the number of times where we have a temperature
> change of more than 5 degrees in under 4 hours, this should also be doable
> with MATCH_RECOGNIZE but again, there is no index to help us and we have no
> information for the optimizer, so it feels very “black box” for the
> relational algebra.
>
> I’m not sure if you get my point, but for me, the elegance of relational
> algebra was always the optimization, which starts from a declarative
> query and ends in an “optimal” physical plan. And I do not see how we can
> use much of this for the examples given above.
>
> Perhaps one solution would be to do the same as for spatial queries (or
> the JSON / JSONB support in PostgreSQL, [3]) and add specialized indices,
> statistics and optimizer rules. Then, this would make it more “relational
> algebra”-esque in the sense that there really is a possibility to apply
> transformations to a given query.
>
> What do you think? Am I making things too complicated, or am I missing
> something?
>
> Julian
>
> [1]
> https://lists.apache.org/thread.html/1d5a5aae1d4f5f5a966438a2850860420b674f98b0db7353e7b476f2@%3Cgeneral.incubator.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/250575a56165851ab55351b90a26eaa30e84d5bbe2b31203daaaefb9@%3Cdev.calcite.apache.org%3E
> [3] https://www.postgresql.org/docs/9.4/datatype-json.html
>
>
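
For reference, the two examples from the quoted message (an event on every
FALSE-to-TRUE switch, and temperature changes of more than 5 degrees in under
4 hours) can be sketched outside of SQL as plain sequential scans. This is
only an illustrative sketch, not Calcite or CRUNCH code: the function names
rising_edges and large_swings, the (timestamp, value) tuple layout, and the
reading of "change" as "differs from some earlier reading less than 4 hours
older" are all assumptions made here. It shows why both tasks need state
carried across tuples, which is the part an index does not help with.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, Iterator, List, Tuple


def rising_edges(samples: Iterable[Tuple[datetime, bool]]) -> Iterator[datetime]:
    """Yield the timestamp of every FALSE -> TRUE transition (hypothetical helper)."""
    previous = None  # the first sample never produces an event
    for timestamp, value in samples:
        if previous is False and value is True:
            yield timestamp
        previous = value


def large_swings(
    readings: Iterable[Tuple[datetime, float]],
    threshold: float = 5.0,
    window: timedelta = timedelta(hours=4),
) -> List[datetime]:
    """Return timestamps of readings that differ by more than `threshold`
    from some earlier reading that is less than `window` older.

    Assumes the readings arrive ordered by timestamp.
    """
    hits: List[datetime] = []
    recent: Deque[Tuple[datetime, float]] = deque()  # readings still inside the window
    for timestamp, temperature in readings:
        # Drop readings that are `window` or more older than the current one.
        while recent and timestamp - recent[0][0] >= window:
            recent.popleft()
        if any(abs(temperature - old) > threshold for _, old in recent):
            hits.append(timestamp)
        recent.append((timestamp, temperature))
    return hits
```

Both functions read the input once, in time order, and keep only a small
amount of state (the previous value, or the readings of the last 4 hours),
which is roughly what a MATCH_RECOGNIZE implementation would also have to do.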