Re: Suitability of RelJson format for long-term storage
I think it would be useful to serialize the table’s row-type (column names and types, in order) as part of the RelJson. Then the reader could adapt if the table’s row-type has changed since the RelJson was written.
If the row-type of each leaf is known, then column names can be deduced throughout the tree, and column references by name carry the same information as column references by ordinal.
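To make that concrete, here is a rough sketch of how a reader might adapt, assuming the serialized plan carries each leaf's row-type as an ordered list of (name, type) pairs. It is written in Python for brevity and is not Calcite code; the function name `remap_ordinals` and the row-type representation are hypothetical, and RelJson does not write this information today.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a row-type is an ordered list of
# (column name, type name) pairs. This is NOT what RelJson emits today.
RowType = List[Tuple[str, str]]


def remap_ordinals(stored: RowType, current: RowType) -> Dict[int, int]:
    """Map each stored column ordinal to its ordinal in the current row-type.

    Columns are matched by name. Raises ValueError if a stored column is
    missing from the current row-type or its type has changed.
    """
    current_index = {
        name: (ordinal, type_name)
        for ordinal, (name, type_name) in enumerate(current)
    }
    mapping: Dict[int, int] = {}
    for old_ordinal, (name, type_name) in enumerate(stored):
        if name not in current_index:
            raise ValueError(f"column {name!r} no longer exists")
        new_ordinal, new_type = current_index[name]
        if new_type != type_name:
            raise ValueError(
                f"column {name!r} changed type: {type_name} -> {new_type}"
            )
        mapping[old_ordinal] = new_ordinal
    return mapping


# Row-type recorded when the plan was written.
stored = [("EMPNO", "INTEGER"), ("ENAME", "VARCHAR"), ("DEPTNO", "INTEGER")]
# Row-type of the same table now: HIREDATE was inserted in the middle.
current = [
    ("EMPNO", "INTEGER"),
    ("HIREDATE", "DATE"),
    ("ENAME", "VARCHAR"),
    ("DEPTNO", "INTEGER"),
]

mapping = remap_ordinals(stored, current)
```

With these two row-types, a stored reference to ordinal 2 (DEPTNO) would be rewritten to ordinal 3 when the plan is read, and a dropped or retyped column would fail loudly rather than silently pointing at the wrong field.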
> On May 31, 2018, at 12:56 PM, Marc Prud'hommeaux <marc@xxxxxxxxxx> wrote:
> Thanks for the encouragement. After some further reflection, one concern I have for using the built-in serialization format for long-term storage is that column references are stored by their ordinal position rather than their name, which could mess things up if the underlying table's columns are changed over time.
> This seems to be pretty deeply baked in, but please let me know if I am missing some option for encoding the column references by name rather than index.
>> On May 31, 2018, at 11:50, Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>> I support the idea of making it stable. It will take some work: at a minimum, documentation and a version id, then later some transformers to convert version X to version Y.
>>> On May 31, 2018, at 8:16 AM, Michael Mior <mmior@xxxxxxxxxx> wrote:
>>> AFAIK, no one is using this for long-term storage and no one is expecting
>>> the format to be stable. That said, I personally would be open to the idea of
>>> stabilizing the format. Given the format is fairly simple, one approach
>>> would be to use something like JSON Schema and then have some tests to
>>> validate that the output corresponds to the schema.
>>> Michael Mior
>>> On Thu, May 31, 2018 at 11:09, Marc Prud'hommeaux <marc@xxxxxxxxxx> wrote:
>>>> I am developing an application that allows end users to interactively
>>>> construct and execute relational expressions that span multiple data
>>>> sources using Calcite. My current implementation utilizes my own relational
>>>> algebra JSON format which I then convert to a RelNode using a RelBuilder.
>>>> It would vastly simplify my project if I could just use Calcite's own
>>>> RelJson format to construct and persist relational expressions, but I am
>>>> concerned that the format is both undocumented, and, aside from
>>>> RelWriterTest.java, does not have much in the way of future guarantees that
>>>> the format will remain stable.
>>>> Is the RelJson format intended to be used for long-term storage? Are
>>>> there any known applications that are using this as a serialization format
>>>> for their relational expressions?
>>>> If the consensus is that this format should be stable, then I can do some
>>>> work towards documenting it, as well as implementing some additional test
>>>> cases to ensure that RelNodes that are round-tripped through JSON
>>>> serialization maintain fidelity.