[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: C++: Covariant Return Types for Array::type()

I've updated the document with an extra section to include code snippets
for alternative designs to address concerns raised here on the mailing list.

MSVC 19 2017 supports covariant return types, and the code presented above
successfully compiled on gcc.godbolt.org using that compiler.

I would agree that the covariant_{shared, derived}_ptr classes themselves
are a bit complicated, and I wasn't particularly happy with the fact that
there's an extra pointer floating around. I've presented two alternatives
in the document that would eliminate the need for those classes. The first
solution makes the Array::type() function pure virtual and has sub-classes
implement without covariant return types. Instead, the sub-classes provide
a function with a different name that returns the concrete data type
sub-class (e.g. - ListArray::list_type()). The second solution uses C++
name hiding. In that design, the Array::type() function is not virtual and
calls through to a pure virtual function called get_type(). The subclass
would implement get_type() and simply return a shared pointer to a DataType
object, but would also name hide Array::type() with its own implementation
that returns a shared pointer the sub-class data type. While the name
hiding keeps the API cleaner (i.e. - you always call type()), I find that
it's one of the less used and more obscure features of C++ that tends to
confuse more novice to intermediate C++ programmers.

The design does not suggest removing the ArrayData class, but instead
removing the type_ member. It sounds like for the ArrayData class you are
trying to eschew the C++ type hierarchy and move to a more C oriented
design with type punning. I think the compromise here would be to have
ArrayData store the type id but not the shared pointer to the DataType. You
can take that approach one step further, and make the ArrayData a C struct
with its own API that would successfully compile with a C compiler. You
could then add a C++ wrapper that deals with that class and is used by the
various C++ classes in the Array/DataType type hierarchy. The benefit of
taking that approach is that you can easily inter-operate between C++ and
C. In other words, a C only arrow API would be using the richer C++ API
underneath, but the user of the API would only be able to obtain or pass
ArrayData C structs from/to the API.

This proposal was in no way trying to address the static / compile-time
problem. It was specifically targeted at making it easier to obtain
concrete data types in the dynamic schema setting when dealing with
arbitrary nesting of types. I do like the idea of a compile-time hierarchy
(e.g. - StaticListArray<StaticStructArray<StaticInt32Array,
StaticListArray<StaticInt32Array> >), but that will require a lot of
template meta-programming. If the covariant_{shared,derived}_ptr classes
above seem too complicated, I'm pretty sure that a compile-time hierarchy
will seem immensely more complicated. Bottom line: I would want to see a
concrete use case before implementing a compile-time type hierarchy with

On Sat, May 19, 2018 at 9:06 AM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> hi Josh,
> Thank you for putting together this document. These changes are
> interesting and worth giving some consideration. Some initial
> reactions / questions to get a discussion going:
> * How is MSVC support for covariant return types? It doesn't look like
> it's supported, which (I'm sorry to say) would be a deal breaker
> https://msdn.microsoft.com/en-us/library/8z9feath.aspx
> * I don't think creating a situation where arrow::DataType objects
> have to be copied is a good idea. Schemas and types will be reused,
> particular in a streaming data setting
> * Adding a covariant shared_ptr seems very complicated; it seems the
> primary benefit will be in code where we would currently have to
> introduce a dynamic visitor pattern. Could the same benefit be reaped
> by adding a _new_ virtual base method that returns a pointer or
> const-ref to the concrete type? This is a less invasive change
> * I think it is beneficial to have a simple internal data structure
> (i.e. ArrayData) that describes any Arrow array object
> From reading this document, I am wondering if it would not be worth
> thinking about creating a from-scratch set of array and type data
> structures intended for use in a static / compile-time setting. Since
> Arrow features "dynamic" schemas (in the sense that Avro is a more
> dynamic version of Protocol Buffers or Thrift), we inevitably will be
> dealing with the problem of dynamic dispatch from type information
> known only at runtime. I agree that it would be useful to eliminate
> usages of dynamic visitor pattern in the codebase where possible.
> - Wes
> On Sat, May 19, 2018 at 12:48 AM, joshuastorck@xxxxxxxxx
> <joshuastorck@xxxxxxxxx> wrote:
> > I've put together a proposal for using covariant return types in
> Array::type() in the C++ library. I wanted to get some feedback before
> putting together a PR in case it's too controversial or would require to
> much re-factoring of the code:
> >
> > https://docs.google.com/document/d/14mLO9uNIcrb-yTj_
> byBxwJeYbGy88JS9fQ6aOa46XRc/edit?usp=sharing
> >
> > It would be nice to use Google Doc's comment feature, but I would
> imagine it may be safer to memorialize the discussion here on the mailing
> list.