Re: Buffer writers and seek method, NativeFile.is_seekable proposal


hi Paul,

We aren't talking about columnar data structures, but file interfaces,
i.e. the C++ classes in
https://github.com/apache/arrow/tree/master/cpp/src/arrow/io

- Wes
On Fri, Sep 7, 2018 at 2:56 PM Paul Rogers <par0328@xxxxxxxxx.invalid> wrote:
>
> Hi Wes,
>
> Interesting. Random-access writes are easy for fixed-width vectors. I'm curious how they might be done for variable-width vectors (VARCHAR, or arrays) given the structure of the offset vectors. Is the structure of the offset vector changing (to include, say, the start and length of each value)? This always seemed the stumbling block in prior discussions of this topic.
>
> Thanks,
> - Paul
>
>
>
>     On Friday, September 7, 2018, 11:40:07 AM PDT, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
>
>  hi Pearu,
>
> Sounds good to me. I'd always intended to add support for random
> access writes but have not done it yet.
>
> Thanks,
> Wes
> On Fri, Sep 7, 2018 at 3:51 AM Pearu Peterson
> <pearu.peterson@xxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > In Arrow C++, various buffer writers define a Seek method, while in
> > pyarrow seek is defined only for buffer readers (for instance,
> > NativeFile.seek references only rd_file).
> >
> > So, pyarrow ties 'seekable' strictly to the 'readable' file property, while
> > 'seekable' would also make sense when a file is 'writable'. Non-seekable
> > files would be sockets or pipes, but memory buffers like CudaBuffer can be
> > seekable.
> >
> > Is there any reason for relating 'seekable' to 'readable-only' within
> > pyarrow?
> >
> > I propose introducing an is_seekable attribute to NativeFile in order to untie
> > the 'seekable' property from the 'readable' and 'writable' properties. What do
> > you think?
> >
> > Best regards,
> > Pearu
>
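
For illustration, the proposal at the bottom of the thread can be sketched in plain Python. This is only a rough sketch under stated assumptions, not pyarrow's implementation: the SketchFile class and its is_readable / is_writable / is_seekable attributes are hypothetical names made up for this example, and only the standard-library io and os modules are used. It shows a write-only in-memory buffer that is still seekable, next to a pipe, which is writable but not seekable.

```python
import io
import os


class SketchFile:
    """Hypothetical wrapper (NOT pyarrow.NativeFile).

    'seekable' is tracked independently of 'readable' and 'writable'.
    """

    def __init__(self, raw, readable, writable, seekable):
        self._raw = raw
        self.is_readable = readable
        self.is_writable = writable
        self.is_seekable = seekable

    def seek(self, offset, whence=io.SEEK_SET):
        # Seeking depends only on is_seekable, not on the read/write mode.
        if not self.is_seekable:
            raise io.UnsupportedOperation("file is not seekable")
        return self._raw.seek(offset, whence)

    def write(self, data):
        if not self.is_writable:
            raise io.UnsupportedOperation("file is not writable")
        return self._raw.write(data)


def patch_header_demo():
    """Write a placeholder header, then seek back and overwrite it."""
    buf = io.BytesIO()
    f = SketchFile(buf, readable=False, writable=True, seekable=True)
    f.write(b"????payload")  # 4 placeholder bytes, then the payload
    f.seek(0)                # random-access write on a write-only file
    f.write(b"HDR1")
    return buf.getvalue()


def pipe_is_seekable():
    """Ask the standard library whether the write end of a pipe is seekable."""
    r, w = os.pipe()
    with os.fdopen(r, "rb"), os.fdopen(w, "wb") as wf:
        return wf.seekable()
```

The placeholder-then-patch pattern in patch_header_demo is one common reason a writer wants seek: a length or checksum field is often known only after the payload has been written.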