Re: Encoding options (delta, rle, ...) in pyarrow bindings


Uwe, Wes,

thanks so much. I completely forgot to say that I was asking about parquet.
It's good to know the current status though. I also didn't know that the
dictionary encoding already has some form of RLE.

@Uwe: Any ETA on delta encoding? Is this being worked on or are other things
more important ATM? I am not asking to generate pressure but out of
curiosity. I appreciate that this is an open source project and if I need
it I can just jump in and do it myself.

Thanks again and have a great day,
Sebastian


On Fri, Nov 2, 2018 at 2:27 PM, Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> Hi Sebastian -- Uwe is referring to Parquet files. We don't yet have
> in-memory RLE or Delta encoding in the Arrow columnar format. I suspect
> this will eventually be added as it can be quite important to improve
> in-memory query execution performance.
>
> Wes
>
> On Fri, Nov 2, 2018, 2:18 PM Uwe L. Korn <uwelk@xxxxxxxxxx> wrote:
>
> > Hello Sebastian,
> >
> > currently you can only switch between plain and
> > dictionary-encoding-combined-with-run-length encoding using the
> > `use_dictionary` flag on
> > https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
> > Other encodings are so far implemented only on the read path; we cannot
> > write delta encodings yet.
> >
> > Uwe
> >
> > On Fri, Nov 2, 2018, at 12:53 PM, Sebastian Himberger wrote:
> > > Hi,
> > >
> > > I hope this is the right list. I couldn't find a "users" list on the
> > > website so please forgive me if I am interrupting here.
> > >
> > > I am developing an application using the pyarrow module. By reading through
> > > the documents I couldn't find a way to specify an encoding like delta or
> > > run length to a column. Is this not supported yet or am I missing something?
> > >
> > > Thanks so much,
> > > Sebastian
> >
>