Re: How to create list<list<byte>> structure


Thanks a lot, Wes.

On Fri, Aug 24, 2018 at 12:17 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:

> hi Wenjian -- I am not an expert in the Java library. Perhaps Bryan,
> Li, Jacques, or Sidd can point you in the right direction. You can
> take a look at the Dremio codebase to see more examples of Arrow in
> action
>
> https://github.com/dremio/dremio-oss
>
> - Wes
>
> On Tue, Aug 14, 2018 at 10:08 PM, Xu,Wenjian <zeroxwj@xxxxxxxxx> wrote:
> > Hi Wes,
> >
> > Thank you for your kind help.
> >
> > Actually, I am working on a Java UDF that iterates over an *array<string>*
> > in SQL.
> >
> > I understand that, to represent *array<string>* in the Arrow format, I
> > could use a ListVector with a VarCharVector as the inner vector. My question
> > is: how can I efficiently access all the elements (i.e., each byte[] as a
> > string)?
> >
> > By checking the test code:
> >
> > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestListVector.java
> >
> > one option is to use ListVector.getObject(int index) to get each
> > ArrayList<Text>, and then iterate over each element in the ArrayList<Text>.
> > But this method is expensive because:
> >
> > 1) it calls VarCharVector.get(int index), which involves a memory copy
> > 2) it calls Text.set(byte[]), which assembles the Text from the byte array.
> >
> > My goal is just to retrieve each byte[] and do some filtering. Is there
> > any other, less expensive method to achieve my goal? For example,
> > VarCharVector.get(int index, NullableVarCharHolder holder) seems to be a
> > less expensive operation. But how do I use this method in my case?
> >
> > Thanks again.
> >
> > Best regards,
> > Wenjian
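
[Editorial sketch] The question above comes down to reading each string's bytes in place through offsets instead of materializing Text objects. The following is a minimal plain-Java model of that idea, not Arrow code: the class `ListOfStringsScan` and its fields are hypothetical stand-ins for the list offsets, the string offsets, and the data buffer of a list<utf8> column. As far as I understand the Java library, `VarCharVector.get(int, NullableVarCharHolder)` fills the holder with `start`, `end`, and `buffer`, which play the same role as the array indices used here, but treat that mapping as an assumption to verify against your Arrow version.

```java
import java.nio.charset.StandardCharsets;

public class ListOfStringsScan {
    // Plain-array stand-ins for the buffers of a list<utf8> column
    // (hypothetical model, not Arrow classes).
    final int[] listOffsets;   // row i owns strings [listOffsets[i], listOffsets[i + 1])
    final int[] valueOffsets;  // string j owns bytes [valueOffsets[j], valueOffsets[j + 1])
    final byte[] data;         // all string bytes, back to back

    ListOfStringsScan(int[] listOffsets, int[] valueOffsets, byte[] data) {
        this.listOffsets = listOffsets;
        this.valueOffsets = valueOffsets;
        this.data = data;
    }

    // Compares string j against a prefix directly in the shared data array,
    // without copying the bytes or building a String/Text object.
    boolean startsWith(int j, byte[] prefix) {
        int start = valueOffsets[j];
        int end = valueOffsets[j + 1];
        if (end - start < prefix.length) {
            return false;
        }
        for (int k = 0; k < prefix.length; k++) {
            if (data[start + k] != prefix[k]) {
                return false;
            }
        }
        return true;
    }

    // Counts the strings in one row (one array<string> value) that match the prefix.
    int countMatches(int row, byte[] prefix) {
        int count = 0;
        for (int j = listOffsets[row]; j < listOffsets[row + 1]; j++) {
            if (startsWith(j, prefix)) {
                count++;
            }
        }
        return count;
    }

    // Two rows: ["apple", "avocado", "banana"] and ["apricot"].
    static ListOfStringsScan example() {
        byte[] data = "appleavocadobananaapricot".getBytes(StandardCharsets.UTF_8);
        int[] valueOffsets = {0, 5, 12, 18, 25};
        int[] listOffsets = {0, 3, 4};
        return new ListOfStringsScan(listOffsets, valueOffsets, data);
    }

    public static void main(String[] args) {
        ListOfStringsScan col = example();
        byte[] prefix = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(col.countMatches(0, prefix));
        System.out.println(col.countMatches(1, prefix));
    }
}
```

The design point is that the filter touches only the byte range of each element, so the cost of the copy and of Text assembly mentioned in points 1) and 2) is avoided in this model.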
> >
> >
> >
> >
> > On Wed, Aug 15, 2018 at 3:19 AM Wes McKinney <wesmckinn@xxxxxxxxx> wrote:
> >>
> >> hi Wenjian,
> >>
> >> In C++ you can use ListBuilder together with UInt8Builder. There are
> >> examples of using ListBuilder you can look at in
> >> src/arrow/array-test.cc.
> >>
> >> For Java you might want to have a look at how Spark SQL converts its
> >> Array<T> types into Arrow (there should be other examples in the Java
> >> unit test suite, too):
> >>
> >> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
> >>
> >> - Wes
> >>
> >> On Mon, Aug 13, 2018 at 6:00 AM, Xu,Wenjian <zeroxwj@xxxxxxxxx> wrote:
> >> > Hi,
> >> >
> >> > If I want to create a list<list<byte>> structure (as shown in
> >> > https://arrow.apache.org/docs/memory_layout.html), what class(es) do I
> >> > need to use in the Java API and the C++ API?
> >> >
> >> > Any suggestion would be appreciated. Thanks.
> >> >
> >> > Best regards,
> >> > Wenjian
>
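
[Editorial sketch] For readers who want to see what a list<list<byte>> structure looks like at the buffer level, here is a small plain-Java model, assuming the usual layout of an outer offsets array, an inner offsets array with a validity flag per inner list, and one flat values array. The class `NestedByteList` and the sample data are illustrative assumptions, not part of any Arrow API. In the Java library this would presumably correspond to a ListVector whose child is another ListVector over a single-byte vector, and in C++ to nested ListBuilder instances over a UInt8Builder as Wes suggests, but those mappings should be checked against the library documentation.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedByteList {
    // Logical value: [[[1, 2], [3, 4]], [[5, 6, 7], null, [8]], [[9, 10]]]
    static final int[] OUTER_OFFSETS = {0, 2, 5, 6};           // row i -> inner lists
    static final int[] INNER_OFFSETS = {0, 2, 4, 7, 7, 8, 10}; // inner list j -> values
    static final boolean[] INNER_VALID = {true, true, true, false, true, true};
    static final byte[] VALUES = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    // Rebuilds row i as nested Java lists by following the two offset levels.
    static List<List<Byte>> row(int i) {
        List<List<Byte>> out = new ArrayList<>();
        for (int j = OUTER_OFFSETS[i]; j < OUTER_OFFSETS[i + 1]; j++) {
            if (!INNER_VALID[j]) {
                out.add(null); // a null inner list occupies a slot but owns no values
                continue;
            }
            List<Byte> inner = new ArrayList<>();
            for (int k = INNER_OFFSETS[j]; k < INNER_OFFSETS[j + 1]; k++) {
                inner.add(VALUES[k]);
            }
            out.add(inner);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int i = 0; i < OUTER_OFFSETS.length - 1; i++) {
            System.out.println(row(i));
        }
    }
}
```

Each nesting level adds one offsets array, while the byte values themselves stay in a single contiguous array regardless of depth.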