
Some initial GPU questions

Hello All,

As some of you may know, a few of us at Quansight have started (in
partnership with NVIDIA) looking at Arrow's GPU capabilities.
We are excited to help improve and expand Arrow's GPU support, but we did
have a few initial scoping questions.

Feel free to break these out into separate discussion threads if needed.
Hopefully, some of them will be easy enough to answer.

   1. What is the status of the GPU code in arrow now? E.g.
   https://github.com/apache/arrow/tree/master/cpp/src/arrow/gpu Is anyone
   actively working on this part of the code base? Are there other folks
   working on GPU support? I'd love to chat, if so!
   2. Should Arrow compute assume that everything fits in memory? Arrow
   seems to handle data that is larger than memory via the Buffer API. Are
   there restrictions that using Buffers implies that we should be aware of?
   3. What is the imagined interface between pyarrow and a GPU DataFrame?
   One idea is to make the choice between main memory and the GPU totally
   transparent to the user. Another possible suggestion is to be explicit to
   the user about where the data lives, for example:

   >>> import pyarrow as pa
   >>> a = pa.array(..., type=...) # create pyarrow array instance
   >>> a_g = a.to_gpu(<device parameters>) # send `a` to GPU
   >>> def foo(a): ... return ... # a function doing operations with `a`
   >>> r = foo(a) # perform operations with `a`, runs on CPU
   >>> r_g = foo(a_g) # perform operations with `a_g`, runs on GPU
   >>> assert r == r_g.to_mem() # results are the same
   4. Who has been working on Arrow compute kernels? Are there any design
   docs or discussions we should look at? We've been following the Gandiva
   discussions and also the Ursa Labs Roadmap.
   5. Should the user be able to switch between compute
   implementations at runtime, or only at compile time?
   6. Arrow's CI doesn't currently seem to support GPUs. If a free GPU CI
   service were to come along, would Arrow be open to using it?
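
To make questions 3 and 5 a bit more concrete, here is a rough sketch of the
explicit option, with the compute implementation picked at run time from
where the data lives. Everything in it is hypothetical: `HostArray`,
`DeviceArray`, `register`, and `add_one` are made-up names, not pyarrow API,
and the "GPU" path is a plain-Python placeholder so the snippet runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins for a host array and a device array.
# Neither class exists in pyarrow; they only illustrate the idea.
@dataclass(frozen=True)
class HostArray:
    values: tuple

    def to_gpu(self) -> "DeviceArray":
        # A real implementation would copy the buffers to device memory.
        return DeviceArray(self.values)


@dataclass(frozen=True)
class DeviceArray:
    values: tuple

    def to_mem(self) -> HostArray:
        # A real implementation would copy the buffers back to the host.
        return HostArray(self.values)


# Registry mapping (kernel name, array type) to an implementation.
# The lookup happens at call time, i.e. at run time, not compile time.
_KERNELS: Dict[Tuple[str, type], Callable] = {}


def register(name: str, array_type: type) -> Callable:
    def decorator(fn: Callable) -> Callable:
        _KERNELS[(name, array_type)] = fn
        return fn
    return decorator


@register("add_one", HostArray)
def _add_one_cpu(a: HostArray) -> HostArray:
    return HostArray(tuple(v + 1 for v in a.values))


@register("add_one", DeviceArray)
def _add_one_gpu(a: DeviceArray) -> DeviceArray:
    # Placeholder: a real version would launch a device kernel here.
    return DeviceArray(tuple(v + 1 for v in a.values))


def add_one(a):
    # Dispatch on where the data lives; the result stays on that device.
    return _KERNELS[("add_one", type(a))](a)


a = HostArray((1, 2, 3))
a_g = a.to_gpu()       # explicit transfer, as in the example above
r = add_one(a)         # dispatches to the CPU implementation
r_g = add_one(a_g)     # dispatches to the "GPU" implementation
assert r == r_g.to_mem()
```

The point of the sketch is only that the user-facing function is the same
while the data's location selects the kernel; a compile-time-only switch
would instead fix the contents of the registry when the library is built.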

Other than that, we'd love to know where and how we can plug in and help out!

Be Well

Asst. Prof. Anthony Scopatz
Nuclear Engineering Program
Mechanical Engineering Dept.
University of South Carolina
Cell: (512) 827-8239
Book a meeting with me at https://scopatz.youcanbook.me/
Open up an issue: https://github.com/scopatz/me/issues