Some initial GPU questions
As some of you may know, a few of us at Quansight have started (in
partnership with NVIDIA) looking at Arrow's GPU capabilities.
We are excited to help improve and expand Arrow's GPU support, but we did
have a few initial scoping questions.
Feel free to break these out into separate discussion threads if needed.
Hopefully, some of them will be easy enough to answer.
1. What is the status of the GPU code in Arrow now? For example, the code at
https://github.com/apache/arrow/tree/master/cpp/src/arrow/gpu. Is anyone
actively working on this part of the code base? Are there other folks
working on GPU support? I'd love to chat, if so!
2. Should Arrow compute assume that everything fits in memory? Arrow
seems to handle data that is larger than memory via the Buffer API. Are
there restrictions implied by using Buffers that we should be aware of?
3. What is the imagined interface between pyarrow and a GPU DataFrame?
One idea is to make the choice between main memory and the GPU totally
transparent to the user. Another is to be explicit to the user about
where the data lives, for example:
>>> import pyarrow as pa
>>> a = pa.array(..., type=...) # create pyarrow array instance
>>> a_g = a.to_gpu(<device parameters>) # send `a` to GPU
>>> def foo(a): ... return ... # a function doing operations with `a`
>>> r = foo(a) # perform operations with `a`, runs on CPU
>>> r_g = foo(a_g) # perform operations with `a_g`, runs on GPU
>>> assert r == r_g.to_mem() # results are the same
4. Who has been working on Arrow compute kernels? Are there any design
docs or discussions we should look at? We've been following the Gandiva
discussions and also the Ursa Labs Roadmap.
5. Should the user be able to switch between compute
implementations at runtime, or only at compile time?
6. Arrow's CI doesn't currently seem to support GPUs. If a free GPU CI
service were to come along, would Arrow be open to using it?
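Questions 3 and 5 are related: with explicit placement, the device an array lives on is one natural key for picking a compute implementation at runtime. Below is a toy sketch of that idea in plain Python, with no GPU involved. It is purely hypothetical: DeviceArray, to_gpu(), to_mem(), KERNELS, register() and square() are illustrative names, not existing pyarrow APIs, and the "gpu" kernel is a CPU stand-in.

```python
# Hypothetical sketch: explicit device placement (question 3) plus
# runtime kernel dispatch (question 5). None of these names exist in
# pyarrow; the "gpu" kernel is plain Python standing in for a GPU launch.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Values = Tuple[float, ...]


@dataclass(frozen=True)
class DeviceArray:
    """Toy array that records where its data is assumed to live."""
    values: Values
    device: str = "cpu"  # hypothetical device tags: "cpu" or "gpu"

    def to_gpu(self) -> "DeviceArray":
        # A real implementation would copy the buffers to device memory.
        return DeviceArray(self.values, device="gpu")

    def to_mem(self) -> "DeviceArray":
        # A real implementation would copy the buffers back to host memory.
        return DeviceArray(self.values, device="cpu")


# Kernels keyed by (operation, device). This is an ordinary dict, so
# entries can be added or replaced while the program runs.
KERNELS: Dict[Tuple[str, str], Callable[[Values], Values]] = {}


def register(op: str, device: str):
    """Decorator that stores a kernel for the given op and device."""
    def decorator(fn: Callable[[Values], Values]) -> Callable[[Values], Values]:
        KERNELS[(op, device)] = fn
        return fn
    return decorator


@register("square", "cpu")
def _square_cpu(values: Values) -> Values:
    return tuple(v * v for v in values)


@register("square", "gpu")
def _square_gpu(values: Values) -> Values:
    # Stand-in for a GPU kernel; computes the same result on the CPU.
    return tuple(v * v for v in values)


def square(arr: DeviceArray) -> DeviceArray:
    """Look up the kernel for the array's device; the result keeps that device tag."""
    kernel = KERNELS[("square", arr.device)]
    return DeviceArray(kernel(arr.values), device=arr.device)


a = DeviceArray((1.0, 2.0, 3.0))
a_g = a.to_gpu()
r = square(a)      # dispatches to the "cpu" kernel
r_g = square(a_g)  # dispatches to the "gpu" kernel
assert r_g.device == "gpu"
assert r == r_g.to_mem()  # same values once moved back to host memory
```

In this sketch the implementation is chosen at runtime by a dictionary lookup, so swapping in a different kernel is just re-registering it; a compile-time design would instead fix the choice when Arrow is built.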
Other than that, we'd love to know where and how we can plug in and help out!
Asst. Prof. Anthony Scopatz
Nuclear Engineering Program
Mechanical Engineering Dept.
University of South Carolina
Cell: (512) 827-8239
Book a meeting with me at https://scopatz.youcanbook.me/
Open up an issue: https://github.com/scopatz/me/issues