Proposed Arrow Graph representations

At GTC San Jose last month, NVIDIA's Joe Eaton (cc'd) presented on the
nvGraph <https://developer.nvidia.com/nvgraph> team's goals for
accelerating in-memory graph processing and analytics. A major component of
that is advancing and standardizing a common, efficient representation for
graphs that can support a broad range of use cases, from small to large.

To that end, I'd like to kick off the discussion about native graph
representations in Arrow.

Joe's team has prepared a preliminary FlatBuffers schema for efficient
columnar representations of the four most common graph formats. It includes
embedded edge and vertex property tables, and is designed to be compatible
with the existing Arrow column types. My initial thoughts are that we could
add an optional 5th Graph Message type, similar to how Tensor Messages are
presently implemented.
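
For anyone less familiar with columnar graph layouts, here is a minimal
sketch of what such a representation can look like. It assumes compressed
sparse row (CSR) is among the formats covered, which is my assumption and
not something taken from the schema. The function names (coo_to_csr,
neighbors) are hypothetical and do not come from GraphSchema.fbs or
nvGraph. It converts an edge list into an offsets column and an indices
column, and carries one edge property column (weights) along in the same
order.

```python
from typing import List, Sequence, Tuple


def coo_to_csr(
    num_vertices: int,
    src: Sequence[int],
    dst: Sequence[int],
    weights: Sequence[float],
) -> Tuple[List[int], List[int], List[float]]:
    """Convert an edge list (COO) into CSR columns (hypothetical helper).

    Returns (offsets, indices, edge_weights). The edge property column is
    reordered together with the destination column, so entry i of
    edge_weights describes the edge stored at indices[i].
    """
    # Count outgoing edges per source vertex, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for s in src:
        offsets[s + 1] += 1
    # A prefix sum turns the counts into row start offsets.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    # cursor[v] is the next free slot in vertex v's adjacency range.
    cursor = offsets[:-1].copy()
    indices = [0] * len(src)
    edge_weights = [0.0] * len(src)
    for s, d, w in zip(src, dst, weights):
        pos = cursor[s]
        indices[pos] = d
        edge_weights[pos] = w
        cursor[s] += 1
    return offsets, indices, edge_weights


def neighbors(offsets: Sequence[int], indices: Sequence[int], v: int) -> List[int]:
    """Return the out-neighbors of vertex v from the CSR columns."""
    return list(indices[offsets[v]:offsets[v + 1]])
```

Each of the three returned lists is a flat, fixed-width column, which is
why a layout like this maps naturally onto existing Arrow column types.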

I've pushed Joe's initial GraphSchema.fbs to this branch on my Arrow fork.
From what I understand, the tables have been expanded into separate
definitions for the sake of comprehension, and the final forms will be
collapsed into each distinct Graph type, parameterized by sizes defined at
the top.

I also understand the nvGraph team supports these layouts natively,
enabling the community to take advantage of high-performance GPU kernels
very early on, and possibly align with libraries like Hornet
<https://github.com/hornet-gt/hornetsnest> (previously cuStinger).