Re: [DISCUSS] Developing a standard memory layout for in-memory records / "row-oriented" data

hi Antoine,

On Sun, Jun 24, 2018 at 1:06 PM, Antoine Pitrou <antoine@xxxxxxxxxx> wrote:
> Hi Wes,
> Le 24/06/2018 à 08:24, Wes McKinney a écrit :
>> If this sounds interesting to the community, I could help to kickstart
>> a design process which would likely take a significant amount of time.
>> The requirements could be complex (i.e. we might want to support
>> variable-size record fields while also providing random access
>> guarantees).
> What do you call "variable-sized" here? A scheme where the length of a
> record's field is determined by the value of another field in the same
> record?

As an example, here is a fixed size record

record foo {
  a: int32;
  b: float64;
  c: uint8;
}

With padding, suppose this is 16 bytes per record. If we have a
column of these, then random access to any value in any record is
simple: record i starts at byte offset 16 * i.
Here's a variable-length record:

record bar {
  a: string;
  b: list<int32>;
}

What I've seen done to represent this in memory is to have a fixed
size record followed by a sidecar containing the variable-length data,
so the fixed size portion might look something like

a_offset: int32;
a_length: int32;
b_offset: int32;
b_length: int32;

So from this, you can do random access into the record. If you wanted
to do random access on a _column_ of such records, it is similar to
our current variable-length Binary type. So it might be that the
underlying Arrow memory layout would be FixedSizeBinary for fixed-size
records and variable Binary for variable-size records.
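
As a rough sketch of the variable-length case, the Python below encodes
each record as a 16-byte fixed portion followed by its sidecar, then
stores a column of such records as one data buffer plus an offsets list,
in the spirit of the variable Binary layout. Several details are
assumptions for illustration only: offsets are relative to the start of
the record, b_length counts int32 elements rather than bytes, and
everything is little-endian. The function names are hypothetical.

```python
import struct

# Fixed portion: a_offset, a_length, b_offset, b_length (all int32).
HEADER = struct.Struct("<4i")

def encode_bar(a, b):
    """Encode one record as fixed portion + sidecar bytes."""
    a_bytes = a.encode("utf-8")
    b_bytes = struct.pack(f"<{len(b)}i", *b)
    a_offset = HEADER.size              # sidecar starts right after the header
    b_offset = a_offset + len(a_bytes)
    return HEADER.pack(a_offset, len(a_bytes), b_offset, len(b)) + a_bytes + b_bytes

def decode_bar(buf, start=0):
    """Decode the record that begins at byte `start` of `buf`."""
    a_off, a_len, b_off, b_len = HEADER.unpack_from(buf, start)
    a = buf[start + a_off:start + a_off + a_len].decode("utf-8")
    b = list(struct.unpack_from(f"<{b_len}i", buf, start + b_off))
    return a, b

def build_column(records):
    """Return (offsets, data); offsets has len(records) + 1 entries."""
    offsets, data = [0], bytearray()
    for a, b in records:
        data += encode_bar(a, b)
        offsets.append(len(data))
    return offsets, bytes(data)

def read_bar(offsets, data, i):
    """Random access into the column: look up the record start, then decode."""
    return decode_bar(data, offsets[i])

offsets, data = build_column([("x", [1, 2]), ("hello", []), ("yz", [3])])
```

Reading record i costs one lookup in the offsets list plus one read of the
fixed portion, regardless of how long the earlier records are.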

- Wes

> Regards
> Antoine.