


Re: Merging RecordBatches [C++]


hi Rob,

This is going to be a bit wasteful because of the extra buffer
padding, etc., but in any case: the thing you are missing is a
function to concatenate arrays, which can be used to make a record
batch concatenation function. A relevant JIRA is
https://issues.apache.org/jira/browse/ARROW-549
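For reference, later Arrow C++ releases added `arrow::Concatenate` (in `arrow/array/concatenate.h`) and `arrow::Table::CombineChunks` for this job; check the documentation for the version you use. The sketch below does not use Arrow at all. It models a column as a hypothetical `ChunkedColumn` (a `std::vector` of chunks) only to illustrate the idea described above: write one array-concatenation function, then build the record batch version by applying it to every column.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked column: each inner vector is one chunk,
// e.g. the values contributed by one single-row RecordBatch.
using Chunk = std::vector<int64_t>;
using ChunkedColumn = std::vector<Chunk>;

// "A function to concatenate arrays": copy every chunk of one column
// into a single contiguous chunk.
Chunk ConcatenateChunks(const ChunkedColumn& column) {
  std::size_t total = 0;
  for (const Chunk& chunk : column) total += chunk.size();
  Chunk out;
  out.reserve(total);  // allocate once instead of growing repeatedly
  for (const Chunk& chunk : column) {
    out.insert(out.end(), chunk.begin(), chunk.end());
  }
  return out;
}

// "Record batch concatenation" built on top of it: concatenate each column,
// yielding exactly one chunk per column.
std::vector<Chunk> CombineColumns(const std::vector<ChunkedColumn>& columns) {
  std::vector<Chunk> combined;
  combined.reserve(columns.size());
  for (const ChunkedColumn& column : columns) {
    combined.push_back(ConcatenateChunks(column));
  }
  return combined;
}
```

The copy is paid once at load time; after that, every row lives in a single chunk per column, so no per-lookup chunk search is needed.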

- Wes
On Wed, Oct 31, 2018 at 1:50 PM Ambalu, Robert
<Robert.Ambalu@xxxxxxxxxxx> wrote:
>
> Hey, I'm trying to figure out how to merge multiple RecordBatches in order to optimize overly chunked tables.
> A bit of background: we have a process that streams table rows with a batch size of 1 (because we want to ensure updates are written out in case of a crash). We also have some code that reads this table on startup.
> Our reading code has logic to access a specific row of a table, and this startup code uses it. To access a specific row, you need to iterate through the chunks to find the one that contains it. We're hitting a bottleneck on this particular file because it has a chunk size of 1. The simplest solution for us would be to merge all the chunked data into a single chunk on startup, when we read in the Arrow file. We've tried to find a way to do this using the Arrow C++ library and documentation but can't seem to find a clean approach.
> Is there a clean way to do this? Any other suggestions?
>
> Side note: we did notice there's a method called "RechunkArraysConsistently". We couldn't find much information on it, but if it somehow ensures that all chunks are the same size and lets us re-chunk the columns, then row access would be a quick calculation (if every chunk has the same size, the chunk index and the row within that chunk follow from a single division).
>
>
> Thanks
> - Rob
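On the row-lookup bottleneck described in the quoted message: even without merging chunks, the per-lookup scan can be avoided by computing each chunk's starting row once (prefix sums) and binary-searching it. The sketch below is plain C++ with no Arrow calls; `ChunkLocator` and `LocateUniform` are hypothetical names, and the chunk lengths are assumed to come from the lengths of the table's chunks. `LocateUniform` shows the "quick calc" from the side note, assuming every chunk except possibly the last has exactly `chunk_size` rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: precompute the first row of every chunk once,
// then find a row's chunk by binary search instead of a linear walk.
class ChunkLocator {
 public:
  explicit ChunkLocator(const std::vector<std::size_t>& chunk_lengths) {
    starts_.reserve(chunk_lengths.size());
    std::size_t offset = 0;
    for (std::size_t len : chunk_lengths) {
      starts_.push_back(offset);  // first row of this chunk
      offset += len;
    }
    total_rows_ = offset;
  }

  // Returns {chunk index, row within that chunk}. Requires row < total_rows().
  // upper_bound finds the first chunk starting after `row`; the chunk before
  // it holds the row. Empty chunks share a start and are skipped naturally.
  std::pair<std::size_t, std::size_t> Locate(std::size_t row) const {
    auto it = std::upper_bound(starts_.begin(), starts_.end(), row);
    std::size_t chunk = static_cast<std::size_t>(it - starts_.begin()) - 1;
    return {chunk, row - starts_[chunk]};
  }

  std::size_t total_rows() const { return total_rows_; }

 private:
  std::vector<std::size_t> starts_;
  std::size_t total_rows_ = 0;
};

// Uniform chunks (all chunk_size rows, except possibly the last):
// the lookup is plain integer arithmetic.
std::pair<std::size_t, std::size_t> LocateUniform(std::size_t row,
                                                  std::size_t chunk_size) {
  return {row / chunk_size, row % chunk_size};
}
```

Building the locator is a single pass over the chunk lengths; each lookup afterwards costs O(log n) in the number of chunks, or O(1) when chunks are uniform.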