osdir.com



Re: Bigquery streaming TableRow size limit


I'm a bit worried about making this automatic, as it can have unexpected side effects on the BigQuery load-job quota. This is a 24-hour quota, so if it's accidentally exceeded, all load jobs for the project may be blocked for the next 24 hours. However, if the user opts in (possibly via a builder method), this could be automatic.
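To illustrate the opt-in idea, here is a minimal plain-Java sketch of a policy object where the load-job fallback is off by default and enabled only through an explicit builder-style call. Everything here is hypothetical: `OversizedRowPolicy`, `withLoadJobFallback`, `decide`, and the `REJECT` action are invented names for illustration and are not part of BigQueryIO.

```java
// Hypothetical opt-in policy for rows that exceed the streaming size limit.
// Not a BigQueryIO API; names are illustrative only.
public final class OversizedRowPolicy {
    public enum Action { STREAM, LOAD_JOB, REJECT }

    private final boolean loadJobFallback;

    private OversizedRowPolicy(boolean loadJobFallback) {
        this.loadJobFallback = loadJobFallback;
    }

    // Default: oversized rows never trigger load jobs, so load-job quota is untouched.
    public static OversizedRowPolicy defaults() {
        return new OversizedRowPolicy(false);
    }

    // Explicit opt-in. Returns a new immutable instance, in the style of withX() builder methods.
    public OversizedRowPolicy withLoadJobFallback() {
        return new OversizedRowPolicy(true);
    }

    // Rows within the limit always stream. Oversized rows go to a load job only if the
    // user opted in; otherwise they are rejected (e.g. routed to a failed-rows output).
    public Action decide(int rowBytes, int limitBytes) {
        if (rowBytes <= limitBytes) {
            return Action.STREAM;
        }
        return loadJobFallback ? Action.LOAD_JOB : Action.REJECT;
    }

    public static void main(String[] args) {
        int limit = 1024 * 1024;
        System.out.println(defaults().decide(2 * limit, limit));                       // REJECT
        System.out.println(defaults().withLoadJobFallback().decide(2 * limit, limit)); // LOAD_JOB
    }
}
```

Keeping the default at "reject" means a pipeline only starts issuing load jobs, and consuming that 24-hour quota, when its author has explicitly asked for it.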

Reuven

On Tue, Nov 13, 2018 at 7:06 AM Lukasz Cwik <lcwik@xxxxxxxxxx> wrote:
It would be nice if this were automatic for users, so that data ingestion works without them having to worry about how big the blobs are.

On Mon, Nov 12, 2018 at 1:03 AM Wout Scheepers <Wout.Scheepers@xxxxxxxxxxxxxxxxxxx> wrote:

Hey all,

 

The TableRow size limit is 1 MB when streaming into BigQuery.

To prevent data loss, I’m going to implement a TableRow size check and add a fan-out that writes rows above the limit with a BigQuery load job instead.

Of course this load job would be windowed.
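The size check and fan-out described above can be sketched in plain Java as follows. This is a minimal sketch under several assumptions: rows are represented as already-serialized JSON strings, a row's size is approximated as the UTF-8 byte length of that JSON (not necessarily how BigQuery measures it), and "1 MB" is taken as 1,048,576 bytes. `RowSizeRouter` and its members are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical router that splits rows into a streaming branch and a load-job branch.
public class RowSizeRouter {
    // Assumption: 1 MB interpreted as 1,048,576 bytes; check the current BigQuery limits.
    public static final int STREAMING_ROW_LIMIT_BYTES = 1024 * 1024;

    // The two outputs of the fan-out.
    public static final class Routed {
        public final List<String> streaming = new ArrayList<>();
        public final List<String> loadJob = new ArrayList<>();
    }

    // Approximates row size as the UTF-8 length of its JSON encoding (an assumption).
    public static int encodedSize(String rowJson) {
        return rowJson.getBytes(StandardCharsets.UTF_8).length;
    }

    // Rows at or under the limit go to streaming; larger rows go to the load-job branch.
    public static Routed route(List<String> rowsAsJson, int limitBytes) {
        Routed routed = new Routed();
        for (String row : rowsAsJson) {
            if (encodedSize(row) <= limitBytes) {
                routed.streaming.add(row);
            } else {
                routed.loadJob.add(row);
            }
        }
        return routed;
    }

    public static void main(String[] args) {
        String small = "{\"id\":1,\"payload\":\"ok\"}";
        String big = "{\"id\":2,\"payload\":\"" + "x".repeat(STREAMING_ROW_LIMIT_BYTES) + "\"}";
        Routed r = route(List.of(small, big), STREAMING_ROW_LIMIT_BYTES);
        System.out.println("streaming=" + r.streaming.size() + " loadJob=" + r.loadJob.size());
    }
}
```

In a Beam pipeline the same decision would more naturally live in a `ParDo` with two tagged outputs (or a `Partition` transform), with the small-row branch written using `BigQueryIO.Write.Method.STREAMING_INSERTS` and the oversized branch using `BigQueryIO.Write.Method.FILE_LOADS` plus `withTriggeringFrequency(...)`, which issues periodic load jobs from a streaming pipeline.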

 

I know it doesn’t make sense to stream rows larger than 1 MB, but as we’re using Pub/Sub and want to make sure no data loss happens whatsoever, I’ll need to implement it.

 

Is this functionality any of you would like to see in BigQueryIO itself?

Or do you think my use case is too specific, and implementing my solution around BigQueryIO will suffice?

 

Thanks for your thoughts,

Wout