
Re: BigQuery streaming insert errors

Hi Gaurav, many thanks for your response. I'm currently using retry policies, but imagine the following scenario:

I'm trying to insert an existing field. Even if we retry, the insert will still fail, but I'll never be able to detect that within the pipeline, because getFailedInserts() https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedInserts-- only contains the TableRows that failed, not the reason they failed.
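
To make the scenario concrete, here is a minimal plain-Java sketch (no Beam dependency) of a retry decision that needs the error reason to work at all. The class name RetryDecision, the method shouldRetry, and the particular set of reason strings are assumptions made for this illustration, not something taken from the Beam code linked in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical classifier: decides whether retrying a failed insert can help.
final class RetryDecision {
    // Assumed set of error reasons that will never succeed on retry.
    static final Set<String> PERSISTENT_REASONS =
            new HashSet<>(Arrays.asList("invalid", "invalidQuery", "notImplemented"));

    // Returns true only if none of the reported reasons is persistent.
    static boolean shouldRetry(List<String> reasons) {
        for (String reason : reasons) {
            if (PERSISTENT_REASONS.contains(reason)) {
                return false;
            }
        }
        return true;
    }
}
```

Without the reason attached to each failed row, downstream code has nothing to feed into a check like this, which is the gap described above.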

Adding the error as well shouldn't be very hard, as I understand it, because BigQueryServicesImpl.insertAll() actually knows about it: https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L750

I would even volunteer to work on it if the community feels it makes sense.


On Fri, Apr 6, 2018 at 1:28 AM Gaurav Thakur <gaurav2985@xxxxxxxxx> wrote:

On Fri, Apr 6, 2018 at 8:13 AM, Pablo Estrada <pabloem@xxxxxxxxxx> wrote:
I'm adding Cham, as he might be knowledgeable about BQ IO or might be able to redirect this to someone else.
Cham, do you have guidance for Carlos here?

On Mon, Apr 2, 2018 at 11:08 AM Carlos Alonso <carlos@xxxxxxxxxxxxx> wrote:
And... where could I catch that exception?

On Mon, 2 Apr 2018 at 16:58, Ted Yu <yuzhihong@xxxxxxxxx> wrote:
Wouldn't the following code give you information about failed insertions (around line 790 in BigQueryServicesImpl)?

      if (!allErrors.isEmpty()) {
        throw new IOException("Insert failed: " + allErrors);
      }


On Mon, Apr 2, 2018 at 7:16 AM, Carlos Alonso <carlos@xxxxxxxxxxxxx> wrote:
Hi everyone!!

I was wondering if there's any way to get the reason why a streaming insert failed. Looking at the code, I think there's currently no way to do that, as BigQueryServicesImpl.insertAll() seems to discard the errors and just add the failed TableRow instances to the failedInserts list.

It would be very nice to have an "enriched" TableRow returned instead, one that contains the error information for further processing (in our use case we're saving the failed rows into a different table for further analysis).
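
As a rough illustration of the "enriched" row idea, here is a minimal plain-Java sketch with no Beam dependency. FailedInsert, pairWithErrors, toErrorTableRow, the "payload" and "errors" column names, and the use of Map<String, Object> in place of TableRow are all hypothetical and chosen only for this example. The one thing it assumes about the service is that errors are reported against the index of the row in the request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder pairing a failed row with the reasons it failed.
final class FailedInsert {
    final Map<String, Object> row;
    final List<String> errors;

    FailedInsert(Map<String, Object> row, List<String> errors) {
        this.row = row;
        this.errors = errors;
    }

    // Pairs each failed row with its errors. errorsByIndex maps a row's
    // position in the request to the error messages reported for it.
    static List<FailedInsert> pairWithErrors(
            List<Map<String, Object>> rows, Map<Integer, List<String>> errorsByIndex) {
        List<FailedInsert> failed = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : errorsByIndex.entrySet()) {
            failed.add(new FailedInsert(rows.get(entry.getKey()), entry.getValue()));
        }
        return failed;
    }

    // Flattens the pair into a row that could be written to a separate errors table.
    Map<String, Object> toErrorTableRow() {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("payload", row.toString());
        out.put("errors", String.join("; ", errors));
        return out;
    }
}
```

With something shaped like this, the failed rows and their reasons travel together, so writing them to a different table for analysis would not require re-deriving why each insert failed.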

Could this be added as an enhancement or similar Issue in GH/Jira? Any other ideas?

