
Re: Go SDK: Bigquery and nullable field types.


The Go SDK can't actually serialize named types -- we serialize the structural information and, for convenience, recreate assignment-compatible isomorphic unnamed types at runtime. This usually works fine, but perhaps not if the type is inspected reflectively. Have you tried registering the Record (or bigquery.NullString) type? Registration bypasses that serialization.

Thanks,
 Henning

On Thu, Jun 21, 2018 at 5:40 PM eduardo.morales@xxxxxxxxx <eduardo.morales@xxxxxxxxx> wrote:
I am using the bigqueryio transform and I am using the following struct to collect a data row:

type Record struct {
  source_service bigquery.NullString
  // .. etc...
}

This works fine with the direct runner, but when I try it with the dataflow runner, then I get the following exception:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -41: execute failed: bigquery: schema field source_service of type STRING is not assignable to struct field source_service of type struct { StringVal string; Valid bool }
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:55)
        at com.google.cloud.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:274)
        at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
        at com.google.cloud.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:101)
        at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
        at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
        at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
        at com.google.cloud.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:179)
        at com.google.cloud.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:107)
        Suppressed: java.lang.IllegalStateException: Already closed.
                at org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:97)
                at com.google.cloud.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:93)
                at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:89)
                ... 6 more

Looks like the bigquery API fails to detect the nullable type NullString and instead attempts a plain assignment. Could it be that some aspect of the type information is being lost, preventing the bigquery API from identifying and handling NullString properly?