
Issues with self executing jar and FileSystems API


I am trying to package a Beam Dataflow pipeline as a self executing jar using these instructions. However, I am running into a weird issue when attempting to execute this jar.

My pipeline needs to read a file (an Avro schema, .avsc) from GCS outside of any PCollection, before it starts working with PCollections. To do that I use the FileSystems API. This works perfectly fine when I execute the pipeline via mvn compile exec:java ...

However, if I attempt to run this as a jar, it appears to treat the GCS path as local and fails with a FileNotFoundException.

Exception in thread "main" java.io.FileNotFoundException: /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:113)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:78)
at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:262)

(Note that the input path I pass is correct and includes the double slash, e.g. --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc, but the path in the error seems to have it stripped out.)
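For what it's worth, the mangled path in the error has exactly the shape you get when a "gs://" URI is treated as a relative local path. The minimal sketch below assumes a Unix-like JVM and uses only java.nio.file, not Beam; the class name LocalPathDemo and the method asLocalPath are hypothetical. It shows the double slash collapsing and the working directory being prepended:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class: interprets a string as a *local* path, roughly what
// happens if the "gs" scheme is not resolved to a GCS filesystem.
// Illustrative only; this is not Beam's LocalFileSystem code.
class LocalPathDemo {

    static Path asLocalPath(String spec) {
        // On Unix-like platforms Paths.get collapses repeated '/' characters,
        // so "gs://bucket/..." becomes the relative path "gs:/bucket/...".
        return Paths.get(spec);
    }

    public static void main(String[] args) {
        String spec = args.length > 0
                ? args[0]
                : "gs://my-gcs-bucket/schema/my-schema.avsc";
        Path local = asLocalPath(spec);
        System.out.println("normalized: " + local);
        // A relative path is resolved against the working directory (user.dir),
        // producing the ".../myproject/gs:/my-gcs-bucket/..." shape from the error.
        System.out.println("absolute:   " + local.toAbsolutePath());
    }
}
```

If the output matches the error, that would suggest the argument itself is fine and that, when running from the jar, the gs scheme is not being routed to the GCS filesystem.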

Any pointers on what might be causing this?

- Sameer