I am trying to package a Beam Dataflow pipeline as a self-executing jar using these
instructions. However, I am running into a strange issue when I try to execute the jar.
My pipeline needs to read a file (an Avro schema, .avsc) from GCS outside of a PCollection, before it starts working with PCollections. To do that I use the FileSystems API. This works perfectly fine when I run the pipeline via mvn compile exec:java ..
However, if I run it as a jar, it appears to treat the GCS path as a local path and fails with a FileNotFoundException.
Exception in thread "main" java.io.FileNotFoundException: /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
(Note that the input path I pass in is correct, with the double slash after gs:, but the error message seems to strip it out.)
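For what it's worth, I can reproduce the shape of the path in the error with plain java.io.File, with no Beam involved. The sketch below assumes a Unix-like filesystem. The class and method names are made up for illustration, and it only shows the symptom, not where in Beam the path ends up being handled as a local file.

```java
import java.io.File;

public class GcsPathAsLocalFile {

    // Path string as java.io.File stores it. On Unix-like systems File
    // collapses repeated separators, so "gs://" becomes "gs:/".
    static String normalizedPath(String raw) {
        return new File(raw).getPath();
    }

    // A path without a leading "/" is relative, so File resolves it
    // against the current working directory.
    static String absolutePath(String raw) {
        return new File(raw).getAbsolutePath();
    }

    public static void main(String[] args) {
        String gcsPath = "gs://my-gcs-bucket/schema/my-schema.avsc";
        System.out.println(normalizedPath(gcsPath));
        System.out.println(absolutePath(gcsPath));
    }
}
```

The second line printed has the same form as the path in my exception: the working directory followed by gs:/my-gcs-bucket/... So it looks like the gs:// string is being turned into a local file path somewhere when I run from the jar.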
Any pointers on what might be causing this?