
Re: Issues with self executing jar and FileSystems API

The FileSystems API uses a ServiceLoader[1] to find Apache Beam FileSystem implementations. The ServiceLoader works by finding "service" files on the classpath, each containing a list of classes that implement the Apache Beam FileSystem API. The way in which you're creating the executable jar is likely dropping or incorrectly merging those service files. The most common cause is that you're using the Maven shade plugin and haven't configured it to use the services file resource transformer[2]. If you are packaging your executable jar a different way, you'll want to look up the documentation for your tool and see how it can properly handle service files.

2: https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
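One way to check whether the service files survived packaging is to list what is actually visible on the classpath of the jar. The following is only a rough diagnostic sketch, not part of Beam: the class name ServiceFileCheck is made up for illustration, and it assumes the service interface Beam looks up is org.apache.beam.sdk.io.FileSystemRegistrar. It uses only the JDK, reading every META-INF/services entry for that interface and printing the implementation classes it lists.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not a Beam class.
public class ServiceFileCheck {

    // Assumption: the service interface that Beam's FileSystems resolves via ServiceLoader.
    static final String SERVICE = "org.apache.beam.sdk.io.FileSystemRegistrar";

    /** Returns every implementation class name listed in the service files for the interface. */
    static List<String> listProviders(String serviceInterface) throws IOException {
        List<String> providers = new ArrayList<>();
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        // A classpath can hold several service files for one interface (one per jar).
        // A shaded jar holds at most one, so it must contain the merged contents.
        String resource = "META-INF/services/" + serviceInterface;
        for (URL url : Collections.list(cl.getResources(resource))) {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments and blank lines.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        providers.add(line);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        List<String> providers = listProviders(SERVICE);
        if (providers.isEmpty()) {
            System.out.println("No service entries found for " + SERVICE);
        }
        for (String provider : providers) {
            System.out.println(provider);
        }
    }
}
```

If you bundle this class into the executable jar and run its main method from that jar, I would expect the output to include a registrar for GCS alongside the one for the local filesystem. If only the local one shows up (or nothing at all), the service files were most likely dropped or overwritten during packaging.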

On Thu, Jun 21, 2018 at 12:06 PM Sameer Abhyankar <saabhyankar@xxxxxxxxxx> wrote:

I am trying to package a Beam Dataflow pipeline as a self executing jar using these instructions. However, I am running into a weird issue when attempting to execute this jar.

My pipeline needs to read a file (avro schema .avsc) from GCS outside of a PCollection before starting to work with PCollections. In order to do that I use the FileSystems API. This works perfectly fine when I execute the pipeline via mvn compile exec:java ..

However, if I attempt to run this as a jar, it appears to treat the GCS path as local and fails with a FileNotFoundException.

Exception in thread "main" java.io.FileNotFoundException: /some/local/filesystem/path/myproject/gs:/my-gcs-bucket/schema/my-schema.avsc (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:113)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:78)
at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:262)

(Note that the input path is correct, with the double slash, but the error message seems to strip it out,
e.g.: --inputPath=gs://my-gcs-bucket/schema/my-schema.avsc)

Any pointers on what might be causing this?

- Sameer