
Re: Read all json files from a hdfs partition folder


Hi,

If the question is how to read all files from an HDFS directory:
in general, each file is potentially a different DataSet (not DataStream),
and you need to decide how to combine/join them in the Flink pipeline.

If the files are small enough, you could list them as string paths and use env.fromCollection to start the pipeline.
Then, for each path, load the file into memory in a map operation and transform the file contents into records for the next stage. A sketch of that approach follows below.
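
A minimal sketch of that idea, assuming small files and the Hadoop FileSystem API on the classpath; the class name, the example path, and the paths list are hypothetical placeholders:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadSmallFilesSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical list of file paths; in practice, enumerate the
        // partition folder first (e.g. with FileSystem#listStatus).
        List<String> paths = Arrays.asList(
                "hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23/part-00000.json");

        DataStream<String> contents = env
                .fromCollection(paths)
                .map(new RichMapFunction<String, String>() {
                    @Override
                    public String map(String pathString) throws Exception {
                        // Load the whole file into memory; only suitable for small files.
                        Path path = new Path(pathString);
                        FileSystem fs = path.getFileSystem(new Configuration());
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        try (InputStream in = fs.open(path)) {
                            byte[] buf = new byte[4096];
                            int n;
                            while ((n = in.read(buf)) != -1) {
                                out.write(buf, 0, n);
                            }
                        }
                        return out.toString("UTF-8");
                    }
                });

        // Each record is now the full contents of one file; parse the JSON
        // and fan out into individual records in the next operator.
        contents.print();
        env.execute("read-small-files");
    }
}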

Best,
Andrey

On 12 Dec 2018, at 15:02, Rakesh Kumar <rakkukumar2707@xxxxxxxxx> wrote:

Hi,

I want to read all the JSON files from an HDFS path that is organized into partition folders.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadJsonFiles { // class name is arbitrary

    public static void main(String[] args) {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Partition folders look like:
        // hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23

        DataStream<String> df = env.readTextFile("hdfs://localhost:8020/data/ingestion/ingestion.raw.product");
        try {
            df.print();
            env.execute("dfg");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}