Re: Read all json files from a hdfs partition folder

Actually, does it not work if you just pass the directory to env.readTextFile, as in your code example? If not, what exactly is the problem?

On 12 Dec 2018, at 17:24, Andrey Zagrebin <andrey@xxxxxxxxxxxxxxxxx> wrote:


If the question is how to read all files from an HDFS directory:
in general, each file is potentially a different DataSet (not DataStream),
so you need to decide how to combine or join them in the Flink pipeline.

If the files are small enough, you could list them as string paths and use env.fromCollection to start the pipeline.
Then, in a map operation, load the file for each path into memory and transform its contents into records for the next stage.
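
A rough sketch of the two pieces, in case it helps. This is plain Java and only an illustration: the class and method names (PartitionFileReader, listJsonFiles, loadRecords) are made up here, it assumes one JSON record per line, and it reads from a local directory with java.nio. For HDFS the listing and reading would have to go through a Hadoop or Flink file system client instead, which is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: lists JSON files under a folder
// and loads each one into memory as a list of records.
public class PartitionFileReader {

    // Recursively collect the paths of all *.json files under root, as strings.
    // Assumption: root is a local directory; HDFS would need its own client.
    public static List<String> listJsonFiles(String root) throws IOException {
        try (Stream<Path> walk = Files.walk(Paths.get(root))) {
            return walk.filter(Files::isRegularFile)
                       .map(Path::toString)
                       .filter(p -> p.endsWith(".json"))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    // Load one file fully into memory and return its non-blank lines.
    // Assumption: each non-blank line is one JSON record.
    public static List<String> loadRecords(String path) {
        try {
            return Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)
                        .stream()
                        .filter(line -> !line.trim().isEmpty())
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PartitionFileReader <directory>");
            return;
        }
        // In a Flink job, this list is what would be handed to env.fromCollection,
        // and loadRecords is what the map (or flatMap) function would call per path.
        List<String> paths = listJsonFiles(args[0]);
        for (String path : paths) {
            for (String record : loadRecords(path)) {
                System.out.println(record);
            }
        }
    }
}
```

Since one file usually yields several records, a flatMap is probably a better fit than a plain map for the loading step, but that depends on how you want to model the records.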


On 12 Dec 2018, at 15:02, Rakesh Kumar <rakkukumar2707@xxxxxxxxx> wrote:


I want to read all JSON files from an HDFS folder that contains partition subfolders.

public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<String> df = env.readTextFile("hdfs://localhost:8020/data/ingestion/ingestion.raw.product");
    try {
    } catch (Exception e) {