


Merging of parquet file schemas


Hi,

I am attempting to read a number of smaller parquet files and merge them into a larger parquet file.

The files are created by Spark jobs that run periodically throughout the day.

The issue is that the small parquet files can have slightly different schemas, and when I create the Dataset it complains that the schemas aren't the same.

Spark handles this by merging the schemas together. Is there functionality in pyarrow that can do the same?

Thanks,
Dan