
Merging of parquet file schemas


I am attempting to read a number of smaller parquet files and merge them into a larger parquet file.

The files are created by Spark jobs that run periodically throughout the day.

The issue I have is that the small parquet files can have slightly different schemas, and when I create the Dataset it complains that the schemas aren't the same.

Spark handles this by merging the schemas together. Is there functionality in pyarrow that can do the same?