Merging of parquet file schemas
I am attempting to read a number of smaller parquet files and merge them into a larger parquet file.
The files are created by Spark jobs that run periodically throughout the day.
The issue I have is that the small parquet files can have slightly different schemas, and when I create the Dataset it complains that the schemas aren't the same.
Spark handles this by merging the schemas together. Is there functionality in pyarrow that can do the same?