Can read from Parquet files on any storage system supported by Beam
Can write to Parquet files on any storage system supported by Beam
Can configure the compression algorithm of output files
Can adjust the size of the row group
Can read multiple row groups of a single file in parallel (source splitting)
Can read only a selected subset of columns (partial read by column)
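To illustrate the source-splitting point above, here is a minimal sketch of how the row groups of one file could be packed into independently readable bundles. It is not taken from the design document or from Beam itself: `RowGroup`, `split_row_groups`, and `desired_bundle_bytes` are hypothetical names, and the greedy packing rule is only an assumption about one reasonable way to do it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for the row group metadata found in a Parquet footer."""
    index: int
    num_bytes: int


def split_row_groups(row_groups: List[RowGroup],
                     desired_bundle_bytes: int) -> List[List[int]]:
    """Greedily pack consecutive row groups into bundles.

    A bundle is closed as soon as adding the next row group would push it
    past desired_bundle_bytes. A row group is never divided, so a single
    oversized row group becomes a bundle of its own.
    """
    bundles: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for rg in row_groups:
        if current and current_bytes + rg.num_bytes > desired_bundle_bytes:
            bundles.append(current)
            current, current_bytes = [], 0
        current.append(rg.index)
        current_bytes += rg.num_bytes
    if current:
        bundles.append(current)
    return bundles


# Five row groups with made-up sizes; each resulting bundle could be
# handed to a different worker.
groups = [RowGroup(i, size) for i, size in enumerate([64, 64, 32, 128, 16])]
bundles = split_row_groups(groups, desired_bundle_bytes=128)
print(bundles)
```

The row group is the smallest unit of work in this sketch, which is why the adjustable row group size on the write side also bounds how finely a file can be split on the read side.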
Thanks Heejong. Added some comments. +1 for summarizing the doc in the email thread.

- Cham

On Wed, Oct 24, 2018 at 4:45 PM Ahmet Altay <altay@xxxxxxxxxx> wrote:

Thank you Heejong. Could you also share a summary of the design document (major points/decisions) in the mailing list?

On Wed, Oct 24, 2018 at 4:08 PM, Heejong Lee <heejong@xxxxxxxxxx> wrote:

Hi,

I'm working on BEAM-4444: Parquet IO for Python SDK.

Any feedback is appreciated. Thanks!