I am running a pipeline in which a BigQuery (BQ) table is processed row by row with a ParDo function:
ExploreData = (p | "Extract the rows from dataframe" >> beam.io.Read(beam.io.BigQuerySource('archs4.Debug_annotation'))
| "create more columns" >> beam.ParDo(CreateColForSampleFn(colListSubset,outputPath)))
(ExploreData | 'writing to CSV files' >> beam.io.WriteToText('gs://archs4/output/dataExploration.txt',num_shards=1))
My questions concern the DataFrame (DF) returned by CreateColForSampleFn and how WriteToText handles it:
1. When CreateColForSampleFn returns the DF and I pass it on to WriteToText, the output contains only the column headers.
2. When I return the DF wrapped in a list ([df]), I get the following text for each row, including the dimensions:
Sample_contact_phone Sample_extract_protocol_ch1 Sample_platform_id Sick
0 Library construction protocol: Four µg of tota... GPL11154 None
[1 rows x 168 columns]
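Both symptoms are consistent with how Beam handles a DoFn's return value: whatever `process` returns is iterated, and each item becomes a separate output element, which `WriteToText` then writes as text (effectively `str(element)` with the default coder). Iterating over a pandas DataFrame yields its column labels, so returning `df` emits only the headers; returning `[df]` emits the whole DataFrame as one element, whose string form is pandas' display repr: long cells are cut off with `...`, and a wide frame like this one gets a `[1 rows x 168 columns]` footer. The pandas-only sketch below reproduces both effects on a made-up two-column frame; the values are illustrative, not from the real table.

```python
import pandas as pd

# Made-up one-row frame standing in for the DataFrame built per BigQuery row.
long_value = "Library construction protocol: " + "x" * 200
df = pd.DataFrame(
    {
        "Sample_platform_id": ["GPL11154"],
        "Sample_extract_protocol_ch1": [long_value],
    }
)

# `return df`: Beam iterates the returned object, and iterating a
# DataFrame yields its column labels, so only the headers are emitted.
emitted_for_return_df = list(df)
print(emitted_for_return_df)

# `return [df]`: the single emitted element is the DataFrame itself, and the
# text sink writes its string form, i.e. pandas' truncated display repr.
emitted_for_return_list = [df]
as_text = str(emitted_for_return_list[0])
print(as_text)
```

Printing `as_text` shows the long cell cut off with `...`, which matches the truncation in the output above.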
I want to generate a text file that includes:
- One header (if needed, I will add it after the pipeline has completed)
- All the values from every row that was processed and turned into a DF
- Full cell values, without the ... truncation
What am I missing? Any advice?
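One way to get full, untruncated values is to have the DoFn emit plain strings rather than DataFrame objects. Below is a minimal, pandas-only sketch of that idea. It assumes every DataFrame produced in CreateColForSampleFn has the same columns in the same order. `dataframe_to_csv_lines` and `csv_header` are hypothetical helper names, and the sample frame is made up. Each row is formatted with Python's `csv` module instead of `str(df)`, so values are written in full and fields containing commas are quoted.

```python
import csv
import io
from typing import Iterable, Iterator

import pandas as pd


def _format_csv_row(values: Iterable) -> str:
    """Format one record as a CSV line, without the trailing line terminator."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas, quotes or newlines
    writer.writerow(values)   # None is written as an empty field
    # WriteToText appends its own newline, so drop the csv line terminator.
    return buf.getvalue()[: -len(writer.dialect.lineterminator)]


def dataframe_to_csv_lines(df: pd.DataFrame) -> Iterator[str]:
    """Hypothetical helper: yield one full-value CSV line per DataFrame row."""
    for row in df.itertuples(index=False, name=None):
        yield _format_csv_row(row)


def csv_header(columns: Iterable[str]) -> str:
    """Hypothetical helper: a single CSV header line for a fixed column list."""
    return _format_csv_row(columns)


# Made-up sample frame; the real one would come from CreateColForSampleFn.
sample = pd.DataFrame(
    {
        "Sample_platform_id": ["GPL11154"],
        "Sick": [None],
        "Sample_extract_protocol_ch1": ["Four µg of total RNA, " + "x" * 100],
    }
)
lines = list(dataframe_to_csv_lines(sample))
header = csv_header(sample.columns)
print(header)
print(lines[0])
```

Inside CreateColForSampleFn's `process`, replacing `return df` (or `return [df]`) with `yield from dataframe_to_csv_lines(df)` would emit one plain string per DataFrame row, which WriteToText writes as-is, one per line. For the single header, `beam.io.WriteToText` accepts a `header` argument that is written at the top of each output shard. With `num_shards=1`, as in the pipeline above, that gives one header for the whole file, e.g. `header=csv_header(known_columns)`, where `known_columns` is a hypothetical fixed list of the output columns.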