I am using Python code to run my pipeline, similar to the following:
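from apache_beam.options.pipeline_options import PipelineOptions, GoogleCloudOptions, StandardOptions
options = PipelineOptions()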
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'my-project-id'
google_cloud_options.job_name = 'myjob'
google_cloud_options.staging_location = 'gs://your-bucket-name-here/staging'
google_cloud_options.temp_location = 'gs://your-bucket-name-here/temp'
options.view_as(StandardOptions).runner = 'DataflowRunner'
I would like to install the pandas-gbq package on my workers. What is the recommended way to do so? Can I add it to the PipelineOptions()?
I remember that there are a few options, one of which involves creating a requirements text file, but I cannot remember where I saw it or whether it is the simplest way when running the pipeline from Datalab.
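To make the requirements-file idea concrete, here is a rough sketch of what I think it would look like. It is not a confirmed solution. The helper names `write_requirements` and `build_pipeline_args` are made up for illustration, and I am assuming that the `--requirements_file` flag is what tells Dataflow to install the listed packages on the workers.

```python
import os


def write_requirements(packages, directory):
    """Write one package name per line to requirements.txt and return its path."""
    path = os.path.join(directory, "requirements.txt")
    with open(path, "w") as handle:
        for package in packages:
            handle.write(package + "\n")
    return path


def build_pipeline_args(requirements_path):
    """Build the command-line style flags for the pipeline (hypothetical helper).

    The values mirror the placeholders from my snippet above; the
    --requirements_file flag is the part I am unsure about.
    """
    return [
        "--runner=DataflowRunner",
        "--project=my-project-id",
        "--job_name=myjob",
        "--staging_location=gs://your-bucket-name-here/staging",
        "--temp_location=gs://your-bucket-name-here/temp",
        "--requirements_file=" + requirements_path,
    ]
```

I would then expect to pass the result to the options object, for example `options = PipelineOptions(flags=build_pipeline_args(path))`, instead of setting each attribute by hand.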
Thank you for any reference!