
Re: google.cloud.bigQuery version on workers - please HELP

Hi Ahmet,

I got the version from the worker using the following code:

import logging

from google.cloud import bigquery

# Log which google-cloud-bigquery version this process actually imported.
logging.info('bigquery.__version__ is %s', bigquery.__version__)
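Instead of only logging the version, worker code could fail fast when it finds an older client than the one it was written against. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and plain numeric dotted versions such as 0.28.0; version_tuple and check_min_version are hypothetical helpers, not part of the BigQuery library.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.28.0' into (0, 28, 0). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def check_min_version(dist_name, minimum):
    """Return the installed version of dist_name, or raise if it is missing or too old."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        raise RuntimeError("%s is not installed on this machine" % dist_name) from None
    if version_tuple(installed) < version_tuple(minimum):
        raise RuntimeError(
            "%s %s is older than the required %s" % (dist_name, installed, minimum)
        )
    return installed
```

Called as check_min_version("google-cloud-bigquery", "0.28.0") at the start of the worker-side code, a 0.23.0 or 0.25.0 install would stop the job with a clear message instead of failing later on an API difference.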

I tried a few times to install google-cloud-bigquery on the workers using setup.py, without much success:

from setuptools import setup, find_packages

  license="Apache Software License",
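For comparison, here is a minimal sketch of a complete setup.py that pins the newer client, assuming the pipeline is launched with --setup_file pointing at this file. The package name, version, and pin are hypothetical placeholders, and the len(sys.argv) > 1 guard is only there so the file can also be imported without running setup().

```python
import sys

# Hypothetical pin: the client version the pipeline code was written against.
REQUIRED_PACKAGES = [
    "google-cloud-bigquery==0.28.0",
]

SETUP_KWARGS = dict(
    name="my-dataflow-pipeline",  # hypothetical package name
    version="0.0.1",
    install_requires=REQUIRED_PACKAGES,
    license="Apache Software License",
)

# Only call setup() when a command (for example "sdist") is passed on the
# command line; importing this module on its own does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import find_packages, setup

    setup(packages=find_packages(), **SETUP_KWARGS)
```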

The job report UI shows the following (I don't know whether it is relevant to the dependencies):
SDK version: Google Cloud Dataflow SDK for Python 2.0.0

I was able to upgrade to 0.25.0 (bigquery.__version__ is 0.25.0) but not to 0.28.0, which has a different API. Could you please advise what I am missing? Is it impossible to work with the newer version?

Many thanks,

On Thu, Jul 12, 2018 at 9:40 PM, Ahmet Altay <altay@xxxxxxxxxx> wrote:
Hi Eila,

You can find a list of dependencies installed on Dataflow workers in [1]. Dataflow workers will have a set of dependencies that satisfies the requirements from setup.py.

Which bigquery library are you using? There is a google-cloud-bigquery==0.25.0 dependency; I am not sure where 0.23.0 is coming from.

Workers do not pick up libraries from the client environment as part of the job submission. I am not sure how the Datalab UI integration works; however, you have a few options for installing any set of dependencies on the workers. Using requirements.txt is one of those options.
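A minimal sketch of the requirements.txt option, assuming the job is launched from a Python script rather than the Datalab Run button. It writes a requirements file pinning google-cloud-bigquery==0.28.0 and builds the command-line arguments that would be handed to Beam's PipelineOptions. --runner, --project, --temp_location and --requirements_file are standard Beam/Dataflow options; the helper names, project ID, and bucket path are hypothetical placeholders.

```python
from pathlib import Path

# Hypothetical pin: the version the pipeline code was written against.
REQUIREMENTS = ["google-cloud-bigquery==0.28.0"]


def write_requirements(path="requirements.txt"):
    """Write one requirement per line and return the path that was written."""
    Path(path).write_text("\n".join(REQUIREMENTS) + "\n")
    return str(path)


def dataflow_args(requirements_path, project, temp_location):
    """Build the argument list for a Dataflow run that stages the requirements file."""
    return [
        "--runner=DataflowRunner",
        "--project=" + project,
        "--temp_location=" + temp_location,
        "--requirements_file=" + requirements_path,
    ]
```

The returned list would then be passed as PipelineOptions(dataflow_args(...)) when constructing the pipeline, so the workers install the pinned client instead of relying on whatever the client machine has.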


On Thu, Jul 12, 2018 at 8:51 AM, OrielResearch Eila Arich-Landkof <eila@xxxxxxxxxxxxxxxxx> wrote:
Hi all,

I am running a Python pipeline with the google.cloud.bigquery library.
On the local runner, everything runs great: bigquery.__version__ is 0.28.0.

On the Dataflow runner, the version is 0.23.0 (bigquery.__version__ is 0.23.0), and there are many API changes between these versions.

What is the best way to change the installed version on the workers? I was assuming that the workers have all of the master machine's libraries installed when the job is launched from Datalab. Is that true?
I am not generating any requirements.txt; the job is launched with the Run button in the Datalab UI.