


Re: google.cloud.bigQuery version on workers - please HELP


Hi Ahmet, thank you for the detailed explanation. Looking forward to the BigQuery dependency upgrade in Beam. Best, Eila

On Fri, Jul 13, 2018 at 9:02 PM, Ahmet Altay <altay@xxxxxxxxxx> wrote:


On Thu, Jul 12, 2018 at 7:35 PM, OrielResearch Eila Arich-Landkof <eila@xxxxxxxxxxxxxxxxx> wrote:
Hi Ahmet,


I have received the version from the worker using the following commands:

import logging
from google.cloud import bigquery
logging.info('bigquery.__version__ is %s', bigquery.__version__)
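A slightly more defensive variant of this check, as a sketch: it imports the module by dotted name with importlib and returns None instead of raising when the package is missing or has no __version__ attribute. The helper name installed_version is hypothetical; on a worker it could be called from inside a DoFn or at module import time and the result logged.

```python
import importlib
import logging


def installed_version(module_name):
    """Return module_name.__version__, or None if it cannot be determined.

    Hypothetical helper: imports the module by its dotted name and reads
    __version__, so the same call works on the local runner and on workers.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package not installed in this environment.
        return None
    return getattr(module, "__version__", None)


logging.basicConfig(level=logging.INFO)
logging.info("bigquery.__version__ is %s",
             installed_version("google.cloud.bigquery"))
```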

I tried a few times to install google-cloud-bigquery on the workers using setup.py, without much success:

from setuptools import setup, find_packages

setup(
  name='label-or',
  version='1.0.0',
  packages=find_packages(),
  keywords=[
  ],
  license="Apache Software License",
  install_requires=[
    'google-cloud-bigquery==0.28.0',
  ],
  package_data={
  },
  data_files=[],
)


On the job report UI, the following is reported (I don't know if it is relevant to the dependencies):
SDK version
Google Cloud Dataflow SDK for Python 2.0.0

Yes, this is related to the SDK version you are using. Dataflow worker containers have different dependencies for each SDK version. 2.0.0 is an old version, which explains why you were seeing 0.23.0 as the installed version.
 


I was able to upgrade to 0.25.0 (bigquery.__version__ is 0.25.0) but not to 0.28.0 (which has a different API). Could you please advise what I am missing? Is it impossible to work with the newer version?

Beam supports google-cloud-bigquery up to version 0.25.0. There was a recent attempt to upgrade it, and it uncovered issues due to the API differences (details: https://github.com/apache/beam/pull/5895). There is also a recent push for Beam to upgrade all dependencies to their latest versions, and I assume this will be addressed as part of it.

Unfortunately, until that fix lands, it is not possible to use the latest version of the bigquery library.
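Given that ceiling, a small guard can fail fast when an environment carries a newer client than Beam supports. A minimal sketch, assuming plain dotted numeric version strings like the ones in this thread; MAX_SUPPORTED, parse_version and is_supported are hypothetical names, and 0.25.0 is the limit stated above, not a constant exported by Beam.

```python
# Ceiling quoted in this thread; hypothetical constant, not part of Beam.
MAX_SUPPORTED = "0.25.0"


def parse_version(version):
    """Turn '0.28.0' into (0, 28, 0) so versions compare numerically.

    Assumes purely numeric dotted versions; pre-release tags are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def is_supported(version, ceiling=MAX_SUPPORTED):
    """True if version is at or below the supported ceiling."""
    return parse_version(version) <= parse_version(ceiling)


for v in ("0.23.0", "0.25.0", "0.28.0"):
    print(v, is_supported(v))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.9.0" would sort after "0.25.0".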
 

Many thanks,
Eila


On Thu, Jul 12, 2018 at 9:40 PM, Ahmet Altay <altay@xxxxxxxxxx> wrote:
Hi Eila,

You can find a list of the dependencies installed on Dataflow workers in [1]. Dataflow workers will have a set of dependencies that satisfies the requirements from setup.py.

Which bigquery library are you using? There is a google-cloud-bigquery==0.25.0 dependency; I am not sure where the 0.23.0 is coming from.

Workers do not pick up libraries from the client environment as part of the job submission. I am not sure how the Datalab UI integration works; however, you have a few options for installing any set of dependencies on the workers. Using requirements.txt is one of those options.
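For the requirements.txt route, a minimal sketch that writes a requirements file pinning the client to the version Beam supports and builds the pipeline arguments pointing Dataflow at it. --requirements_file is an existing Beam pipeline option; the helper names, the temporary directory and the exact argument list are illustrative assumptions, and other required options (project, temp location, and so on) are omitted. How the Datalab Run button forwards such options is not covered here.

```python
import os
import tempfile

# Pin the client to the version Beam supports (0.25.0, per this thread).
REQUIREMENTS = ["google-cloud-bigquery==0.25.0"]


def write_requirements(directory, requirements=REQUIREMENTS):
    """Hypothetical helper: write requirements.txt into directory, return its path."""
    path = os.path.join(directory, "requirements.txt")
    with open(path, "w") as handle:
        handle.write("\n".join(requirements) + "\n")
    return path


def build_pipeline_args(requirements_path):
    """Hypothetical helper: command-line style options for a Dataflow job.

    Only the runner and the requirements file are set; project,
    temp_location and similar required options are left out.
    """
    return [
        "--runner=DataflowRunner",
        "--requirements_file=" + requirements_path,
    ]


workdir = tempfile.mkdtemp()
args = build_pipeline_args(write_requirements(workdir))
print(args)
```

The resulting list would then be handed to PipelineOptions(args) when constructing the pipeline, so the workers install the pinned package before running the job.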

Ahmet


On Thu, Jul 12, 2018 at 8:51 AM, OrielResearch Eila Arich-Landkof <eila@xxxxxxxxxxxxxxxxx> wrote:
Hi all,

I am running a Python pipeline with the google.cloud.bigquery library.
On the local runner, everything runs great:
bigquery.__version__ is 0.28.0

On the Dataflow runner, the version is 0.23.0:
bigquery.__version__ is 0.23.0
and there are many API changes between these versions.
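One way to make such mismatches explicit is to collect the versions seen locally and the ones logged by a worker, then diff them. A sketch with hypothetical names; the two dictionaries hold the values reported in this message, entered by hand rather than gathered automatically.

```python
def version_mismatches(local, worker):
    """Return {package: (local_version, worker_version)} where they differ.

    A package present on only one side is reported with None for the other.
    """
    mismatches = {}
    for package in sorted(set(local) | set(worker)):
        local_v = local.get(package)
        worker_v = worker.get(package)
        if local_v != worker_v:
            mismatches[package] = (local_v, worker_v)
    return mismatches


local_versions = {"google-cloud-bigquery": "0.28.0"}
worker_versions = {"google-cloud-bigquery": "0.23.0"}
print(version_mismatches(local_versions, worker_versions))
# {'google-cloud-bigquery': ('0.28.0', '0.23.0')}
```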

What would be the best way to change the installed version on the workers? I was assuming that the workers have all of the master machine's libraries installed when the execution is done from Datalab. Is that true?
I am not generating any requirements.txt; the execution is done through the Run button in the Datalab UI.



