
[jira] [Created] (ARROW-3957) pyarrow.hdfs.connect fails silently

Jim Fulton created ARROW-3957:

             Summary: pyarrow.hdfs.connect fails silently
                 Key: ARROW-3957
                 URL: https://issues.apache.org/jira/browse/ARROW-3957
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1
         Environment: centos 7
            Reporter: Jim Fulton

I'm trying to connect to HDFS using libhdfs and Kerberos.

I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.

My connect call looks like:

{{import pyarrow.hdfs}}
{{c = pyarrow.hdfs.connect(host='MYHOST', port=42424, user='ME', kerb_ticket="/tmp/krb5cc_498970")}}

This doesn't raise an error, but the resulting connection can't do anything. Operations on it either fail like this:

{{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}

Or swallow errors (e.g. {{exists}} returning {{False}}).

Note that {{connect}} raises an error if the host is wrong, but not if the port, user, or kerb_ticket is wrong. I have no idea how to debug this, because there are no useful errors.
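
Until {{connect}} reports these failures itself, one possible workaround is to check the inputs by hand and then force a real round trip right after connecting. The following is only a minimal sketch under stated assumptions: {{preflight}} and {{probe}} are hypothetical helper names, not part of pyarrow, and the checks (ticket cache file exists, port accepts TCP connections) are guesses at what could go wrong rather than a diagnosis of this bug.

```python
import os
import socket


def preflight(host, port, kerb_ticket, timeout=5.0):
    """Hypothetical helper: list problems that connect() reportedly ignores."""
    problems = []
    # A wrong kerb_ticket path is reported not to raise, so check it here.
    if not os.path.isfile(kerb_ticket):
        problems.append("ticket cache not found: %s" % kerb_ticket)
    # A wrong port is reported not to raise, so test TCP reachability here.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append("cannot reach %s:%s (%s)" % (host, port, exc))
    return problems


def probe(fs, path="/"):
    """Hypothetical helper: list a directory so a broken connection fails early."""
    # ls raised an error in this report, whereas exists returned False.
    return fs.ls(path)
```

Calling {{preflight('MYHOST', 42424, '/tmp/krb5cc_498970')}} before {{connect}}, and {{probe(c)}} immediately after it, would at least surface the failure at connect time instead of on first use. It does not explain errno 255 itself.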

Note that I _can_ connect using the hdfs Python package. (Of course, that package doesn't provide the API I need to read Parquet files.)

Any help would be appreciated greatly.
