Re: Multiple clients creating/deleting files in the same directory: msg#00055

file-systems.lustre.user

Subject: Re: Multiple clients creating/deleting files in the same directory

On Apr 18, 2007, at 10:40 AM, Scott Atchley wrote:

Hi all,

We are testing a small cluster with 1 MDS, 2 OSS, and 5 clients. When all clients are writing to independent directories, all is well. When one client tries to list the contents of a directory that another client is creating/deleting files in, Lustre will hang and /var/log/messages shows a lot of "printk suppressed" messages.

Is this normal behavior or can we do something to minimize it (besides not having two clients work in the same directory)?

Scott

This may or may not be related: four of the clients can list a directory, but the fifth client cannot. On the fifth client, dmesg shows:

Lustre: MDC_nas-0-0.local_mds1_MNT_client-0000010037e37800: Connection restored to service mds1 using nid 192.168.1.250@tcp.
Lustre: Skipped 1 previous similar message
LustreError: 9634:0:(mdc_request.c:684:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 23673:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -107 req@0000010006925800 x201696/t0 o400->mds1_UUID@nas-0-0-m_UUID:12 lens 64/64 ref 1 fl Rpc:RN/0/0 rc 0/-107
LustreError: MDC_nas-0-0.local_mds1_MNT_client-0000010037e37800: Connection to service mds1 via nid 192.168.1.250@tcp was lost; in progress operations using this service will wait for recovery to complete.
LustreError: This client was evicted by mds1; in progress operations using this service will fail.
LustreError: 9645:0:(client.c:548:ptlrpc_check_reply()) @@@ ABORTED: req@00000100cfee7e00 x201693/t0 o37->mds1_UUID@nas-0-0-m_UUID:12 lens 240/240 ref 1 fl Rpc:E/0/0 rc 0/0
LustreError: 9645:0:(dir.c:329:ll_readdir()) error reading dir 480862/408283751 page 0: rc -5
LustreError: 9645:0:(dir.c:329:ll_readdir()) Skipped 89 previous similar messages
LustreError: 9645:0:(client.c:511:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@00000100cfee7e00 x201700/t0 o37->mds1_UUID@nas-0-0-m_UUID:12 lens 240/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 9645:0:(client.c:511:ptlrpc_import_delay_req()) Skipped 88 previous similar messages
Lustre: MDC_nas-0-0.local_mds1_MNT_client-0000010037e37800: Connection restored to service mds1 using nid 192.168.1.250@tcp.
LustreError: 9645:0:(mdc_request.c:684:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
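
For anyone decoding the return codes in the output above: on Linux, -107 is ENOTCONN and -5 is EIO, which looks consistent with the eviction the log itself reports. Below is a minimal sketch for pulling those codes out of such lines. The helper names (extract_codes, describe) and the small errno table are my own illustration, not part of Lustre, and the regex only assumes the "rc -N" and "err == -N" forms seen in this log.

```python
import re

# Linux errno values, hardcoded so the result does not depend on the host OS.
# Only the two codes that appear in the log above are listed.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Assumption: negative codes appear as "rc -N" or "err == -N", as in this log.
CODE_RE = re.compile(r"(?:rc|err ==) (-\d+)")


def extract_codes(line):
    """Return the negative return codes found in one log line."""
    return [int(match) for match in CODE_RE.findall(line)]


def describe(code):
    """Map a negative return code to its Linux errno name and text."""
    name, text = LINUX_ERRNO.get(abs(code), ("unknown", "not in this table"))
    return f"{code}: {name} ({text})"


if __name__ == "__main__":
    sample = (
        "LustreError: 9645:0:(dir.c:329:ll_readdir()) "
        "error reading dir 480862/408283751 page 0: rc -5"
    )
    for code in extract_codes(sample):
        print(describe(code))
```

Piping dmesg output through extract_codes line by line would give a quick tally of which errors dominate, without relying on the rate-limited "Skipped N previous similar messages" counts.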

I like the "Please tell CFS." note. :-)

Any suggestions?

Scott

