Example "local" fails on node with two IP addresses (msg#00065, file-systems.lustre.user)
Hi,

I'm encountering a problem when starting the "local" example (one MDS, LOV, OST, and client, all on node "sun-n1-console").

    # lmc -m test.xml --batch test.txt
    # cat test.txt
    --add node --node sun-n1-console
    --add net --node sun-n1-console --nettype lnet --nid sun-n1-console@tcp
    --add mds --node sun-n1-console --mds mds1 --fstype ldiskfs --dev /tmp/mds1-sun-n1-console --size 400000
    --add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
    --add ost --node sun-n1-console --lov lov1 --ost ost1-sun-n1-console --fstype ldiskfs --dev /tmp/ost1-sun-n1-console --size 400000
    --add mtpt --node sun-n1-console --path /mnt/lustre --mds mds1 --lov lov1

The node has two Ethernet interfaces, eth0 and eth1, on separate subnets. I deployed all Lustre components on eth1 (IP: 192.168.123.45, hostname: sun-n1-console).

    # cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    xxx.yyy.zzz.ab public-host
    192.168.123.45 sun-n1-console

When eth0 is down, I can deploy the "local" example successfully. Lustre fails to start only when eth0 is up (see attachment). The error messages in /var/log/messages indicate that the MDS does not respond (see below).
I don't believe this is caused by the firewall, because I've switched it off:

    # iptables -L
    Chain INPUT (policy ACCEPT)
    target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination

And here are the error messages:

    # tail /var/log/messages
    Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@f7fe7e00 x22/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
    Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(client.c:947:ptlrpc_expire_one_request()) @@@ timeout (sent at 1177061855, 0s ago) req@f7fe7e00 x22/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
    Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(client.c:947:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
    Apr 20 17:38:00 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@ed133e00 x23/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
    Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.683:64): avc: denied { rawip_recv } for pid=6537 comm="socknal_cd03" saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
    Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.884:65): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
    Apr 20 17:38:26 sun-n1-console kernel: audit(1177061906.286:66): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
    Apr 20 17:38:27 sun-n1-console kernel: audit(1177061907.090:67): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
    Apr 20 17:38:28 sun-n1-console kernel: audit(1177061908.698:68): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
    Apr 20 17:38:30 sun-n1-console kernel: LustreError: 6539:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 192.168.123.45
    Apr 20 17:38:30 sun-n1-console kernel: audit(1177061910.683:69): avc: denied { rawip_send } for pid=6539 comm="acceptor_988" saddr=192.168.123.45 src=988 daddr=192.168.123.45 dest=1023 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
    Apr 20 17:38:30 sun-n1-console kernel: LustreError: 6537:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.123.45
    Apr 20 17:38:30 sun-n1-console kernel: LustreError: Connection to 192.168.123.45@tcp at host 192.168.123.45 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.123.45@tcp one of its NIDs?
    Apr 20 17:38:50 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@ec698e00 x25/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
    Apr 20 17:39:15 sun-n1-console kernel: LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@e97c8c00 x26/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0

Any advice on how to make this simple example work?

Regards,
Verdi
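Editor's note: the log in the message mixes two kinds of kernel lines, `LustreError:` messages and `audit(...)` entries marked `avc: denied`. The latter are SELinux access-vector denials, here for traffic on port 988 over `netif=lo`. The following is a minimal sketch for tallying both kinds so they can be looked at separately; it is not a diagnosis of the failure. The function names `tally_log` and `sample_lines` are hypothetical helpers introduced for illustration, and the three sample lines are copied from the log in the message.

```shell
#!/bin/sh
# Count SELinux AVC denials and LustreError messages in syslog text on stdin.
tally_log() {
    awk '
        /avc: +denied/ { avc++ }      # SELinux access-vector denial
        /LustreError:/ { lustre++ }   # Lustre kernel error message
        END { printf "avc_denied=%d lustre_error=%d\n", avc, lustre }
    '
}

# Three lines copied verbatim from the log excerpt in the message.
sample_lines() {
    cat <<'EOF'
Apr 20 17:37:35 sun-n1-console kernel: LustreError: 6840:0:(client.c:947:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.884:65): avc: denied { rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError: 6539:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 192.168.123.45
EOF
}

sample_lines | tally_log
```

On the affected node the same function could be fed the real log, for example `tally_log < /var/log/messages`, assuming the log is readable by the invoking user.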
Lustre-discuss mailing list
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss