Example "local" fails on node with two IP addresses: msg#00065

file-systems.lustre.user

Subject: Example "local" fails on node with two IP addresses

Hi,

I'm encountering a problem when starting the "local" example (one
MDS, LOV, OST, and client, all on node "sun-n1-console").

# lmc -m test.xml --batch test.txt
# cat test.txt
--add node --node sun-n1-console
--add net --node sun-n1-console --nettype lnet --nid sun-n1-console@tcp
--add mds --node sun-n1-console --mds mds1 --fstype ldiskfs --dev /tmp/mds1-sun-n1-console --size 400000
--add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
--add ost --node sun-n1-console --lov lov1 --ost ost1-sun-n1-console --fstype ldiskfs --dev /tmp/ost1-sun-n1-console --size 400000
--add mtpt --node sun-n1-console --path /mnt/lustre --mds mds1 --lov lov1



The node has two Ethernet interfaces, eth0 and eth1, on separate subnets.
I deploy all Lustre components on eth1 (IP: 192.168.123.45, hostname:
sun-n1-console).

# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
xxx.yyy.zzz.ab public-host
192.168.123.45 sun-n1-console
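
As a quick sanity check on the name-to-address mapping above, here is a minimal Python sketch that parses /etc/hosts-style text and reports which address a hostname maps to. The function names parse_hosts and is_loopback are hypothetical helpers written for this illustration and are not part of any Lustre tool. The sample text is simply the file shown above.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname in /etc/hosts-style text to its first listed address."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name; later duplicates are ignored here.
            table.setdefault(name, addr)
    return table


def is_loopback(addr):
    """Return True if addr is a valid loopback IP address, False otherwise."""
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        # Masked or malformed addresses (e.g. "xxx.yyy.zzz.ab") land here.
        return False


SAMPLE = """\
127.0.0.1 localhost.localdomain localhost
xxx.yyy.zzz.ab public-host
192.168.123.45 sun-n1-console
"""

if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE)
    addr = hosts["sun-n1-console"]
    print(addr, "loopback" if is_loopback(addr) else "not loopback")
```

On the node itself, the same functions could be run against the real file with parse_hosts(open("/etc/hosts").read()).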


When eth0 is down, the "local" example deploys successfully.
Lustre fails to start only when eth0 is up (see attachment).

The error messages in /var/log/messages indicate that the MDS does
not respond (see below). I don't believe a firewall is the cause, because
I've switched it off:

# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination




And here are the error messages:

# tail /var/log/messages
Apr 20 17:37:35 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@f7fe7e00
x22/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl
Rpc:/0/0 rc 0/0
Apr 20 17:37:35 sun-n1-console kernel: LustreError:
6840:0:(client.c:947:ptlrpc_expire_one_request()) @@@ timeout (sent at
1177061855, 0s ago) req@f7fe7e00 x22/t0
o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 1 fl
Rpc:/0/0 rc 0/0
Apr 20 17:37:35 sun-n1-console kernel: LustreError:
6840:0:(client.c:947:ptlrpc_expire_one_request()) Skipped 2 previous similar
messages
Apr 20 17:38:00 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@ed133e00
x23/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl
Rpc:/0/0 rc 0/0
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.683:64): avc: denied
{ rawip_recv } for pid=6537 comm="socknal_cd03" saddr=192.168.123.45 src=1023
daddr=192.168.123.45 dest=988 netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.884:65): avc: denied
{ rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:26 sun-n1-console kernel: audit(1177061906.286:66): avc: denied
{ rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:27 sun-n1-console kernel: audit(1177061907.090:67): avc: denied
{ rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:28 sun-n1-console kernel: audit(1177061908.698:68): avc: denied
{ rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError:
6539:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request
from 192.168.123.45
Apr 20 17:38:30 sun-n1-console kernel: audit(1177061910.683:69): avc: denied
{ rawip_send } for pid=6539 comm="acceptor_988" saddr=192.168.123.45 src=988
daddr=192.168.123.45 dest=1023 netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError:
6537:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from
192.168.123.45
Apr 20 17:38:30 sun-n1-console kernel: LustreError: Connection to
192.168.123.45@tcp at host 192.168.123.45 on port 988 was reset: is it running
a compatible version of Lustre and is 192.168.123.45@tcp one of its NIDs?
Apr 20 17:38:50 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@ec698e00
x25/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl
Rpc:/0/0 rc 0/0
Apr 20 17:39:15 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@e97c8c00
x26/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2 fl
Rpc:/0/0 rc 0/0
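
The negative numbers in these messages (status -5, Error -11, Error -104) are Linux errno values. The following Python sketch pulls them out of the log text and attaches their symbolic names. The table is hard-coded with the standard Linux values for just these three codes, and the name decode_errors and the regular expression are assumptions made for this illustration, not part of Lustre.

```python
import re

# Standard Linux errno values for the codes that appear in the log above.
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 104: "ECONNRESET"}

# Matches "status -5" and "Error -11"; "LustreError:" alone does not match.
PATTERN = re.compile(r"(?:status|Error) -(\d+)")


def decode_errors(log_text):
    """Return (code, name) pairs for each errno-style value found in log_text."""
    found = []
    for match in PATTERN.finditer(log_text):
        code = int(match.group(1))
        found.append((code, LINUX_ERRNO.get(code, "unknown")))
    return found


SAMPLE = """\
LustreError: 6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@f7fe7e00
LustreError: 6539:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request
LustreError: 6537:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from
"""

if __name__ == "__main__":
    for code, name in decode_errors(SAMPLE):
        print(f"-{code}: {name}")
```

This only labels the codes; it does not by itself say why the connection to port 988 is being reset.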



Any advice on how to make this simple example work?


Regards,
Verdi



Attachment: attach.txt
Description: Text document

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-KYPl3Ael/zSakBO8gow8eQ@xxxxxxxxxxxxxxxx
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss