Ubuntu 6.06 LTS NFS + NetApp filer problem: NFS locks up: msg#00069

Subject: Ubuntu 6.06 LTS NFS + NetApp filer problem: NFS locks up
Hi,

I have a cluster of Ubuntu Dapper 6.06 LTS servers acting as  
toasters, serving POP3, IMAP, IMAP proxy, webmail for a large mail  
cluster.

During the last couple of weeks, I've had two servers in the cluster
unexpectedly stop doing any NFS operations to one of our NetApp  
filers.  Each server mounts three separate NetApp filers, and we've  
got over 30 servers in this cluster mounting the same filers, and  
running with exactly the same configuration (we kickstart all our  
servers, so they're exactly the same, some just doing other tasks).

I see these errors in my syslog, but I suspect (based on graphs of
when I/O wait went up and when these entries appear in the logs)
that they only show up after the problem has already started
(because it can't do a stat on the NFS fs), so I doubt
they're of any use in solving my problem.

Oct 11 12:33:34 toaster01-mail kernel: [863316.270699] nfs_statfs: statfs error = 512
Oct 11 12:33:37 toaster01-mail kernel: [863319.113889] nfs_statfs: statfs error = 512
Oct 11 12:34:32 toaster01-mail kernel: [863374.713564] nfs_statfs: statfs error = 512
Oct 11 12:34:35 toaster01-mail kernel: [863377.134229] nfs_statfs: statfs error = 512
Oct 11 12:35:36 toaster01-mail kernel: [863438.108434] nfs_statfs: statfs error = 512
Oct 11 12:35:45 toaster01-mail kernel: [863447.372073] nfs_statfs: statfs error = 512
Oct 11 12:46:44 toaster01-mail kernel: [864105.539485] nfs_statfs: statfs error = 512

I've checked each network interface from eth1 (backend storage
interface) through to the fibre gig port that connects to the NetApp,
and there are no errors, collisions, etc.  I've also read the 32 MB
NetApp filer report, and can't spot anything unusual there.  I also
tried running strace on a listing of the files in /mailspool/mail8, but
just get ... "lstat("/mailspool/mail8",  <unfinished ...>".
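
For anyone wanting to check other mount points without getting a shell stuck the same way, here is a minimal sketch of a probe that runs the lstat in a child process and gives up after a deadline. The function name `probe_path` and the 5-second default are illustrative choices, not part of any existing tool. The parent never waits on the child after the deadline, since a process blocked on a hung hard mount may not respond to signals.

```python
import subprocess
import sys
import time


def probe_path(path, timeout=5.0, poll_interval=0.1):
    """Return 'ok', 'error', or 'timeout' for an lstat of `path`.

    The lstat runs in a child process so that a hung NFS call blocks
    the child rather than this process.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.lstat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = child.poll()
        if rc is not None:
            # Child finished: exit status 0 means the lstat succeeded.
            return "ok" if rc == 0 else "error"
        time.sleep(poll_interval)
    # Best effort only: do not wait() here, the child may be unkillable
    # while it is blocked on the NFS server.
    child.kill()
    return "timeout"


if __name__ == "__main__":
    # /mailspool/mail8 is the mount point from this report; pass others as arguments.
    for mount_point in sys.argv[1:] or ["/mailspool/mail8"]:
        print(mount_point, probe_path(mount_point))
```

A "timeout" result only says the lstat did not return within the deadline; it does not say why.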

I've got one server currently out of service that's experiencing
this "condition", so I can get any stats/output from it if
there's anything anyone can think of.

Here are some details:

Kernel:

2.6.15-27-amd64-k8
linux-image-2.6.15-27-amd64-server  2.6.15-27.50  Linux kernel image for version 2.6.15 on Ser

NFS client:

nfs-common  1.0.7-3ubuntu2  NFS support files common to client and serve

Filer mount options:

10.1.25.212:/vol/vol0/mail8-export  /mailspool/mail8  nfs  rw,hard,intr,timeo=600,retrans=2,rsize=32768,wsize=32768 0 0
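
To make the timeout values in that line easier to read, here is a small sketch that splits the option string into a dictionary. It assumes, per my reading of nfs(5), that timeo is given in tenths of a second, so timeo=600 is a 60-second timeout, and that with "hard" the client keeps retrying rather than returning an error to the application. The helper name `parse_mount_options` is made up for this example.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (rw, hard, intr) map to True; numeric values
    are converted to int, anything else is kept as a string.
    """
    options = {}
    for item in option_string.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            options[key] = True
        elif value.isdigit():
            options[key] = int(value)
        else:
            options[key] = value
    return options


opts = parse_mount_options(
    "rw,hard,intr,timeo=600,retrans=2,rsize=32768,wsize=32768"
)
# Assumption: timeo is expressed in tenths of a second (see nfs(5)).
timeo_seconds = opts["timeo"] / 10
print("timeout per attempt (s):", timeo_seconds)
print("hard mount:", opts.get("hard", False))
```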

eth1 interface output:

eth1: negotiated 100baseTx-FD, link ok

eth1      Link encap:Ethernet  HWaddr 00:17:08:50:78:41
           inet addr:10.1.25.141  Bcast:10.1.25.255  Mask:255.255.255.0
           inet6 addr: fe80::217:8ff:fe50:7841/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:1468254714 errors:0 dropped:0 overruns:0 frame:0
           TX packets:995375684 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:1677649220750 (1.5 TiB)  TX bytes:155688040628 (144.9 GiB)
           Interrupt:193

nfsstat output:

Client rpc stats:
calls      retrans    authrefrsh
460431933   104        0
Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
read       wrcache    write      create     remove     rename
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 114645897 24% 2066807  0% 78704007 17% 151704157 32% 0       0%
read       write      create     mkdir      symlink    mknod
53366969 11% 11421861  2% 6827536  1% 8280    0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
10806338  2% 233     0% 2897780  0% 3458142  0% 5434425  1% 19085492  4%
fsstat     fsinfo     pathconf   commit
68      0% 4       0% 0       0% 3933    0%
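
As a quick sanity check on the "Client rpc stats" figures above, this sketch parses the three-line header and works out the retransmission ratio. The function name `parse_rpc_stats` is hypothetical and the parser only handles the exact layout shown in this output. With 104 retransmissions over 460431933 calls, that is roughly one retransmission per 4.4 million calls.

```python
def parse_rpc_stats(text):
    """Parse the 'Client rpc stats' block of nfsstat output into a dict.

    Expects a title line, a line of column names, and a line of values,
    in that order, as in the output pasted above.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith("Client rpc stats"):
        raise ValueError("not a 'Client rpc stats' block")
    names = lines[1].split()
    values = [int(v) for v in lines[2].split()]
    return dict(zip(names, values))


sample = """Client rpc stats:
calls      retrans    authrefrsh
460431933   104        0
"""

stats = parse_rpc_stats(sample)
retrans_ratio = stats["retrans"] / stats["calls"]
print("calls per retransmission:", stats["calls"] // stats["retrans"])
print("retransmission ratio: {:.2e}".format(retrans_ratio))
```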

Any ideas, things to try?

Cheers,
Jaco

-- 
bje@xxxxxxxxxxxxxxxxxx
the faculty of making fortunate discoveries


