Subject: Re: Linux NFS and NetApp filers
Hi Steve,

A quick reply for the moment.  Our environment sounds similar to yours.  
We have a number of Dell servers (PE1550, 1650, 2450, 2550, and 2650s) 
running Linux.  We also own a NetApp F820 filer, and are planning the 
purchase of an F825 to keep it company.  We mount a lot of stuff over NFS from 
the filer onto the Linux servers.  This includes home-grown code, using things 
like mod_perl.  It also includes some pretty heavy database-type stuff: 
some PostgreSQL databases, a big Notes Domino db, and Verity K2 collections.  
Plus we plan to mount some Oracle databases this way within a month or so.

Generally this works very well for us.  We have a mix of kernels talking to the 
filer, and see no problems like the ones you describe.  We only run stock RH 
kernels (including AS kernels).

My biggest tip is to check out the following, if you haven't already.  It's 
pretty thorough and has a lot of good info in it.  Written by a NetApp 
engineer: Using the Linux® NFS Client with Network Appliance Filers
http://www.netapp.com/tech_library/3183.html

Some of what we've found:
- Our standard mount options are:  
proto=udp,hard,bg,intr,wsize=8192,rsize=8192
- For Oracle we'll try the recommended tcp protocol, and probably test 16k 
windows
- Early kernels seem to work best with network stack socket buffer tuning in 
/etc/sysctl.conf
- We've tried to ensure heavy traffic servers run newer kernels - 2.4.18-17 or 
later
- We've got the filer (twice) and major servers on Gigabit Ethernet, which 
improved things a lot over 100Mb
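
As a rough sketch only, the standard options listed above could be kept in one 
place and turned into a mount command like this.  The filer name (filer1), the 
export path (/vol/vol1/data) and the mount point (/mnt/data) are hypothetical 
placeholders, not values from this thread.  The function only prints the 
command, so it can be reviewed before anything is actually mounted.

```shell
#!/bin/sh
# Standard NFS mount options quoted in the list above.
NFS_OPTS="proto=udp,hard,bg,intr,wsize=8192,rsize=8192"

# Print (do not run) the mount command for one export.
#   $1 = server:/export   (hypothetical example: filer1:/vol/vol1/data)
#   $2 = local mount point (hypothetical example: /mnt/data)
nfs_mount_cmd() {
    printf 'mount -t nfs -o %s %s %s\n' "$NFS_OPTS" "$1" "$2"
}

# Hypothetical names, for illustration only.
nfs_mount_cmd filer1:/vol/vol1/data /mnt/data
```

For a TCP test, changing proto=udp to proto=tcp in NFS_OPTS would be the only 
edit needed in this sketch.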

I think that the IP fragmentation issue can be partly resolved via newer 
kernels and/or bigger socket buffers via sysctl.conf.  At one point, in 
addition to the NetApp suggestions, I played with:
  net.ipv4.tcp_rmem = 8192 262143 8388608
  net.ipv4.tcp_wmem = 4096 262143 8388608
Though later testing didn't seem to require this.  Use at your own risk.  It 
may be worth checking nfsstat -r to see whether you are getting a lot of RPC 
retransmits.
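
One way to put a number on the retransmits is sketched below.  It assumes the 
client section of the nfsstat output looks like a "Client rpc stats:" line, 
then a header line (calls, retrans, ...), then a line of values; that layout 
is an assumption and may differ between nfs-utils versions, so check it 
against your own output first.  The function name retrans_pct is made up for 
this example.

```shell
#!/bin/sh
# Read nfsstat output on stdin and print client RPC retransmits as a
# percentage of calls.  Assumes the values line is two lines below the
# "Client rpc stats:" heading (heading, column names, values).
retrans_pct() {
    awk '
        /^Client rpc stats:/ { want = NR + 2 }
        NR == want && $1 > 0 { printf "%.2f\n", 100 * $2 / $1 }
    '
}

# Usage (on a machine with nfs-utils installed):
#   nfsstat -rc | retrans_pct
```

Sampling this before and after a change (kernel, socket buffers, udp vs tcp) 
gives a rough way to compare, rather than relying on how often the "not 
responding" messages appear.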

Hope this helps!

Yours,

Ed


>I've actually been dealing with a couple different NFS-related issues,
>on desktops and servers, on RedHat 7.2, 7.3, and 8.0.  We have a fleet
>(~ 22) of PE2650s in the back room (for batch job processing), and
>several Precision 530ns for desktops.  I would imagine other people may
>not see the same severity of problems I'm seeing because our environment
>is heavily NFS-driven, i.e.  I run Oracle on Linux over NFS, /usr/local,
>our project directories, and just about everything is NFS mounted.  We
>use numerous Network Appliance NFS filers to serve data.
>
>On heavily loaded (NFS traffic) RH 7.3 systems (2.4.18-4), the NFS
>performance is spotty and erratic.  I see tons of "kernel: nfs: server
>xxx not responding, still trying" errors in /var/log/messages, followed
>by variable amounts of time (usu. 1-60secs), then "server xxx OK".  At
>its worst my Oracle server stalled for 1.5 hours in the middle of a
>long-running report.  While this is slowing things down, at least
>nothing is dying because of it :-), as NFS does pick up eventually and
>things continue.
>
>NetApp has a bug on file similar to this issue, claiming the bug is
>actually in the Linux IP fragmentation code, and that switching to tcp
>mounts will help.  And so at first using tcp mounts seemed to help,
>because the barrage of "server not responding" messages went away, but
>sadly they were replaced by random hangings, accompanied by "kernel:
>lockd: server xx.xx.xx.xx not responding, still trying" messages. 
>Great--now it's lockd.  Argh :-).  
>
>I've wanted to try later kernels, but given the widespread reports of
>problems with the tg3 driver, I felt I'd be trading one set of problems
>for another :-).  Also, I'm speculating that since RedHat is just
>patching the same old 2.4.18 kernel, there probably aren't really any
>bug fixes for the NFS code between, say 2.4.18-4 and 2.4.18-26 (maybe
>I'm mistaken here, please let me know if so).  What I need is fixes to
>the NFS client-side code which I'm thinking will only come with an
>upgrade to a later kernel version (e.g. 2.4.2x).
>
>On the desktop side, our Precisions came with RedHat 8.0 (2.4.18-14),
>and we would experience random hangings periodically throughout the day
>while accessing files over NFS.  I tried upgrading to a stock 2.4.20
>kernel, but then reading files was preceded by a 1-2 second pause (I've
>seen other reports of this with 2.4.20).  I then installed RH's
>2.4.18-24 and my pauses went away, but other users are still
>complaining.  I tried converting those users to tcp, but then the NFS
>performance dropped through the floor, so I had to switch them back to
>udp :-).
>
>I've also tried a bunch of other things too (e.g. changing NFS block
>sizes), but it's hard to remember everything.  If it weren't my job, at
>this point I'd just say "oh well" and wait 6-12 months for Linux's NFS
>to get better :-).  But I've gotten enough positive responses to my
>query about tg3 in 2.4.18-26, so I'll try upgrading one of my 2650s to
>it and report back here.  Thanks again everyone.



---
Ed Martin
Head of Systems and Network Performance
IOP Publishing Ltd
Dirac House, Temple Back
Bristol  BS1 6BE
ddi: +44 (0)117 930 1102
www:  http://www.iop.org


_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge@xxxxxxxx
http://lists.us.dell.com/mailman/listinfo/linux-poweredge