Re: subsystem_debug and mem_used in proc: msg#00016

file-systems.lustre.user

Subject: Re: subsystem_debug and mem_used in proc

On Apr 04, 2007 11:57 -0800, Jan H. Julian wrote:
> These are client nodes and in fact, this class of node is running a
> particular application that intermittently fails leaving a lustre
> error in syslog.
>
> "mg38 kernel: LustreError:
> 10147:0:(lov_request.c:180:lov_update_enqueue_set()) error: enqueue
> objid 0x3922667 subobj 0x15dfc on OST idx 1: rc = -4"

This means that the enqueue was interrupted (-4 = -EINTR in
/usr/include/asm/errno.h). That shouldn't happen unless the job had
already been waiting a long time (at least 100s) and was then killed.
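
As a quick way to check such a return code without opening errno.h, here
is a minimal sketch using Python's standard errno module. The helper name
describe_rc is made up for illustration, and the numeric mapping is
whatever the platform running the script defines (on Linux, 4 is EINTR).

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into its errno name."""
    code = -rc
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"rc = {rc}: {name} ({os.strerror(code)})"


print(describe_rc(-4))
```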

What does /proc/slabinfo show for lustre allocated memory in the slab
cache (most items are "ll_*")?
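
A rough sketch of how the "ll_*" entries could be totalled, assuming the
usual slabinfo column order (name, active_objs, num_objs, objsize, ...).
The function name, the cache names and the numbers in SAMPLE are made up
for illustration; on a real node you would pass in the contents of
/proc/slabinfo instead, which may need root to read.

```python
def lustre_slab_usage(slabinfo_text, prefix="ll_"):
    """Return {cache_name: bytes} for slab caches whose name starts with prefix."""
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version line and the column-header comment.
        if line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(prefix):
            continue
        # Assumed columns: name, active_objs, num_objs, objsize, ...
        num_objs, objsize = int(fields[2]), int(fields[3])
        usage[fields[0]] = num_objs * objsize
    return usage


# Hypothetical sample data, not taken from any real node.
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
ll_example_a        1000   1200    256   15    1 : tunables 120 60 8
ll_example_b          10     20    128   30    1 : tunables 120 60 8
dentry_cache        5000   6000    208   19    1 : tunables 120 60 8
"""

usage = lustre_slab_usage(SAMPLE)
for name, nbytes in sorted(usage.items()):
    print(f"{name:16s} {nbytes:12d} bytes")
print("total", sum(usage.values()), "bytes")
```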

> While mg38 is currently showing a negative value for memused, I have
> not been able to tie that to a failure. The error message points to
> the same file
>
> At 1:34 PM -0600 4/4/07, Andreas Dilger wrote:
> >On Apr 04, 2007 11:28 -0800, Jan H. Julian wrote:
> >> Could someone please clarify the use of the proc values for
> >> subsystem_debug and memused. In regard to
> >> /proc/sys/portals/subsystem_debug and /proc/sys/portals/debug, should
> >> both be set to zero to totally turn off debugging?
> >>
> >> In regard to /proc/sys/lustre/memused we see quite a variety of
> >> entries, including many with negative values. Does the negative value
> >> have a particular meaning?
> >> For instance "cat /proc/sys/lustre/memused" for 9 nodes shows:
> >> ...
> >> mg07 102186899
> >> mg08 101775995
> >> mg09 -1323553489
> >> mg10 -1328553965
> >> mg11 -1378379739
> >> mg12 -1347059989
> >> mg13 -1364717487
> >> mg14 -1358477913
> >> mg15 24680370
> >>
> >> These are 16 core machines with 64GB of resident memory.
> >
> >This appears to be an overflow of a 32-bit counter. It isn't strictly
> >harmful, because it will underflow an equal amount later on and should
> >return to zero when Lustre unmounts. It does make this stat less useful
> >on machines with lots of RAM.
> >
> >Are these client or server nodes? I'm a bit surprised that Lustre would
> >be allocating > 2GB of RAM.
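
To illustrate the wraparound described above, here is a minimal sketch
that assumes memused is printed as a signed 32-bit integer. The helper
names are made up for illustration, and the number of times the counter
has wrapped cannot be recovered from the reading itself, so the second
function only lists candidates.

```python
def to_int32(value):
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def candidate_totals(reported, max_wraps=2):
    """Byte counts that would all print as `reported` in a signed 32-bit field."""
    low = reported & 0xFFFFFFFF
    return [low + k * (1 << 32) for k in range(max_wraps)]


reported = -1323553489  # the mg09 reading quoted above
for total in candidate_totals(reported):
    # Every candidate folds back to the same signed 32-bit value.
    print(total, "->", to_int32(total))
```

If the counter wrapped only once, the mg09 reading would correspond to
2971413807 bytes, which is above the 2GB mark mentioned above.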

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

