
Re: Job exceeding memory limits: msg#00072

Subject: Re: Job exceeding memory limits


Steve Young wrote:
        Thanks Dave. This is why I am wondering how torque checks the OS to
verify how much memory is being used. I suspect that a lot more resources
are used when the job is first started, but that once it's underway it
evens out to expected operation. I am hoping that once I find out how
torque does it, I can do the same from the command line and see for
myself why torque thinks the job needs so much more memory.

Basically just add up what you find belonging to the job from ps aux.
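
As a rough sketch of that adding-up: the routines further down select processes by session ID, so one way to approximate it by hand is to feed per-process "SID VSZ RSS" lines to a small summing routine. The input format here assumes something like `ps -e -o sid=,vsz=,rss=` (accepted by Linux procps; column names differ between platforms), and `sum_job_mem` and `struct job_mem` are made-up names for illustration, not anything from torque.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Totals for one job, in KiB (the unit ps normally reports for vsz/rss). */
struct job_mem {
    long vsz_kib;   /* summed virtual size */
    long rss_kib;   /* summed resident set size */
    int  nprocs;    /* number of processes counted */
};

/*
 * Sum vsz and rss over every line of ps_text whose session ID equals sid.
 * Each line is expected to hold three integers: SID VSZ RSS.
 * Lines that do not parse, or are too long, are skipped.
 */
static struct job_mem sum_job_mem(const char *ps_text, long sid)
{
    struct job_mem total = {0, 0, 0};
    const char *line = ps_text;

    assert(ps_text != NULL);

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[128];
        long lsid, vsz, rss;

        /* Copy one line so sscanf cannot run on into the next one. */
        if (len < sizeof buf) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%ld %ld %ld", &lsid, &vsz, &rss) == 3
                && lsid == sid) {
                total.vsz_kib += vsz;
                total.rss_kib += rss;
                total.nprocs++;
            }
        }

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return total;
}
```

Calling `sum_job_mem(text, sid)` with the captured ps output and the job's session ID gives totals to compare against the job's vmem and mem limits. A job spread over several sessions would need one call per session.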

        You bring up an interesting point... having MOM ignore resource usage
for young processes. I didn't see anything on the parameters page for MOM to
configure this. Would you mind elaborating on how you did that? =).
Thanks in advance,


Fairly simplistically.  These are cut-down versions of the routines in
mom_mach.c for finding job vmem (mem_sum) and mem (resi_sum) respectively.
I've chopped out the gory shared memory details but left the gratuitous
macros in.

David


/* Job vmem: sum of pr_size over the processes belonging to the job. */
static memsize_t mem_sum(job *pjob)
{
        char       *id="mem_sum";
        memsize_t  memsize=0;
        int        iproc;

        for (iproc=0; iproc<nproc; iproc++)  {
                psinfo_t *pi = &proc_info[iproc];

                if (!injob(pjob, pi->pr_sid))  continue;

                /*
                 * A feeble attempt to ignore the memory use of recently forked
                 * processes - ignore processes less than 2 seconds old
                 */
                if ( time_now < (time_t) ISECS(pi->pr_start) + 2 )  continue;

                if ( PRVMEM_TO_BYTES(pi->pr_size) < PROC_MEM_MAX)
                        memsize += PRVMEM_TO_BYTES(pi->pr_size);

        }

        return (memsize);
}


/* Job mem: sum of pr_rssize (resident set) over the job's processes. */
static memsize_t resi_sum(job *pjob)
{
        char  *id="resi_sum";
        memsize_t  resisize=0;
        int  iproc;

        for (iproc=0; iproc<nproc; iproc++) {
                psinfo_t *pi = &proc_info[iproc];

                if (!injob(pjob, pi->pr_sid))  continue;

                /*
                 * A feeble attempt to ignore the memory use of recently forked
                 * processes - ignore processes less than 2 seconds old
                 */
                if ( time_now < (time_t) ISECS(pi->pr_start) + 2 )  continue;

                if (PRRSS_TO_BYTES(pi->pr_rssize) < PROC_MEM_MAX)
                        resisize += PRRSS_TO_BYTES(pi->pr_rssize);
        }

        return (resisize);
}
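
The two routines above depend on MOM's globals (nproc, proc_info, time_now) and on psinfo_t, so they will not compile on their own. A minimal standalone sketch of just the age filter, under stated assumptions, might look like the following. `struct proc_sample`, `GRACE_SECS` and `job_vmem` are hypothetical stand-ins rather than torque names, and the PROC_MEM_MAX check is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the psinfo_t fields the routines above read. */
struct proc_sample {
    long               sid;    /* session ID */
    time_t             start;  /* start time, seconds since the epoch */
    unsigned long long vsize;  /* virtual size in bytes */
};

#define GRACE_SECS 2  /* same 2-second window as in the routines above */

/*
 * Sum the virtual size of every process in session sid, skipping any
 * process that started less than GRACE_SECS seconds before now.
 */
static unsigned long long
job_vmem(const struct proc_sample *procs, size_t nproc, long sid, time_t now)
{
    unsigned long long total = 0;
    size_t i;

    assert(procs != NULL || nproc == 0);

    for (i = 0; i < nproc; i++) {
        if (procs[i].sid != sid)
            continue;               /* not part of this job */
        if (now < procs[i].start + GRACE_SECS)
            continue;               /* too young, ignore for now */
        total += procs[i].vsize;
    }
    return total;
}
```

With this filter a process forked one second ago contributes nothing to the total, and it is counted in full from the first poll at which it is at least two seconds old.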

