osdir.com
mailing list archive

Subject: RE: YAFFS2 and kswapd dead lock problem - msg#00006

List: linux.file-systems.yaffs

Date: Prev Next Index Thread: Prev Next Index
> -----Original Message-----
> From: Charles Manning [mailto:manningc2@xxxxxxxxxxxxx]
> Sent: Sunday, December 04, 2005 9:55 AM
> To: yaffs@xxxxxxxxxxxxxxxxxxxxxx
> Cc: Lawson.Reed
> Subject: Re: [Yaffs] YAFFS2 and kswapd dead lock problem
>
>
> On Saturday 03 December 2005 07:56, Lawson.Reed wrote:
> > Found it.... took me 5 work days.
> > And this deadlock issue IS in the YAFFS2 tip code on CVS here:
> >
> http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?
> rev=1.34&view
> >=auto
> >
>
> Thanks for this huge effort Reed.
>
> I am quite surprised that someone else has not stumbled on
> this either.
>
> Does it only impact on 2.4.x kernels or also 2.6?

>From looking at the inode.c in 2.6 I see they are still doing the
wait_queue thing on the I_FREEING flag just the same as 2.4.
So, this deadlock issue should also be in 2.6 kernels. I do not
have a system running 2.6 so I can not verify this.

It only occurs under heavy multi thread load on the yaffs2 fs.
It is fairly infrequent and to troubleshoot it, I had to write
a torture test program.

> > So, no one has seen this???
> >
> > Here is what is happening:
> >
> > Process 'A' grabs the YAFFS2 grossLock.
> > Process 'B' preempts and it's job is to free unused inodes
> everywhere.
> > (hint: 'B' is kswapd). So, 'B' sets I_FREEING. Then it calls
> > yaffs_clear_inode() which needs the grossLock. So, it goes
> > on the wait queue because 'A' has the grossLock.
> >
> > Now process 'A' runs. It's holding the grossLock. It calls
> > yaffs_get_inode() which calls BACK UP to iget()... With
> > the grossLock held! That calls find_inode(). It finds
> > I_FREEING set and then gets put on a wait queue in
> > __wait_on_freeing_inode().
> >
> > Presto chango deadlock.
> >
> > So, my solution is to make sure the grossLock is not held when
> > calling yaffs_get_inode(). Plus, I added grossLocking to
> > yaffs_read_inode() since NB's comment in there is no longer
> > true.
> >
> > I ran my 20 thread torture test which usually deadlocks in under 30
> > seconds. It ran overnight with this fix. The test found no compare
> > errors in the 20 files that it reads and writes at random times with
> > random data and random lengths.
> >
> > So, I strongly suggest that someone close to the YAFFSs
> effort review
> > this change and incorporate it. I am kinda new to all this and I'm
> > not even sure what the correct way to submit the changes are.
> > So, let me know how I can help.
>
> Did you mean to include some code?

No.

> The easiest is to send something in patch form using "diff
> -Naur old new", but others are fine too.

OK, how is this? Warning... I did not even compile this in the
tip YAFFS2 code because I do not have a 2.6 tree. So, I copied my
changes from my 2.4 tree into the tip yaffs2/yaffs_fs.c code.
So, please review it carefully...

I started with this:
http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?rev=1.34&view=auto
File: [Development] / yaffs2 / yaffs_fs.c (download) (as text)
Revision: 1.34, Mon Nov 14 21:00:54 2005 UTC (2 weeks, 3 days ago) by charles

Thanks,
- Reed.

--------------------------------------------8<--------------------------------
diff -Naur yaffs2_old/yaffs_fs.c yaffs2/yaffs_fs.c
--- yaffs2_old/yaffs_fs.c 2005-12-05 09:53:10.000000000 -0800
+++ yaffs2/yaffs_fs.c 2005-12-05 10:15:32.000000000 -0800
@@ -310,6 +310,9 @@

obj = yaffs_GetEquivalentObject(obj); /* in case it was a hardlink */

+ // can not hold grossLock when calling yaffs_get_inode()
+ yaffs_GrossUnlock(dev);
+
if (obj) {
T(YAFFS_TRACE_OS,
(KERN_DEBUG "yaffs_lookup found %d\n", obj->objectId));
@@ -325,8 +328,6 @@
/*dget(dentry); // try to solve directory bug */
d_add(dentry, inode);

- yaffs_GrossUnlock(dev);
-
/* return dentry; */
return NULL;
#endif
@@ -336,7 +337,6 @@
T(YAFFS_TRACE_OS, (KERN_DEBUG "yaffs_lookup not found\n"));

}
- yaffs_GrossUnlock(dev);

/* added NCB for 2.5/6 compatability - forces add even if inode is
* NULL which creates dentry hash */
@@ -725,6 +725,8 @@

/* NB Side effect: iget calls back to yaffs_read_inode(). */
/* iget also increments the inode's i_count */
+ // RL AND you better not be holding the grossLock, or dead lock with
wait_on_freeing_inode()
+ // will occur because of kswapd (and other things)!!

return inode;
}
@@ -941,6 +943,9 @@
break;
}

+ // can not call yaffs_get_inode() with grossLock held.
+ yaffs_GrossUnlock(dev);
+
if (obj) {
inode = yaffs_get_inode(dir->i_sb, mode, rdev, obj);
d_instantiate(dentry, inode);
@@ -954,8 +959,6 @@
error = -ENOMEM;
}

- yaffs_GrossUnlock(dev);
-
return error;
}

@@ -1238,6 +1241,7 @@
{
/* NB This is called as a side effect of other functions and
* thus gross locking should always be in place already.
+ * RL Put the grossLock back in because of dead lock issue.
*/

yaffs_Object *obj;
@@ -1246,10 +1250,14 @@
T(YAFFS_TRACE_OS,
(KERN_DEBUG "yaffs_read_inode for %d\n", (int)inode->i_ino));

+ yaffs_GrossLock(dev);
+
obj = yaffs_FindObjectByNumber(dev, inode->i_ino);

yaffs_FillInodeFromObject(inode, obj);

+ yaffs_GrossUnlock(dev);
+
}

static LIST_HEAD(yaffs_dev_list);
@@ -1478,13 +1486,13 @@
("yaffs_read_super: guts initialised %s\n",
(err == YAFFS_OK) ? "OK" : "FAILED"));

+ // Can not call yaffs_get_inode() with the grossLock held.
+ yaffs_GrossUnlock(dev);
+
/* Create root inode */
if (err == YAFFS_OK)
inode = yaffs_get_inode(sb, S_IFDIR | 0755, 0,
yaffs_Root(dev));
-
- yaffs_GrossUnlock(dev);
-
if (!inode)
return NULL;

--------------------------------------------8<--------------------------------
>
> Thanx
>
> Charles
>


Was this page helpful?
Yes No
Thread at a glance:

Previous Message by Date: click to view message preview

Re: YAFFS2 and kswapd dead lock problem

On Saturday 03 December 2005 07:56, Lawson.Reed wrote: > Found it.... took me 5 work days. > And this deadlock issue IS in the YAFFS2 tip code on CVS here: > http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?rev=1.34&view >=auto > Thanks for this huge effort Reed. I am quite surprised that someone else has not stumbled on this either. Does it only impact on 2.4.x kernels or also 2.6? > So, no one has seen this??? > > Here is what is happening: > > Process 'A' grabs the YAFFS2 grossLock. > Process 'B' preempts and it's job is to free unused inodes everywhere. > (hint: 'B' is kswapd). So, 'B' sets I_FREEING. Then it calls > yaffs_clear_inode() which needs the grossLock. So, it goes > on the wait queue because 'A' has the grossLock. > > Now process 'A' runs. It's holding the grossLock. It calls > yaffs_get_inode() which calls BACK UP to iget()... With > the grossLock held! That calls find_inode(). It finds > I_FREEING set and then gets put on a wait queue in > __wait_on_freeing_inode(). > > Presto chango deadlock. > > So, my solution is to make sure the grossLock is not held when > calling yaffs_get_inode(). Plus, I added grossLocking to > yaffs_read_inode() since NB's comment in there is no longer > true. > > I ran my 20 thread torture test which usually deadlocks in under 30 > seconds. It ran overnight with this fix. The test found no compare > errors in the 20 files that it reads and writes at random times with > random data and random lengths. > > So, I strongly suggest that someone close to the YAFFSs effort review > this change and incorporate it. I am kinda new to all this and I'm > not even sure what the correct way to submit the changes are. > So, let me know how I can help. Did you mean to include some code? The easiest is to send something in patch form using "diff -Naur old new", but others are fine too. Thanx Charles

Next Message by Date: click to view message preview

mkyaffs*image argument #3 optional

Currently it is required to pass some string in place of the optional "convert" parameter if conversion is not desired. While getopt(3) would be a more scalable longer-term solution, for now it is proposed to make the third parameter optional. Any disagreement? Thanks -- Todd Index: utils/mkyaffs2image.c =================================================================== RCS file: /home/aleph1/cvs/yaffs2/utils/mkyaffs2image.c,v retrieving revision 1.1 diff -u -r1.1 mkyaffs2image.c --- utils/mkyaffs2image.c 2 Nov 2005 06:45:24 -0000 1.1 +++ utils/mkyaffs2image.c 6 Dec 2005 03:00:50 -0000 @@ -461,7 +461,7 @@ printf("mkyaffs2image: image building tool for YAFFS2 built "__DATE__"\n"); - if(argc <= 3) + if(argc < 3) { printf("usage: mkyaffs2image dir image_file [convert]\n"); printf(" dir the directory tree to be converted\n"); Index: utils/mkyaffsimage.c =================================================================== RCS file: /home/aleph1/cvs/yaffs2/utils/mkyaffsimage.c,v retrieving revision 1.1 diff -u -r1.1 mkyaffsimage.c --- utils/mkyaffsimage.c 6 Dec 2005 02:43:01 -0000 1.1 +++ utils/mkyaffsimage.c 6 Dec 2005 03:00:50 -0000 @@ -532,7 +532,7 @@ printf("mkyaffsimage: image building tool for YAFFS built "__DATE__"\n"); - if(argc <= 3) + if(argc < 3) { printf("usage: mkyaffsimage dir image_file [convert]\n"); printf(" dir the directory tree to be converted\n");

Previous Message by Thread: click to view message preview

Re: YAFFS2 and kswapd dead lock problem

On Saturday 03 December 2005 07:56, Lawson.Reed wrote: > Found it.... took me 5 work days. > And this deadlock issue IS in the YAFFS2 tip code on CVS here: > http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?rev=1.34&view >=auto > Thanks for this huge effort Reed. I am quite surprised that someone else has not stumbled on this either. Does it only impact on 2.4.x kernels or also 2.6? > So, no one has seen this??? > > Here is what is happening: > > Process 'A' grabs the YAFFS2 grossLock. > Process 'B' preempts and it's job is to free unused inodes everywhere. > (hint: 'B' is kswapd). So, 'B' sets I_FREEING. Then it calls > yaffs_clear_inode() which needs the grossLock. So, it goes > on the wait queue because 'A' has the grossLock. > > Now process 'A' runs. It's holding the grossLock. It calls > yaffs_get_inode() which calls BACK UP to iget()... With > the grossLock held! That calls find_inode(). It finds > I_FREEING set and then gets put on a wait queue in > __wait_on_freeing_inode(). > > Presto chango deadlock. > > So, my solution is to make sure the grossLock is not held when > calling yaffs_get_inode(). Plus, I added grossLocking to > yaffs_read_inode() since NB's comment in there is no longer > true. > > I ran my 20 thread torture test which usually deadlocks in under 30 > seconds. It ran overnight with this fix. The test found no compare > errors in the 20 files that it reads and writes at random times with > random data and random lengths. > > So, I strongly suggest that someone close to the YAFFSs effort review > this change and incorporate it. I am kinda new to all this and I'm > not even sure what the correct way to submit the changes are. > So, let me know how I can help. Did you mean to include some code? The easiest is to send something in patch form using "diff -Naur old new", but others are fine too. Thanx Charles

Next Message by Thread: click to view message preview

Re: YAFFS2 and kswapd dead lock problem --> fix

Hi Reed The patch would not stick (I think I screwed up the text handling), so I added it manually. I also wanted to change comments to /*...*/ to keep the style guide Gestapo happy too. The fix is: http://www.aleph1.co.uk/cgi-bin/viewcvs.cgi/yaffs2/yaffs_fs.c?r1=1.35&r2=1.36 I would really appreciate you reviewing the changes. Thanx again for hunting this. -- CHarles
Sign up for updates to this mailing list. email:
Loading Comments...
Home | News | Patents | Sitemap | FAQ | advertise

Advertising by