|
|
Subject: Re: deadlock in scm_join_thread(_timed) - msg#00018
List: guile-devel-gnu
Hello!
Neil Jerram <neil@xxxxxxxxxxxxxxxxx> writes:
> Here is a proposed patch for branch_release-1-8.
At first sight this looks good to me.
Thanks!
Ludo'.
Was this page helpful?
Thread at a glance:
Previous Message by Date:
click to view message preview
Re: Sting abstraction 2
Hello!
Mike Gran <spk121@xxxxxxxxx> writes:
> Anyway, I backtracked a bit. Today I pushed a new tree
> (string_abstraction2) which should be the same as master except with all
> unnecessary calls to scm_i_string_chars, scm_i_symbol_chars, and
> scm_i_string_writable_chars removed. I largely avoided other
> unnecessary modifications. It is still an 8-bit string build, but, now
> the string internals have been confined to strings.c and strports.c,
> with very few exceptions.
How does it differ from the approach you took a while back (e.g.,
http://thread.gmane.org/gmane.lisp.guile.devel/8436)?
(The commit notification message is hard to follow because it contains
many commits in addition to the relevant ones.)
Thanks,
Ludo'.
Next Message by Date:
click to view message preview
Re: Sting abstraction 2
> From: Ludovic Courtès <ludo@xxxxxxx>
>
> Hello!
>
> Mike Gran writes:
>
> > Anyway, I backtracked a bit. Today I pushed a new tree
> > (string_abstraction2) which should be the same as master except with all
> > unnecessary calls to scm_i_string_chars, scm_i_symbol_chars, and
> > scm_i_string_writable_chars removed. I largely avoided other
> > unnecessary modifications. It is still an 8-bit string build, but, now
> > the string internals have been confined to strings.c and strports.c,
> > with very few exceptions.
>
> How does it differ from the approach you took a while back (e.g.,
> http://thread.gmane.org/gmane.lisp.guile.devel/8436)?
>
It is exactly the same idea as before, but, more militantly applied.
In my private build of my first attempt, I'm basically done. All the pieces
are there. But it is buggy as hell. In the first pass, I tried to do too many
things at once: ports, locales, R6RS escapes. I started getting some errors
that were tough to debug (double-frees, SIGSEGV). Since valgrind and mcheck
are useless here, I got frustrated. So, what I'm doing now, basically, is just
adding the patches I made before in a more logical order and testing along the
way.
So, same patches, different order, better testing. Just try to git 'er done.
The locale conversion will still happen at scm_lfwrite and scm_getc.
(Did I ever mention this backtrace tree
pic? http://www.lonelycactus.com/uploaded_images/test[1]-765536.PNG It shows
that for all the scripts the test suite, all of the calls to low-level read and
write pass through those two functions.)
I have changed my opinion on one issue. I don't believe that Guile ports should
have a specific encoding: they should just use the locale. This is just
pragmatism. Guile ports and the default reader are annoying things to hack. I
am loathe to touch them more than is necessary.
The R6RS ports have the nice transcoder idea. It might be more fun to push
port-specific encodings to that library.
The one weird side-effect of doing the locale conversion at scm_getc and
scm_lfwrite is that the ports' buffers -- including string ports -- contain
locale-encoded data.
Since all the pieces have already been coded once already, it should come
together quickly.
-Mike
Previous Message by Thread:
click to view message preview
Re: deadlock in scm_join_thread(_timed)
Neil Jerram <neil@xxxxxxxxxxxxxxxxx> writes:
> "Julian Graham" <joolean@xxxxxxxxx> writes:
>
>> Hi Neil,
>>
>>> Based on the synopsis above, I agree that moving step 1 inside the loop
>>> should fix this. In addition, though, I think it would be very good if we
>>> could add a minimal test that currently reproduces the deadlock, and so will
>>> serve to guard against future regressions here. Do you have such a test?
>>
>> I don't -- it seems to be pretty dependent on timing. I noticed it
>> while running my SRFI-18 test suite in a loop, and it took hours to
>> trigger. Any suggestions?
Here is a proposed patch for branch_release-1-8.
Neil
>From 66f3b6c1b043b814663668b5f83210c6e8d1e12d Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@xxxxxxxxxxxxxxxxx>
Date: Wed, 20 May 2009 21:55:35 +0100
Subject: [PATCH] Remove possible deadlock in scm_join_thread
* libguile/threads.c (scm_join_thread): Always recheck t->exited
before calling block_self again, in case thread t has now exited.
* test-suite/tests/threads.test (joining): New test.
---
libguile/threads.c | 17 +++++++----------
test-suite/tests/threads.test | 34 +++++++++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/libguile/threads.c b/libguile/threads.c
index fc3e607..3d6df11 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -934,17 +934,14 @@ SCM_DEFINE (scm_join_thread, "join-thread", 1, 0, 0,
scm_i_scm_pthread_mutex_lock (&thread_admin_mutex);
t = SCM_I_THREAD_DATA (thread);
- if (!t->exited)
+ while (!t->exited)
{
- while (1)
- {
- block_self (t->join_queue, thread, &thread_admin_mutex, NULL);
- if (t->exited)
- break;
- scm_i_pthread_mutex_unlock (&thread_admin_mutex);
- SCM_TICK;
- scm_i_scm_pthread_mutex_lock (&thread_admin_mutex);
- }
+ block_self (t->join_queue, thread, &thread_admin_mutex, NULL);
+ if (t->exited)
+ break;
+ scm_i_pthread_mutex_unlock (&thread_admin_mutex);
+ SCM_TICK;
+ scm_i_scm_pthread_mutex_lock (&thread_admin_mutex);
}
res = t->result;
diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test
index 0146016..34ee7ee 100644
--- a/test-suite/tests/threads.test
+++ b/test-suite/tests/threads.test
@@ -133,4 +133,36 @@
(lambda (n) (set! result (cons n result)))
(lambda (n) (* 2 n))
'(0 1 2 3 4 5))
- (equal? result '(10 8 6 4 2 0)))))))
+ (equal? result '(10 8 6 4 2 0)))))
+
+ ;;
+ ;; thread joining
+ ;;
+
+ (with-test-prefix "joining"
+
+ ;; scm_join_thread has a SCM_TICK in the middle of it, to
+ ;; allow asyncs to run (including signal delivery). We used
+ ;; to have a bug whereby if the joined thread terminated at
+ ;; the same time as the joining thread is in this SCM_TICK,
+ ;; scm_join_thread would not notice and would hang forever.
+ ;; So in this test we are setting up the following sequence of
+ ;; events.
+ ;; T=0 other thread is created and starts running
+ ;; T=2 main thread sets up an async that will sleep for 10 seconds
+ ;; T=2 main thread calls join-thread, which will...
+ ;; T=2 ...call the async, which starts sleeping
+ ;; T=5 other thread finishes its work and terminates
+ ;; T=7 async completes, main thread continues inside join-thread.
+ (pass-if "don't hang when joined thread terminates in SCM_TICK"
+ (let ((other-thread (make-thread sleep 5)))
+ (letrec ((delay-count 10)
+ (aproc (lambda ()
+ (set! delay-count (- delay-count 1))
+ (if (zero? delay-count)
+ (sleep 5)
+ (system-async-mark aproc)))))
+ (sleep 2)
+ (system-async-mark aproc)
+ (join-thread other-thread)))
+ #t))))
--
1.5.6.5
Next Message by Thread:
click to view message preview
Re: deadlock in scm_join_thread(_timed)
ludo@xxxxxxx (Ludovic Courtès) writes:
> Hello!
>
> Neil Jerram <neil@xxxxxxxxxxxxxxxxx> writes:
>
>> Here is a proposed patch for branch_release-1-8.
>
> At first sight this looks good to me.
Thanks! And here's the corresponding patch for master. It's slightly
different, because scm_join_thread_timed in master allows for the join
attempt timing out and should return a special timeout value in that
case. Also I had to fix another problem, wait-condition-variable
leaving asyncs blocked, before I could reproduce the
scm_join_thread_timed issue in threads.test, so a patch for that
problem is attached too.
Regards,
Neil
>From a83a927bdbd6d5b971aa6f8172b78a2cdf34a5ef Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@xxxxxxxxxxxxxxxxx>
Date: Sat, 23 May 2009 17:55:58 +0100
Subject: [PATCH] Fix wait-condition-variable so that it doesn't leave asyncs
blocked
* libguile/threads.c (fat_mutex_unlock): Unblock asyncs when breaking
out of loop.
* test-suite/tests/threads.test (asyncs-still-working?): New function,
to test if asyncs are working (i.e. unblocked). Use this throughout
threads.test, in particular before and after the "timed locking
succeeds if mutex unlocked within timeout" test.
---
libguile/threads.c | 1 +
test-suite/tests/threads.test | 35 +++++++++++++++++++++++++++++++++--
2 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/libguile/threads.c b/libguile/threads.c
index bb874e2..947e595 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1491,6 +1491,7 @@ fat_mutex_unlock (SCM mutex, SCM cond,
{
if (relock)
scm_lock_mutex_timed (mutex, SCM_UNDEFINED, owner);
+ t->block_asyncs--;
break;
}
diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test
index caace7f..bd9f2f3 100644
--- a/test-suite/tests/threads.test
+++ b/test-suite/tests/threads.test
@@ -21,6 +21,12 @@
:use-module (ice-9 threads)
:use-module (test-suite lib))
+(define (asyncs-still-working?)
+ (let ((a #f))
+ (system-async-mark (lambda ()
+ (set! a #t)))
+ (equal? '(a b c) '(a b c))
+ a))
(if (provided? 'threads)
(begin
@@ -101,6 +107,9 @@
(with-test-prefix "n-for-each-par-map"
+ (pass-if "asyncs are still working 2"
+ (asyncs-still-working?))
+
(pass-if "0 in limit 10"
(n-for-each-par-map 10 noop noop '())
#t)
@@ -143,12 +152,18 @@
(with-test-prefix "lock-mutex"
+ (pass-if "asyncs are still working 3"
+ (asyncs-still-working?))
+
(pass-if "timed locking fails if timeout exceeded"
(let ((m (make-mutex)))
(lock-mutex m)
(let ((t (begin-thread (lock-mutex m (+ (current-time) 1)))))
(not (join-thread t)))))
+ (pass-if "asyncs are still working 6"
+ (asyncs-still-working?))
+
(pass-if "timed locking succeeds if mutex unlocked within timeout"
(let* ((m (make-mutex))
(c (make-condition-variable))
@@ -164,7 +179,12 @@
(unlock-mutex cm)
(sleep 1)
(unlock-mutex m)
- (join-thread t)))))
+ (join-thread t))))
+
+ (pass-if "asyncs are still working 7"
+ (asyncs-still-working?))
+
+ )
;;
;; timed mutex unlocking
@@ -172,12 +192,18 @@
(with-test-prefix "unlock-mutex"
+ (pass-if "asyncs are still working 5"
+ (asyncs-still-working?))
+
(pass-if "timed unlocking returns #f if timeout exceeded"
(let ((m (make-mutex))
(c (make-condition-variable)))
(lock-mutex m)
(not (unlock-mutex m c (current-time)))))
+ (pass-if "asyncs are still working 4"
+ (asyncs-still-working?))
+
(pass-if "timed unlocking returns #t if condition signaled"
(let ((m1 (make-mutex))
(m2 (make-mutex))
@@ -226,7 +252,12 @@
(pass-if "timed joining succeeds if thread exits within timeout"
(let ((t (begin-thread (begin (sleep 1) #t))))
- (join-thread t (+ (current-time) 2)))))
+ (join-thread t (+ (current-time) 2))))
+
+ (pass-if "asyncs are still working 1"
+ (asyncs-still-working?))
+
+ )
;;
;; thread cancellation
--
1.5.6.5
>From 01404cdfacabf49a7b834837bd3c2acebaefc591 Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@xxxxxxxxxxxxxxxxx>
Date: Wed, 20 May 2009 21:55:35 +0100
Subject: [PATCH] Remove possible deadlock in scm_join_thread_timed
* libguile/threads.c (scm_join_thread_timed): Recheck t->exited before
looping round to call block_self again, in case thread t has now
exited.
* test-suite/tests/threads.test ("don't hang when joined thread
terminates in SCM_TICK"): New test.
---
libguile/threads.c | 10 ++++++++++
test-suite/tests/threads.test | 26 +++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 1 deletions(-)
diff --git a/libguile/threads.c b/libguile/threads.c
index 947e595..d63c619 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1161,6 +1161,16 @@ SCM_DEFINE (scm_join_thread_timed, "join-thread", 1, 2,
0,
scm_i_pthread_mutex_unlock (&t->admin_mutex);
SCM_TICK;
scm_i_scm_pthread_mutex_lock (&t->admin_mutex);
+
+ /* Check for exit again, since we just released and
+ reacquired the admin mutex, before the next block_self
+ call (which would block forever if t has already
+ exited). */
+ if (t->exited)
+ {
+ res = t->result;
+ break;
+ }
}
}
diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test
index bd9f2f3..6d877b1 100644
--- a/test-suite/tests/threads.test
+++ b/test-suite/tests/threads.test
@@ -257,7 +257,31 @@
(pass-if "asyncs are still working 1"
(asyncs-still-working?))
- )
+ ;; scm_join_thread_timed has a SCM_TICK in the middle of it,
+ ;; to allow asyncs to run (including signal delivery). We
+ ;; used to have a bug whereby if the joined thread terminated
+ ;; at the same time as the joining thread is in this SCM_TICK,
+ ;; scm_join_thread_timed would not notice and would hang
+ ;; forever. So in this test we are setting up the following
+ ;; sequence of events.
+ ;; T=0 other thread is created and starts running
+ ;; T=2 main thread sets up an async that will sleep for 10 seconds
+ ;; T=2 main thread calls join-thread, which will...
+ ;; T=2 ...call the async, which starts sleeping
+ ;; T=5 other thread finishes its work and terminates
+ ;; T=7 async completes, main thread continues inside join-thread.
+ (pass-if "don't hang when joined thread terminates in SCM_TICK"
+ (let ((other-thread (make-thread sleep 5)))
+ (letrec ((delay-count 10)
+ (aproc (lambda ()
+ (set! delay-count (- delay-count 1))
+ (if (zero? delay-count)
+ (sleep 5)
+ (system-async-mark aproc)))))
+ (sleep 2)
+ (system-async-mark aproc)
+ (join-thread other-thread)))
+ #t))
;;
;; thread cancellation
--
1.5.6.5
|
|