osdir.com
mailing list archive

Subject: Re: deadlock in scm_join_thread(_timed) - msg#00018

List: guile-devel-gnu

Date: Prev Next Index Thread: Prev Next Index
Hello!

Neil Jerram <neil@xxxxxxxxxxxxxxxxx> writes:

> Here is a proposed patch for branch_release-1-8.

At first sight this looks good to me.

Thanks!

Ludo'.



Was this page helpful?
Yes No
Thread at a glance:

Previous Message by Date: click to view message preview

Re: Sting abstraction 2

Hello! Mike Gran <spk121@xxxxxxxxx> writes: > Anyway, I backtracked a bit. Today I pushed a new tree > (string_abstraction2) which should be the same as master except with all > unnecessary calls to scm_i_string_chars, scm_i_symbol_chars, and > scm_i_string_writable_chars removed. I largely avoided other > unnecessary modifications. It is still an 8-bit string build, but, now > the string internals have been confined to strings.c and strports.c, > with very few exceptions. How does it differ from the approach you took a while back (e.g., http://thread.gmane.org/gmane.lisp.guile.devel/8436)? (The commit notification message is hard to follow because it contains many commits in addition to the relevant ones.) Thanks, Ludo'.

Next Message by Date: click to view message preview

Re: Sting abstraction 2

> From: Ludovic Courtès <ludo@xxxxxxx> > > Hello! > > Mike Gran writes: > > > Anyway, I backtracked a bit.  Today I pushed a new tree > > (string_abstraction2) which should be the same as master except with all > > unnecessary calls to scm_i_string_chars, scm_i_symbol_chars, and > > scm_i_string_writable_chars removed.  I largely avoided other > > unnecessary modifications.  It is still an 8-bit string build, but, now > > the string internals have been confined to strings.c and strports.c, > > with very few exceptions. > > How does it differ from the approach you took a while back (e.g., > http://thread.gmane.org/gmane.lisp.guile.devel/8436)? > It is exactly the same idea as before, but, more militantly applied. In my private build of my first attempt, I'm basically done.  All the pieces are there.  But it is buggy as hell.  In the first pass, I tried to do too many things at once: ports, locales, R6RS escapes.  I started getting some errors that were tough to debug (double-frees, SIGSEGV).  Since valgrind and mcheck are useless here, I got frustrated.  So, what I'm doing now, basically, is just adding the patches I made before in a more logical order and testing along the way. So, same patches, different order, better testing.  Just try to git 'er done. The locale conversion will still happen at scm_lfwrite and scm_getc. (Did I ever mention this backtrace tree pic?  http://www.lonelycactus.com/uploaded_images/test[1]-765536.PNG  It shows that for all the scripts the test suite, all of the calls to low-level read and write pass through those two functions.) I have changed my opinion on one issue. I don't believe that Guile ports should have a specific encoding: they should just use the locale.  This is just pragmatism.  Guile ports and the default reader are annoying things to hack.  I am loathe to touch them more than is necessary.  The R6RS ports have the nice transcoder idea.  It might be more fun to push port-specific encodings to that library.  The one weird side-effect of doing the locale conversion at scm_getc and scm_lfwrite is that the ports' buffers -- including string ports --  contain locale-encoded data. Since all the pieces have already been coded once already, it should come together quickly. -Mike

Previous Message by Thread: click to view message preview

Re: deadlock in scm_join_thread(_timed)

Neil Jerram <neil@xxxxxxxxxxxxxxxxx> writes: > "Julian Graham" <joolean@xxxxxxxxx> writes: > >> Hi Neil, >> >>> Based on the synopsis above, I agree that moving step 1 inside the loop >>> should fix this. In addition, though, I think it would be very good if we >>> could add a minimal test that currently reproduces the deadlock, and so will >>> serve to guard against future regressions here. Do you have such a test? >> >> I don't -- it seems to be pretty dependent on timing. I noticed it >> while running my SRFI-18 test suite in a loop, and it took hours to >> trigger. Any suggestions? Here is a proposed patch for branch_release-1-8. Neil >From 66f3b6c1b043b814663668b5f83210c6e8d1e12d Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@xxxxxxxxxxxxxxxxx> Date: Wed, 20 May 2009 21:55:35 +0100 Subject: [PATCH] Remove possible deadlock in scm_join_thread * libguile/threads.c (scm_join_thread): Always recheck t->exited before calling block_self again, in case thread t has now exited. * test-suite/tests/threads.test (joining): New test. --- libguile/threads.c | 17 +++++++---------- test-suite/tests/threads.test | 34 +++++++++++++++++++++++++++++++++- 2 files changed, 40 insertions(+), 11 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index fc3e607..3d6df11 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -934,17 +934,14 @@ SCM_DEFINE (scm_join_thread, "join-thread", 1, 0, 0, scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); t = SCM_I_THREAD_DATA (thread); - if (!t->exited) + while (!t->exited) { - while (1) - { - block_self (t->join_queue, thread, &thread_admin_mutex, NULL); - if (t->exited) - break; - scm_i_pthread_mutex_unlock (&thread_admin_mutex); - SCM_TICK; - scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); - } + block_self (t->join_queue, thread, &thread_admin_mutex, NULL); + if (t->exited) + break; + scm_i_pthread_mutex_unlock (&thread_admin_mutex); + SCM_TICK; + scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); } res = t->result; diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index 0146016..34ee7ee 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -133,4 +133,36 @@ (lambda (n) (set! result (cons n result))) (lambda (n) (* 2 n)) '(0 1 2 3 4 5)) - (equal? result '(10 8 6 4 2 0))))))) + (equal? result '(10 8 6 4 2 0))))) + + ;; + ;; thread joining + ;; + + (with-test-prefix "joining" + + ;; scm_join_thread has a SCM_TICK in the middle of it, to + ;; allow asyncs to run (including signal delivery). We used + ;; to have a bug whereby if the joined thread terminated at + ;; the same time as the joining thread is in this SCM_TICK, + ;; scm_join_thread would not notice and would hang forever. + ;; So in this test we are setting up the following sequence of + ;; events. + ;; T=0 other thread is created and starts running + ;; T=2 main thread sets up an async that will sleep for 10 seconds + ;; T=2 main thread calls join-thread, which will... + ;; T=2 ...call the async, which starts sleeping + ;; T=5 other thread finishes its work and terminates + ;; T=7 async completes, main thread continues inside join-thread. + (pass-if "don't hang when joined thread terminates in SCM_TICK" + (let ((other-thread (make-thread sleep 5))) + (letrec ((delay-count 10) + (aproc (lambda () + (set! delay-count (- delay-count 1)) + (if (zero? delay-count) + (sleep 5) + (system-async-mark aproc))))) + (sleep 2) + (system-async-mark aproc) + (join-thread other-thread))) + #t)))) -- 1.5.6.5

Next Message by Thread: click to view message preview

Re: deadlock in scm_join_thread(_timed)

ludo@xxxxxxx (Ludovic Courtès) writes: > Hello! > > Neil Jerram <neil@xxxxxxxxxxxxxxxxx> writes: > >> Here is a proposed patch for branch_release-1-8. > > At first sight this looks good to me. Thanks! And here's the corresponding patch for master. It's slightly different, because scm_join_thread_timed in master allows for the join attempt timing out and should return a special timeout value in that case. Also I had to fix another problem, wait-condition-variable leaving asyncs blocked, before I could reproduce the scm_join_thread_timed issue in threads.test, so a patch for that problem is attached too. Regards, Neil >From a83a927bdbd6d5b971aa6f8172b78a2cdf34a5ef Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@xxxxxxxxxxxxxxxxx> Date: Sat, 23 May 2009 17:55:58 +0100 Subject: [PATCH] Fix wait-condition-variable so that it doesn't leave asyncs blocked * libguile/threads.c (fat_mutex_unlock): Unblock asyncs when breaking out of loop. * test-suite/tests/threads.test (asyncs-still-working?): New function, to test if asyncs are working (i.e. unblocked). Use this throughout threads.test, in particular before and after the "timed locking succeeds if mutex unlocked within timeout" test. --- libguile/threads.c | 1 + test-suite/tests/threads.test | 35 +++++++++++++++++++++++++++++++++-- 2 files changed, 34 insertions(+), 2 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index bb874e2..947e595 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -1491,6 +1491,7 @@ fat_mutex_unlock (SCM mutex, SCM cond, { if (relock) scm_lock_mutex_timed (mutex, SCM_UNDEFINED, owner); + t->block_asyncs--; break; } diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index caace7f..bd9f2f3 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -21,6 +21,12 @@ :use-module (ice-9 threads) :use-module (test-suite lib)) +(define (asyncs-still-working?) + (let ((a #f)) + (system-async-mark (lambda () + (set! a #t))) + (equal? '(a b c) '(a b c)) + a)) (if (provided? 'threads) (begin @@ -101,6 +107,9 @@ (with-test-prefix "n-for-each-par-map" + (pass-if "asyncs are still working 2" + (asyncs-still-working?)) + (pass-if "0 in limit 10" (n-for-each-par-map 10 noop noop '()) #t) @@ -143,12 +152,18 @@ (with-test-prefix "lock-mutex" + (pass-if "asyncs are still working 3" + (asyncs-still-working?)) + (pass-if "timed locking fails if timeout exceeded" (let ((m (make-mutex))) (lock-mutex m) (let ((t (begin-thread (lock-mutex m (+ (current-time) 1))))) (not (join-thread t))))) + (pass-if "asyncs are still working 6" + (asyncs-still-working?)) + (pass-if "timed locking succeeds if mutex unlocked within timeout" (let* ((m (make-mutex)) (c (make-condition-variable)) @@ -164,7 +179,12 @@ (unlock-mutex cm) (sleep 1) (unlock-mutex m) - (join-thread t))))) + (join-thread t)))) + + (pass-if "asyncs are still working 7" + (asyncs-still-working?)) + + ) ;; ;; timed mutex unlocking @@ -172,12 +192,18 @@ (with-test-prefix "unlock-mutex" + (pass-if "asyncs are still working 5" + (asyncs-still-working?)) + (pass-if "timed unlocking returns #f if timeout exceeded" (let ((m (make-mutex)) (c (make-condition-variable))) (lock-mutex m) (not (unlock-mutex m c (current-time))))) + (pass-if "asyncs are still working 4" + (asyncs-still-working?)) + (pass-if "timed unlocking returns #t if condition signaled" (let ((m1 (make-mutex)) (m2 (make-mutex)) @@ -226,7 +252,12 @@ (pass-if "timed joining succeeds if thread exits within timeout" (let ((t (begin-thread (begin (sleep 1) #t)))) - (join-thread t (+ (current-time) 2))))) + (join-thread t (+ (current-time) 2)))) + + (pass-if "asyncs are still working 1" + (asyncs-still-working?)) + + ) ;; ;; thread cancellation -- 1.5.6.5 >From 01404cdfacabf49a7b834837bd3c2acebaefc591 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@xxxxxxxxxxxxxxxxx> Date: Wed, 20 May 2009 21:55:35 +0100 Subject: [PATCH] Remove possible deadlock in scm_join_thread_timed * libguile/threads.c (scm_join_thread_timed): Recheck t->exited before looping round to call block_self again, in case thread t has now exited. * test-suite/tests/threads.test ("don't hang when joined thread terminates in SCM_TICK"): New test. --- libguile/threads.c | 10 ++++++++++ test-suite/tests/threads.test | 26 +++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 1 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index 947e595..d63c619 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -1161,6 +1161,16 @@ SCM_DEFINE (scm_join_thread_timed, "join-thread", 1, 2, 0, scm_i_pthread_mutex_unlock (&t->admin_mutex); SCM_TICK; scm_i_scm_pthread_mutex_lock (&t->admin_mutex); + + /* Check for exit again, since we just released and + reacquired the admin mutex, before the next block_self + call (which would block forever if t has already + exited). */ + if (t->exited) + { + res = t->result; + break; + } } } diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index bd9f2f3..6d877b1 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -257,7 +257,31 @@ (pass-if "asyncs are still working 1" (asyncs-still-working?)) - ) + ;; scm_join_thread_timed has a SCM_TICK in the middle of it, + ;; to allow asyncs to run (including signal delivery). We + ;; used to have a bug whereby if the joined thread terminated + ;; at the same time as the joining thread is in this SCM_TICK, + ;; scm_join_thread_timed would not notice and would hang + ;; forever. So in this test we are setting up the following + ;; sequence of events. + ;; T=0 other thread is created and starts running + ;; T=2 main thread sets up an async that will sleep for 10 seconds + ;; T=2 main thread calls join-thread, which will... + ;; T=2 ...call the async, which starts sleeping + ;; T=5 other thread finishes its work and terminates + ;; T=7 async completes, main thread continues inside join-thread. + (pass-if "don't hang when joined thread terminates in SCM_TICK" + (let ((other-thread (make-thread sleep 5))) + (letrec ((delay-count 10) + (aproc (lambda () + (set! delay-count (- delay-count 1)) + (if (zero? delay-count) + (sleep 5) + (system-async-mark aproc))))) + (sleep 2) + (system-async-mark aproc) + (join-thread other-thread))) + #t)) ;; ;; thread cancellation -- 1.5.6.5
Sign up for updates to this mailing list. email:
Loading Comments...
Home | News | Patents | Sitemap | FAQ | advertise

Advertising by