
Re: test hung after 36 hours: msg#00024

linux.redhat.cluster

On Mon, Apr 11, 2005 at 05:13:06PM -0700, Daniel McNeil wrote:
> I started my mount/tar/rm tests on Apr 4 17:41 and I hit
> a problem at Apr 6 05:30. So the test ran for 36 hours.
> cl030 and cl031 were getting "SM: process_reply invalid"
> messages and cl032 got "No response" and "Missed too many
> heartbeats".

The SM messages are an effect of CMAN removing nodes. There's a fair
chance that this recent fix will help:
http://sources.redhat.com/ml/cluster-cvs/2005-q2/msg00018.html

--
Dave Teigland <teigland@xxxxxxxxxx>


