Re: Corrupt packets (msg#00009, network.spread.user)

Very interesting. I saw in the patch that you are checksumming both the
daemon-to-daemon traffic (UDP) and the client-server (message contents
only) which goes over TCP/UnixDomain.

This is really strange, as both UDP and TCP have checksums and should not
deliver corrupted data to the application (Spread).

Were the UDP/TCP checksums valid on the 'corrupt' data -- I'd guess they
had to be for the packets not to be dropped -- were you able to capture an
example packet that had a valid checksum but was corrupt?

This kind of checksum is something I'd like to avoid if possible as it
complicates the code and is more overhead per packet -- but if we can have
corrupt data delivery and it isn't just a particular OS bug, then it's
worth considering.

If the data is corrupted in kernel/memory before being sent but after
"spread" finished with it, then that would explain the situation -- but
should indicate an OS bug.

Jonathan

On Mon, Dec 04, 2006 at 09:01:51AM -0800, Alec H. Peterson wrote:
> Hi all,
>
> So a few days ago I e-mailed about getting ring lockups. We tracked
> this problem down to corrupt packets getting delivered to Spread
> (both over the session and data link layers). I've attached a patch
> that seems to address the problems by adding a checksum to the
> appropriate data structures, and we feel this could potentially be
> useful to others. If there are reasons why this shouldn't be
> included in Spread we would love to know, because those may well be
> reasons why we shouldn't use it. Clearly it changes the network
> protocol, so it won't be compatible with other builds of Spread.
> However, this does solve our lockup and corrupt data problems.
>
> We're also curious if anybody else has seen 'odd' Spread behavior
> (like ring lockups and/or corrupt data delivered to the client
> library). The configuration we have seen this on is very
> straightforward:
>
> Sun x4100 Server
> Solaris 10
> Spread 3.17.3 (both stock and with some local patches)
>
> We have some very similar servers deployed in-house that do not
> experience these problems at all.
>
> Thanks!
>
> Alec
>
> _______________________________________________
> Spread-users mailing list
> Spread-users@xxxxxxxxxxxxxxxx
> http://lists.spread.org/mailman/listinfo/spread-users

--
-------------------------------------------------------
Jonathan R. Stanton         jonathan@xxxxxxxxxx
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------
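
For readers following the thread, the kind of application-level checksum being discussed can be sketched roughly as below. This is only an illustration and not the code from Alec's patch, which is not shown here: the frame layout (length plus CRC-32 in front of the payload), the names `frame` and `unframe`, and the choice of CRC-32 are all assumptions made for the example. The idea is that the sender computes the checksum while it still holds the data, so corruption introduced after that point, including outside the window covered by the UDP/TCP checksum, shows up at the receiver.

```python
import struct
import zlib

# Hypothetical wire layout (an assumption for this sketch, not Spread's format):
# 4-byte payload length, 4-byte CRC-32 of the payload, then the payload itself.
HEADER = struct.Struct("!II")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32 before handing it to the socket."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unframe(data: bytes) -> bytes:
    """Verify length and CRC-32; raise ValueError instead of delivering bad data."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    length, expected_crc = HEADER.unpack_from(data)
    payload = data[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length mismatch")
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch")
    return payload
```

The cost Jonathan mentions is visible even in this small sketch: every message carries extra header bytes and needs one CRC pass on each side, and the changed layout is not compatible with peers that do not expect it.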