Re: Corrupt packets: msg#00009

Subject: Re: Corrupt packets

Very interesting.

I saw in the patch that you are checksumming both the daemon-to-daemon
traffic (UDP) and the client-server traffic (message contents only),
which goes over TCP or Unix domain sockets. This is really strange, as
both UDP and TCP have checksums and should not deliver corrupted data
to the application (Spread).

Were the UDP/TCP checksums valid on the 'corrupt' data? I'd guess they
had to be for the packets not to be dropped. Were you able to capture
an example packet that had a valid checksum but was corrupt?
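
For reference when inspecting a captured packet, here is a minimal sketch of the 16-bit ones' complement checksum that UDP and TCP use (the RFC 1071 algorithm). The function name `inet_checksum` is made up for illustration and this is not code from Spread or from the patch. It only shows that such a checksum is a plain sum of 16-bit words, so some changes (for example, two 16-bit words swapping places) leave it unchanged. Whether anything like that happened here is not known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 style Internet checksum over a byte buffer.
 * Illustrative helper only; the name is hypothetical.
 * The 32-bit accumulator is sufficient for buffers up to 128 KB,
 * which covers any single UDP datagram or TCP segment.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add the buffer as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Because addition is commutative, the buffers `12 34 AB CD` and `AB CD 12 34` produce the same value, while changing a single byte does change it.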

This kind of checksum is something I'd like to avoid if possible, as it
complicates the code and adds overhead per packet -- but if corrupt
data can be delivered and it isn't just a particular OS bug, then
it's worth considering.
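
To make the trade-off concrete, here is a rough sketch of what an application-level check over the message contents could look like. It assumes CRC-32 (the IEEE polynomial) and a made-up `demo_frame` structure; the actual patch may use a different checksum and different data structures, and none of these names exist in Spread. The extra cost is one pass over the payload on send and another on receive, plus four bytes per message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, else zero. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Hypothetical message frame; not a Spread data structure. */
struct demo_frame {
    uint32_t crc;            /* checksum of the payload bytes */
    uint32_t len;            /* payload length in bytes */
    const uint8_t *payload;  /* message contents */
};

/* Sender side: record the payload and its checksum. */
void demo_frame_seal(struct demo_frame *f, const uint8_t *payload, uint32_t len)
{
    assert(f != NULL);
    f->payload = payload;
    f->len = len;
    f->crc = crc32_ieee(payload, len);
}

/* Receiver side: returns 1 if the payload still matches, 0 otherwise. */
int demo_frame_verify(const struct demo_frame *f)
{
    assert(f != NULL);
    return crc32_ieee(f->payload, f->len) == f->crc;
}
```

A receiver would call `demo_frame_verify` before handing the message on and drop or log the message on a mismatch. A table-driven CRC would reduce the per-byte cost at the price of a 1 KB lookup table.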

If the data is corrupted in the kernel or in memory before being sent
but after "spread" has finished with it, then that would explain the
situation -- but it would indicate an OS bug.

Jonathan

On Mon, Dec 04, 2006 at 09:01:51AM -0800, Alec H. Peterson wrote:
> Hi all,
>
> So a few days ago I e-mailed about getting ring lockups. We tracked
> this problem down to corrupt packets getting delivered to Spread
> (both over the session and data link layers). I've attached a patch
> that seems to address the problems by adding a checksum to the
> appropriate data structures, and we feel this could potentially be
> useful to others. If there are reasons why this shouldn't be
> included in Spread we would love to know, because those may well be
> reasons why we shouldn't use it. Clearly it changes the network
> protocol, so it won't be compatible with other builds of Spread.
> However, this does solve our lockup and corrupt data problems.
>
> We're also curious if anybody else has seen 'odd' Spread behavior
> (like ring lockups and/or corrupt data delivered to the client
> library). The configuration we have seen this on is very
> straightforward:
>
> Sun x4100 Server
> Solaris 10
> Spread 3.17.3 (both stock and with some local patches)
>
> We have some very similar servers deployed in-house that do not
> experience these problems at all.
>
> Thanks!
>
> Alec
>


> _______________________________________________
> Spread-users mailing list
> Spread-users@xxxxxxxxxxxxxxxx
> http://lists.spread.org/mailman/listinfo/spread-users


--
-------------------------------------------------------
Jonathan R. Stanton jonathan@xxxxxxxxxx
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------

