
Re: oops in tcp_v4_rcv.: msg#00346

Subject: Re: oops in tcp_v4_rcv.
[netdev added to cc list]

I think I now understand what causes the crash:
The tcp_ehash lookup assumes that every entry is of the type 'struct inet_sock'.
But the entries in the TIME_WAIT half of the table are of the type tcp_tw_bucket, and 'sk->inet.daddr' is not at a location shared by both structures, so the match macro reads past the end of the shorter tcp_tw_bucket.


<< net/ipv4/tcp_ipv4, line 510:
       /* Must check for a TIME_WAIT'er before going to listener hash. */
       for (sk = (head + tcp_ehash_size)->chain; sk; sk = sk->next)
              if (TCP_IPV4_MATCH(sk, acookie, saddr, daddr, ports, dif))
                   goto hit;
<<
preprocessor output:
<<
for (sk = (head + (tcp_hashinfo.__tcp_ehash_size))->chain; sk; sk = sk->next)
     if ((((&((struct inet_sock *)sk)->inet)->daddr == (saddr)) &&
          ((&((struct inet_sock *)sk)->inet)->rcv_saddr == (daddr)) &&
          ((*((__u32 *)&((&((struct inet_sock *)sk)->inet)->dport))) == (ports)) &&
          (!((sk)->bound_dev_if) || ((sk)->bound_dev_if == (dif)))))
             goto hit;
<<
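
To make the layout problem concrete, here is a minimal sketch in plain C. It does not use the kernel definitions: struct demo_full_sock, struct demo_tw_bucket, their members and the padding size are all invented for illustration. It only shows the general pattern, namely that two structures can share their leading members while a field read through the larger layout falls outside the smaller object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the full socket. The members and the padding
 * size are invented; only "daddr sits far into the object" mirrors the
 * situation described in the thread. */
struct demo_full_sock {
    struct demo_full_sock *next;
    int bound_dev_if;
    char other_state[0x158];    /* placeholder for unrelated members */
    unsigned int daddr;
    unsigned int rcv_saddr;
    unsigned short dport;
    unsigned short sport;
};

/* Hypothetical stand-in for the much shorter TIME_WAIT bucket. */
struct demo_tw_bucket {
    struct demo_tw_bucket *next;
    int bound_dev_if;
    unsigned int daddr;
    unsigned int rcv_saddr;
    unsigned short dport;
    unsigned short sport;
};

/* The two layouts agree only on the leading members. */
static int common_prefix_matches(void)
{
    return offsetof(struct demo_full_sock, next) ==
               offsetof(struct demo_tw_bucket, next) &&
           offsetof(struct demo_full_sock, bound_dev_if) ==
               offsetof(struct demo_tw_bucket, bound_dev_if);
}

/* Would a daddr read through the full-socket layout stay inside a bucket? */
static int full_daddr_inside_bucket(void)
{
    size_t off = offsetof(struct demo_full_sock, daddr);

    return off + sizeof(unsigned int) <= sizeof(struct demo_tw_bucket);
}
```

With these definitions common_prefix_matches() returns 1 and full_daddr_inside_bucket() returns 0: walking the chain via ->next and checking ->bound_dev_if is fine for both types, but a daddr access through the larger layout points beyond the end of the smaller object.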


Manfred Spraul wrote:

Hi,

I'm looking at crashes that occur during network stress testing with CONFIG_DEBUG_PAGEALLOC from -mm: pages that are not in use are immediately unmapped from the linear mapping, so reading through a stale pointer causes an immediate oops.

I've now analyzed one crash:
the oops is in __tcp_v4_lookup_established, in the 2nd loop [i.e. the one that looks at TIME_WAIT sockets; easy to identify because of the access to __tcp_ehash_size].

The entry in the hash table is a tcp_tw_bucket, and that structure is only ~88 bytes long. The oops is caused by an access to objp+0x168, which lies well beyond the end of the object.
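
The arithmetic behind that statement can be sketched as a small bounds check. The helper name access_in_bounds is hypothetical, and the 4-byte access width is an assumption (the message only gives the object size and the offset).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if an access of `width` bytes starting at
 * `offset` lies entirely inside an object of `obj_size` bytes, else 0.
 * Written so that no intermediate sum can overflow. */
static int access_in_bounds(size_t obj_size, size_t offset, size_t width)
{
    return offset <= obj_size && width <= obj_size - offset;
}
```

Calling access_in_bounds(88, 0x168, 4) returns 0: offset 0x168 is 360 decimal, far past the ~88 bytes of the object, so with unused pages unmapped the read faults immediately.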






