Re: hang in rpccon from interrupting NFS operations (Re: pointyhat panic)

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Thu, 25 Mar 2010 10:16:53 -0400 (EDT)
On Mon, 22 Mar 2010, Adrenalin wrote:

> That's strange, after recompiling the latest 8_0 that contains the patch (
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/rpc/clnt_vc.c.diff?r1=1.8.2.2.2.1;r2=1.8.2.2.2.2)
> after 5 days it got stuck again with the same symptoms, I've also got some
> in the nfs state:
>
> FreeBSD .. 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Mar 16 22:56:51 EET
> 2010     .._at_..:/usr/obj/usr/src/sys/MYGEN  amd64
>
> When attaching the debugger to an rpccon process, it is stuck here:
> #0  0x000000080124051c in stat () from /lib/libc.so.7
>
> http://img705.imageshack.us/img705/741/10032219218.png
>
> Can I do online debugging of the kernel, or how can I help you to solve
> the problem ?
>
Well, sleeping in "rpccon" means that the TCP connect has failed after a
soconnect() call. If you can get into a kernel debugger, there is a
global structure with more error information in it.
It is called: rpc_createerr
- and it has 2 enums, followed by an int. The first enum should be 12
   (RPC_SYSTEMERR), which is what gets it to tsleep(.."rpccon"..). The
   second enum doesn't apply to this case, and the int after them should
   be the errno of the soconnect() failure. (The way the code is currently
   written, it could either be an error return from soconnect() or a value
   set in so_error after soconnect() returns, while it is in the process
   of connecting.)
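
As a rough illustration of reading those three values once they have
been pulled out of the debugger, here is a minimal Python sketch. It
assumes the two enums and the int are laid out as three consecutive
32-bit words, which has not been checked against the 8.0 headers. The
function name, the dictionary keys and the partial errno table are made
up for this example; the errno numbers are the FreeBSD ones, which differ
from Linux, so they are hard-coded rather than taken from the host.

```python
# Hypothetical helper: interpret the first three 32-bit words of
# rpc_createerr as read by hand from a kernel debugger.
# Assumption: the two enums and the int are three consecutive ints.

RPC_SYSTEMERR = 12  # value of the first enum named in the post

# Partial table of FreeBSD errno values likely to show up for a failed
# connect. Hard-coded because errno numbering is OS-specific.
FREEBSD_ERRNO = {
    4: "EINTR",
    50: "ENETDOWN",
    51: "ENETUNREACH",
    54: "ECONNRESET",
    60: "ETIMEDOUT",
    61: "ECONNREFUSED",
    64: "EHOSTDOWN",
    65: "EHOSTUNREACH",
}


def decode_createerr(words):
    """Return a small dict describing (first_enum, second_enum, errno)."""
    first_enum, second_enum, err = words
    return {
        # True when the first enum is the one that leads to the rpccon sleep
        "is_systemerr": first_enum == RPC_SYSTEMERR,
        # Not meaningful in this case; kept only for completeness
        "second_enum": second_enum,
        "errno": err,
        "errno_name": FREEBSD_ERRNO.get(err, "unknown (%d)" % err),
    }
```

For example, decode_createerr((12, 0, 61)) reports is_systemerr as True
and errno_name as "ECONNREFUSED", which would point at the server (or
something in between) refusing the connection.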

So, if you can get to that 3rd field, the value there might help tell
why the TCP connect is failing. Otherwise, all I can suggest is poking
around and trying to figure out why TCP connects are failing.
- wedged network interface
- routing problem
- network infrastructure problem
...
(Btw, I was driven a little batty at UofG because the campus network
  switch I was on would decide to inject TCP RSTs into new connection
  attempts for some reason. I finally was able to determine this by
  looking at packet traces on both client and server and seeing the RSTs
  coming out of the network on the client end, but never being sent on
  the server end. It was some Cisco related parameter/issue that was
  never resolved.)
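
One way to start poking around from userland is to attempt a plain TCP
connect to the server from the affected client and look at the errno
that comes back. The following is a minimal Python sketch, not anything
from the kernel RPC code. The function name probe_tcp is made up for
this example, and it reports the errno numbering of whatever host it
runs on.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connect. Return (errno_value, description); 0 = success."""
    try:
        # create_connection() resolves the host and attempts the connect
        with socket.create_connection((host, port), timeout=timeout):
            return 0, "connected"
    except socket.timeout:
        # No answer at all within the timeout (dropped SYNs, dead route, ...)
        return errno.ETIMEDOUT, "timed out after %.1fs" % timeout
    except OSError as e:
        # e.g. ECONNREFUSED or ECONNRESET when an RST comes back
        return e.errno, e.strerror or str(e)
```

Calling probe_tcp("192.0.2.10", 2049) would try the standard NFS port on
a placeholder server address (192.0.2.10 is a documentation address, so
substitute the real server). A refused or reset result while the server
is known to be listening is a hint to compare packet traces taken on
both ends.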

Hopefully others with more TCP expertise can make suggestions w.r.t.
why the TCP connects are failing?

Good luck with it, rick
Received on Thu Mar 25 2010 - 13:03:52 UTC