AMD non-blocking RPC problem now reproducible

From: Martin Blapp <mb_at_imp.ch>
Date: Thu, 15 May 2003 11:02:20 +0200 (CEST)
Hi all,

As mentioned before, we still encounter an AMD problem in
pre-5.1. With the help of Genesys of #bsdcode I could
reproduce it here.

I'm now able to reproduce it, but debugging is quite
difficult! It's not specific to Linux clients. A FreeBSD
client suffers too.

Export at least two filesystems on the server. In my example we use / and /usr.

On the client, do the following:

- Start amd
- Run this loop:

while true ; do amq -u /net/yourserver ; sleep 1 ; ls -ld \
/net/yourserver/usr/local || break ; done

It is important that you list a subdirectory of /usr, because
the first call always seems to succeed. It's the second one which fails.

You will see output like:

drwxr-xr-x   11 root     root          512 May  5 14:11 /net/yourserver/usr/local

It will fail after 2-150 successful tries. If the blocking case (old behaviour)
is used within the mountd server, this will not happen.
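
To see how many tries succeed before the failure, the loop can be wrapped
in a small counter. This is only a sketch: count_successes and one_try are
made-up helper names, and /net/yourserver is the same placeholder as in the
loop above.

```sh
#!/bin/sh
# count_successes MAX CMD [ARGS...]
# Hypothetical helper (not part of amd): runs CMD up to MAX times and
# prints how many runs succeeded before the first failure.
count_successes() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" >/dev/null 2>&1 || break
        n=$((n + 1))
    done
    echo "$n"
}

# One iteration of the reproduction loop. Only the exit status of ls
# decides success, as in the original loop.
one_try() {
    amq -u /net/yourserver
    sleep 1
    ls -ld /net/yourserver/usr/local
}
```

Running "count_successes 1000 one_try" on the client should then print the
number of successful tries, or 1000 if the failure never showed up.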

Even stranger: if I attach ktrace to the pid of mountd, the bug
always appears! I'm not sure if we trigger the same bug then, but it
appears to me that we do.

And I begin to suspect that it's timing related. The faster the network
response, the less often we hit this bug.

This is a ktrace from the server:

 86984 mountd   RET   read 4
 86984 mountd   CALL  gettimeofday(0x80589c0,0)
 86984 mountd   RET   gettimeofday 0
 86984 mountd   CALL  read(0x8,0x807a000,0x74)
 86984 mountd   GIO   fd 8 read 116 bytes

       "~wG\^W\0\0\0\0\0\0\0\^B\0\^A\M^F\M-%\0\0\0\^C\0\0\0\^A\0\0\0\^A\0\0\0D>\M-COo\0\0\0\rlevais.imp.ch\0\0\0\0\0\0\0\
        \0\0\0\0\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\^B\0\0\0\^C\0\0\0\^D\0\0\0\^E\0\0\0\^T\0\0\0\^_\0\0\0\0\0\0\0\0\0\0\0\^D/u\
        sr"
 86984 mountd   RET   read 116/0x74
 86984 mountd   CALL  gettimeofday(0x80589c0,0)
 86984 mountd   RET   gettimeofday 0
 86984 mountd   CALL  read(0x8,0x80545c8,0x4)
 86984 mountd   RET   read -1 errno 35 Resource temporarily unavailable
 86984 mountd   CALL  close(0x8)
 86984 mountd   RET   close 0
 86984 mountd   CALL  select(0x8,0xbfbffb98,0,0,0)

EAGAIN is OK, since we use non-blocking RPC. But something then goes wrong
and the connection gets closed. Of course, additional requests from the
client side will then fail.
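
The suspicious pattern in the trace, a read() returning EAGAIN that is
immediately followed by close(), can be searched for mechanically in longer
kdump output. A rough sketch, assuming the kdump line format shown above;
find_eagain_close is a made-up name, and errno 35 is EAGAIN on FreeBSD.

```sh
#!/bin/sh
# find_eagain_close: hypothetical filter, reads kdump text on stdin.
# Prints every "read -1 errno 35" return line together with the
# close() call that directly follows it, and nothing else.
find_eagain_close() {
    awk '
        # Remember a read() that returned EAGAIN (errno 35).
        /RET +read -1 errno 35/ { pending = $0; next }
        # The very next line is a close() call: report the pair.
        pending != "" && /CALL +close\(/ { print pending; print $0 }
        # Any other line ends the candidate pair.
        { pending = "" }
    '
}
```

Assuming the trace was written to the default ktrace.out file, something
like "kdump -f ktrace.out | find_eagain_close" should list each such pair.
It only matches adjacent lines, so it will miss cases where other calls sit
between the read and the close.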

May 15 10:27:31 myclient amd[38168]: mountd rpc failed: RPC: Unable to receive

Martin

Martin Blapp, <mb_at_imp.ch> <mbr_at_FreeBSD.org>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: <finger -l mbr_at_freebsd.org>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------
Received on Thu May 15 2003 - 00:02:26 UTC
