Re: sk ethernet driver: watchdog timeout

From: Palle Girgensohn <girgen_at_pingpong.net>
Date: Sun, 11 Apr 2004 16:38:02 +0200
Hi again,

I just tried FreeBSD RELENG_5_2 on one of our boxes that has an sk0 
interface, and the dmesg output during boot is very noisy when the sk0 
driver attaches. Perhaps these dmesg lines can help. The interface stalls 
every so often on 5.2 as well, which makes it unusable. :(

Is there any point in trying any of the patches from this thread?

/Palle

skc0: <Marvell Gigabit Ethernet> port 0x9000-0x90ff mem 
0xe8000000-0xe8003fff irq 5 at device 4.0 on pci1
skc0: Yukon Gigabit Ethernet 10/100/1000Base-T Adapter
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
malloc() of "512" with the following non-sleepable locks held:
exclusive sleep mutex skc0 (network driver) r = 0 (0xc62311c0) locked @ /4/usr/5src/sys/pci/if_sk.c:1368
Stack backtrace:
backtrace(c094bc3c,c0c219a4,116,1,c08bf820) at backtrace+0x17
witness_warn(5,0,c087566d,c082122f,55) at witness_warn+0x193
uma_zalloc_arg(c1038e00,0,102,20,c0864b62) at uma_zalloc_arg+0xa9
malloc(110,c08bf820,102,0,180) at malloc+0xcf
if_attach(c6232000,c623200c,10,c0864b62,0) at if_attach+0x240
ether_ifattach(c6232000,c62321d4,0,0,ffffffff) at ether_ifattach+0x22
sk_attach(c6231080,c61ba84c,c0886510,c0c21ab0,c062b75f) at sk_attach+0x359
device_probe_and_attach(c6231080,c620b700,c0c21b04,c0a3d797,c620b180) at 
device_probe_and_attach+0xa9
bus_generic_attach(c620b180,11a,1,c0c21af0,ffffffff) at 
bus_generic_attach+0x19
skc_attach(c620b180,c61b984c,c0886510,0,e) at skc_attach+0x7d7
device_probe_and_attach(c620b180,1,c0c21b68,c05873f1,c620b700) at 
device_probe_and_attach+0xa9
bus_generic_attach(c620b700,1,78,c0c21b58,1) at bus_generic_attach+0x19
pci_attach(c620b700,c620b700,c0877468,1,c6161800) at pci_attach+0xa1
device_probe_and_attach(c620b700,c6161800,c0c21bbc,c0589356,c6161800) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6161800,c0877468,1,c6161800,c0c21be8) at 
bus_generic_attach+0x19
pcib_attach(c6161800,c61e704c,c0886510,0,c6161a00) at pcib_attach+0x46
device_probe_and_attach(c6161800,0,c0c21c20,c05873f1,c6207780) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207780,0,78,c0c21c10,0) at bus_generic_attach+0x19
pci_attach(c6207780,c618684c,c0886510,0,0) at pci_attach+0xa1
device_probe_and_attach(c6207780,0,c0c21c84,c07f6ccd,c6207800) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207800,c0877468,0,c0c21c74,0) at 
bus_generic_attach+0x19
legacy_pcib_attach(c6207800,c61e884c,c0886510,c2262ce0,c6207900) at 
legacy_pcib_attach+0x9d
device_probe_and_attach(c6207800,c6207900,c0c21ce0,c07e169b,c6207900) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207900,c0c21ce0,c0654aeb,c08f7228,c6207900) at 
bus_generic_attach+0x19
legacy_attach(c6207900,c61d284c,c0886510,0,c08faec0) at legacy_attach+0x1b
device_probe_and_attach(c6207900,c6207a00,c0c21d2c,c07e98fc,c6207a00) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207a00,c6207a00,c0c21d58,c0650489,c6207a00) at 
bus_generic_attach+0x19
nexus_attach(c6207a00,c61de04c,c0886510,0,c226ea40) at nexus_attach+0x1c
device_probe_and_attach(c6207a00,c226ea40,c0c21d7c,c07d7319,c2282a00) at 
device_probe_and_attach+0xa9
root_bus_configure(c2282a00,c087914c,0,c0c21d98,c060fb49) at 
root_bus_configure+0x1b
configure(0,c1ec00,c1e000,c1ec00,c1e000) at configure+0x29
mi_startup() at mi_startup+0x99
begin() at begin+0x2c
sk0: Ethernet address: 00:0e:a6:2b:d5:17
miibus1: <MII bus> on sk0
e1000phy0: <Marvell 88E1000 Gigabit PHY> on miibus1
lock order reversal
 1st 0xc62311c0 skc0 (network driver) @ /4/usr/5src/sys/pci/if_sk.c:672
 2nd 0xc091aaa0 kernel environment (kernel environment) @ /4/usr/5src/sys/kern/kern_environment.c:288
Stack backtrace:
backtrace(c085f834,c091aaa0,c0859523,c0859523,c08594fb) at backtrace+0x17
witness_checkorder(c091aaa0,1,c08594fb,120,c096d000) at 
witness_checkorder+0x6f6
_sx_slock(c091aaa0,c08594fb,120,c08f6700,a) at _sx_slock+0x8e
getenv(c0841e7e,0,c0651084,28,c6230f00) at getenv+0x3b
getenv_quad(c0841e7e,c0c21950,c6230f00,c0c2196c,c0c21980) at 
getenv_quad+0x1a
getenv_int(c0841e7e,c09124e8,c6230f00,c0c21980,c0654aeb) at getenv_int+0x18
e1000phy_attach(c6230f00,c61ea84c,c0886510,c06510d1,c086a855) at 
e1000phy_attach+0x1d
device_probe_and_attach(c6230f00,c6231000,c0c219dc,c0561149,c6231000) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6231000,f0000000,c0a3c610,c0a3c650,c6231000) at 
bus_generic_attach+0x19
miibus_attach(c6231000,c6231000,2b3,1,0) at miibus_attach+0x59
device_probe_and_attach(c6231000,0,c0c21a38,c056152a,c6231080) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6231080,0,1,0,c6232000) at bus_generic_attach+0x19
mii_phy_probe(c6231080,c62321e4,c0a3c610,c0a3c650,ffffffff) at 
mii_phy_probe+0x10a
sk_attach(c6231080,c61ba84c,c0886510,c0c21ab0,c062b75f) at sk_attach+0x3a2
device_probe_and_attach(c6231080,c620b700,c0c21b04,c0a3d797,c620b180) at 
device_probe_and_attach+0xa9
bus_generic_attach(c620b180,11a,1,c0c21af0,ffffffff) at 
bus_generic_attach+0x19
skc_attach(c620b180,c61b984c,c0886510,0,e) at skc_attach+0x7d7
device_probe_and_attach(c620b180,1,c0c21b68,c05873f1,c620b700) at 
device_probe_and_attach+0xa9
bus_generic_attach(c620b700,1,78,c0c21b58,1) at bus_generic_attach+0x19
pci_attach(c620b700,c620b700,c0877468,1,c6161800) at pci_attach+0xa1
device_probe_and_attach(c620b700,c6161800,c0c21bbc,c0589356,c6161800) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6161800,c0877468,1,c6161800,c0c21be8) at 
bus_generic_attach+0x19
pcib_attach(c6161800,c61e704c,c0886510,0,c6161a00) at pcib_attach+0x46
device_probe_and_attach(c6161800,0,c0c21c20,c05873f1,c6207780) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207780,0,78,c0c21c10,0) at bus_generic_attach+0x19
pci_attach(c6207780,c618684c,c0886510,0,0) at pci_attach+0xa1
device_probe_and_attach(c6207780,0,c0c21c84,c07f6ccd,c6207800) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207800,c0877468,0,c0c21c74,0) at 
bus_generic_attach+0x19
legacy_pcib_attach(c6207800,c61e884c,c0886510,c2262ce0,c6207900) at 
legacy_pcib_attach+0x9d
device_probe_and_attach(c6207800,c6207900,c0c21ce0,c07e169b,c6207900) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207900,c0c21ce0,c0654aeb,c08f7228,c6207900) at 
bus_generic_attach+0x19
legacy_attach(c6207900,c61d284c,c0886510,0,c08faec0) at legacy_attach+0x1b
device_probe_and_attach(c6207900,c6207a00,c0c21d2c,c07e98fc,c6207a00) at 
device_probe_and_attach+0xa9
bus_generic_attach(c6207a00,c6207a00,c0c21d58,c0650489,c6207a00) at 
bus_generic_attach+0x19
nexus_attach(c6207a00,c61de04c,c0886510,0,c226ea40) at nexus_attach+0x1c
device_probe_and_attach(c6207a00,c226ea40,c0c21d7c,c07d7319,c2282a00) at 
device_probe_and_attach+0xa9
root_bus_configure(c2282a00,c087914c,0,c0c21d98,c060fb49) at 
root_bus_configure+0x1b
configure(0,c1ec00,c1e000,c1ec00,c1e000) at configure+0x29
mi_startup() at mi_startup+0x99
begin() at begin+0x2c
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, 
auto
skc0: [GIANT-LOCKED]


--On Thursday, April 08, 2004 22.36.18 +0300 Ruslan Ermilov <ru_at_FreeBSD.org> 
wrote:

> On Thu, Apr 08, 2004 at 12:17:06AM +1000, Bruce Evans wrote:
> [...]
>> The following patch reduces the problem on A7V8X-E a little.  It limits
>> the tx queue to 1 packet and fixes handling of the timeout on txeof.
>> The first part probably makes the second part a no-op.  Without this,
>> my A7V8X-E hangs on even light nfs activity (e.g., copying a 1MB file
>> to nfs).  With it, it takes heavier nfs activity to hang (makeworld
>> never completes, and a flood ping always hangs).
>>
>> I first suspected an interrupt-related bug, but the bug seems to be
>> more hardware-specific.  Examination of the output queues shows that
>> the tx sometimes just stops before processing all packets.  Resetting
>> in sk_watchdog() doesn't always fix the problem, and the timeout usually
>> stops firing after a couple of unsuccessful resets, giving a completely
>> hung device.  But the problem may be related to interrupt timing, since
>> it is much smaller under RELENG_4.  RELENG_4 hangs about as often
>> without this hack as -current does with it.
>>
>> nv0 hangs similarly.  fxp0 just works.
>>
>> %%%
>> Index: if_sk.c
>> ===================================================================
>> RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
>> retrieving revision 1.78
>> diff -u -2 -r1.78 if_sk.c
>> --- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
>> +++ if_sk.c	1 Apr 2004 07:33:58 -0000
>> @@ -1830,4 +1830,9 @@
>>  	SK_IF_LOCK(sc_if);
>>
>> +	if (sc_if->sk_cdata.sk_tx_cnt > 0) {
>> +		SK_IF_UNLOCK(sc_if);
>> +		return;
>> +	}
>> +
>>  	idx = sc_if->sk_cdata.sk_tx_prod;
>>
>> @@ -1853,4 +1858,5 @@
>>  		 */
>>  		BPF_MTAP(ifp, m_head);
>> +		break;
>>  	}
>>
>> @@ -2000,5 +2031,4 @@
>>  		sc_if->sk_cdata.sk_tx_cnt--;
>>  		SK_INC(idx, SK_TX_RING_CNT);
>> -		ifp->if_timer = 0;
>>  	}
>>
>> @@ -2007,4 +2037,6 @@
>>  	if (cur_tx != NULL)
>>  		ifp->if_flags &= ~IFF_OACTIVE;
>> +
>> +	ifp->if_timer = (sc_if->sk_cdata.sk_tx_cnt == 0) ? 0 : 5;
>>
>>  	return;
>> %%%
>>
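The "limit the tx queue to 1 packet" part of the patch above can be pictured with a small userland model. This is only a sketch of the gating logic, not driver code: `struct txq_model`, `txq_start()`, `txq_txeof()` and `TXQ_RING_CNT` are hypothetical names invented for illustration, and the real routine also deals with mbufs, descriptors and locking.

```c
#include <assert.h>
#include <stddef.h>

#define TXQ_RING_CNT 512 /* hypothetical ring size for this model */

struct txq_model {
    int pending; /* packets waiting in the interface send queue */
    int tx_cnt;  /* packets handed to the card, not yet completed */
};

/* Model of the patched start routine: at most one packet in flight. */
static void txq_start(struct txq_model *m)
{
    assert(m != NULL);
    if (m->tx_cnt > 0)      /* first added hunk: ring busy, do nothing */
        return;
    while (m->pending > 0 && m->tx_cnt < TXQ_RING_CNT) {
        m->pending--;
        m->tx_cnt++;
        break;              /* second added hunk: stop after one packet */
    }
}

/* Model of a TX completion: the card finished one packet. */
static void txq_txeof(struct txq_model *m)
{
    assert(m != NULL);
    if (m->tx_cnt > 0)
        m->tx_cnt--;
}
```

In this model a second `txq_start()` call is a no-op until `txq_txeof()` has run, so the card never owns more than one packet at a time. That serialization is presumably why the patch makes the hang less frequent without removing it.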
> Always recharging the timer to 5 whenever some TX work is still
> pending is a bug.  With DEVICE_POLLING (yes, I plan to add
> polling(4) support to sk(4) too), sk_txeof() will be called
> periodically, so if the card gets stuck, if_timer will never
> count down to zero and sk_watchdog() will never be called.
> Without DEVICE_POLLING, recharging it to 5 once if_timer has
> reached 0 is still pointless: if if_timer is 0 inside
> sk_txeof(), then sk_txeof() was called from sk_watchdog(), which
> will reinit the card and empty both the RX and TX lists.  Leaving
> if_timer at 5 _after_ the watchdog cleanup, with _no_ TX activity
> at all, may cause a second (false) watchdog.  My version of the
> TX fixes (which also fixes the resetting of IFF_OACTIVE):
>
> %%%
> Index: if_sk.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
> retrieving revision 1.78
> diff -u -p -r1.78 if_sk.c
> --- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
> +++ if_sk.c	8 Apr 2004 19:10:50 -0000
> @@ -1998,14 +1998,14 @@ sk_txeof(sc_if)
>  			sc_if->sk_cdata.sk_tx_chain[idx].sk_mbuf = NULL;
>  		}
>  		sc_if->sk_cdata.sk_tx_cnt--;
> +		ifp->if_flags &= ~IFF_OACTIVE;
>  		SK_INC(idx, SK_TX_RING_CNT);
> -		ifp->if_timer = 0;
>  	}
>
>  	sc_if->sk_cdata.sk_tx_cons = idx;
>
> -	if (cur_tx != NULL)
> -		ifp->if_flags &= ~IFF_OACTIVE;
> +	if (sc_if->sk_cdata.sk_tx_cnt == 0)
> +		ifp->if_timer = 0;
>
>  	return;
>  }
> %%%
>
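The DEVICE_POLLING argument above can be checked with a toy userland simulation. It is a sketch under stated assumptions, not driver code: the timer is assumed to be decremented once per second and to fire the watchdog when it reaches zero, the card is assumed to be stuck so no packet ever completes, and txeof is assumed to run on every tick as it would under polling. All names (`struct wd_model`, `wd_txeof()`, `wd_tick()`, `count_watchdogs()`) are hypothetical.

```c
#include <assert.h>

enum policy { RECHARGE_WHILE_BUSY, CLEAR_WHEN_EMPTY };

struct wd_model {
    int if_timer;  /* seconds left before the watchdog fires; 0 = disarmed */
    int tx_cnt;    /* packets the (stuck) card still owns */
    int watchdogs; /* how many times the watchdog has fired */
};

/* Model of txeof on a stuck card: no packet ever completes. */
static void wd_txeof(struct wd_model *m, enum policy p)
{
    if (p == RECHARGE_WHILE_BUSY)
        m->if_timer = (m->tx_cnt == 0) ? 0 : 5;  /* first patch */
    else if (m->tx_cnt == 0)
        m->if_timer = 0;                         /* second patch */
}

/* Model of the once-per-second interface timer countdown. */
static void wd_tick(struct wd_model *m)
{
    if (m->if_timer == 0 || --m->if_timer > 0)
        return;
    m->watchdogs++;   /* timer just hit zero: watchdog fires */
    m->tx_cnt = 0;    /* the reinit empties the TX list */
}

/* Run the model for a number of seconds, starting with a stuck card. */
static int count_watchdogs(enum policy p, int seconds)
{
    struct wd_model m = { .if_timer = 5, .tx_cnt = 3, .watchdogs = 0 };

    assert(seconds >= 0);
    for (int s = 0; s < seconds; s++) {
        wd_txeof(&m, p);  /* polling: txeof runs every tick */
        wd_tick(&m);
    }
    return m.watchdogs;
}
```

In this model the recharging policy keeps resetting the timer to 5 before it can expire, so the watchdog never fires however long the card stays stuck. Clearing the timer only when the TX count is zero lets it expire after five ticks.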
> We have been running the 3COM 3C940 card on 4.9 (and from today
> on 4.10-BETA) without any problems and under a heavy TX load.
>
>
> Cheers,
> --
> Ruslan Ermilov
> ru_at_FreeBSD.org
> FreeBSD committer
Received on Sun Apr 11 2004 - 05:46:31 UTC
