Hi,

On 2 May 2015 at 00:02, Poul-Henning Kamp <phk_at_phk.freebsd.dk> wrote:
> May 2 01:01:34 critter kernel: iwn0: device timeout
> May 2 01:01:34 critter kernel: firmware: 'iwn6000g2afw' version 0: 677296 bytes loaded at 0xffffffff81f880c0
> May 2 01:01:34 critter kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
> May 2 01:01:40 critter kernel: iwn0: iwn_tx_data: m=0xfffff80236fe8500: seqno (9550) (78) != ring index (0) !
> May 2 01:01:40 critter kernel: iwn0: iwn_intr: fatal firmware error
> May 2 01:01:40 critter kernel: iwn0: iwn_panicked: controller panicked, iv_state = 5; resetting...
> May 2 01:01:40 critter kernel: firmware: 'iwn6000g2afw' version 0: 677296 bytes loaded at 0xffffffff81f880c0
> May 2 01:01:40 critter kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
>
> And then the machine hung.
>
> No further details, as the screen-blanker was on.

So there's something odd with iwn and sequence number allocations.

What's supposed to happen here is that:

* net80211 handles sequence number allocation;
* then A-MPDU is negotiated;
* then the driver handles sequence number allocations.

The firmware requires that for 11n transmit, each frame goes into a ring slot that's seqno % 256. It's not an arbitrary slot. It'll panic otherwise, like you saw above.

Now, something's upsetting it. It may be a noisy environment leading to BAR frame transmissions and eventual tear-down of the A-MPDU state, leading to net80211 taking over sequence number allocation again. I fixed a whole lot of those races in the ath(4) driver when I implemented 11n and found there's no locking at all going on there. :(

It could also be something inside net80211 that's advancing the sequence number space, even though A-MPDU is enabled. There's only a couple of places where ni_txseqs is updated in net80211. If it were getting updated there, it should be obvious. But it does do a check to see if AMPDU is enabled and running, and none of that is consistently locked.

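For anyone following along, the slot rule above can be sketched in a few lines of C. This is only an illustration under assumptions: the struct and every sketch_* name are hypothetical and are not the real iwn(4) or net80211 code. The only things taken from this thread are the seqno % 256 rule and the numbers in the log line; the wrap at 4096 is the usual 12-bit 802.11 sequence space.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SLOTS 256   /* from the "seqno % 256" rule above */
#define SKETCH_SEQ_RANGE  4096  /* 802.11 sequence numbers are 12 bits wide */

/* Hypothetical per-TID transmit state; NOT the real iwn(4) structures. */
struct sketch_tid {
	uint16_t next_seq;	/* next sequence number to hand out */
	uint16_t ring_cur;	/* next ring slot the driver will fill */
};

/* 1 if a frame with this seqno belongs in slot ring_cur, 0 otherwise. */
int
sketch_slot_ok(uint16_t seqno, uint16_t ring_cur)
{
	return ((seqno % SKETCH_RING_SLOTS) == ring_cur);
}

/* Aggregation start: the ring cursor is derived from the starting seqno. */
void
sketch_ampdu_start(struct sketch_tid *t, uint16_t ssn)
{
	assert(ssn < SKETCH_SEQ_RANGE);
	t->next_seq = ssn;
	t->ring_cur = ssn % SKETCH_RING_SLOTS;
}

/*
 * Queue one frame. Returns 1 on success, or 0 when the seqno and the ring
 * slot have drifted apart (the case where the real firmware would panic).
 */
int
sketch_tx(struct sketch_tid *t)
{
	if (!sketch_slot_ok(t->next_seq, t->ring_cur))
		return (0);
	t->next_seq = (t->next_seq + 1) % SKETCH_SEQ_RANGE;
	t->ring_cur = (t->ring_cur + 1) % SKETCH_RING_SLOTS;
	return (1);
}
```

With the numbers from the log, sketch_slot_ok(9550, 0) fails because 9550 % 256 is 78, which matches the "(78)" the driver printed next to the ring index of 0. As long as only sketch_tx() touches next_seq, the two counters stay in step; if anything else bumps next_seq in between, the next sketch_tx() call returns 0.
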
iwn_addba_response() sets the ni_txseq for the tid to be whatever was negotiated during the aggregation negotiation (ADDBA) and then sets the initial ring slot id to be whatever the starting sequence number is ('ssn' in *_ampdu_tx_start()). iwn_tx_data() does do sequence number allocation there.

It's possible we're seeing races where aggregation is being torn down during active transmit and the state is all mucked up.

I recall seeing issues in ath(4) where there were some packets queued between sending out the initial aggregation negotiation and it being negotiated, which meant some packets would go out with sequence numbers /after/ what was initially negotiated during ADDBA. I.e.:

* you're at seq X, and you negotiate ADDBA at seq X;
* you queue a bunch of transmit frames, seq X -> X + n;
* peer says "ADDBA acceptable, starting seq X";
* the next frame you transmit comes from seq X + n + 1, but the other peer is confused.

Here it may show up as:

* you negotiate seq X via addba;
* you queue a bunch more frames via the normal transmit path;
* you get the addba response, set initial ssn to X;
* the 'cur' pointer here in the ring is now X % 256, but the next frame you transmit is (X + n) % 256, and stuff is out of alignment.

So, would someone please help see if that's the case? That'd be really helpful. :)

-adrian

Received on Sat May 02 2015 - 07:03:53 UTC