On 29.06.2011 10:03, Adrian Chadd wrote:
> On 29 June 2011 14:03, Bernhard Schmidt <bschmidt_at_freebsd.org> wrote:
>> It's name is ieee80211_tx_mgt_timeout used to track AUTH/ASSOC
>> requests. Afaik there is even a similar PR about that.

Sorry, I entered the panic message manually, since dumps were not
working on my system at the time of that panic.

>> Adrian, you've got a AP set up to drop either a AUTH or ASSOC
>> response frame?

I've got a number of "AUTH -> SCAN transition lost" messages for wlan0,
seconds to minutes apart:

Jun 28 21:16:17 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:34:46 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:36:33 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:45:14 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:45:44 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost

The setup is easy to reproduce; my rc.conf contained:

wlans_ath0="wlan0"
ifconfig_ath0="down"
ifconfig_wlan0="down"
wpa_supplicant_enable="YES"

This system used to be connected via ath0, but was recently moved to a
place where Ethernet is available. The panics started only after WLAN
was no longer in use. I might disable wpa_supplicant, since it is not
required in the current situation, but I have not tried whether that
helps prevent the panic.

> Tell me how and I'll set it up.
>
> A panic at that point in the function indicates maybe ni is NULL?
> or ni->vap is now NULL, maybe?

I recreated the panic, this time with kernel dumps correctly configured
(thanks for the hint, Scott).
The panic message is:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xffffff809c7a1000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805e1851
stack pointer           = 0x28:0xffffff8000288ab0
frame pointer           = 0x28:0xffffff8000288b60
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (swi4: clock)

Traceback:

#10 0xffffffff805e1851 in ieee80211_tx_mgt_timeout (arg=0xffffff809c7a1000)
    at ../../../net80211/ieee80211_output.c:2487

This indicates that an invalid argument is passed and assigned to "ni"
(the fault virtual address is exactly the value of "arg"), which causes
the page fault when "ni" is dereferenced to obtain "vap". I'm afraid
that the assumption in the comment (about the timeout being safe to
use) does not really hold:

static void
ieee80211_tx_mgt_timeout(void *arg)
{
	struct ieee80211_node *ni = arg;
	struct ieee80211vap *vap = ni->ni_vap;

	if (vap->iv_state != IEEE80211_S_INIT &&
	    (vap->iv_ic->ic_flags & IEEE80211_F_SCAN) == 0) {
		/*
		 * NB: it's safe to specify a timeout as the reason here;
		 * it'll only be used in the right state.
		 */
		ieee80211_new_state(vap, IEEE80211_S_SCAN,
		    IEEE80211_SCAN_FAIL_TIMEOUT);
	}
}

If "vap" is valid during one invocation of that function, I'd expect it
to at least be a pointer to valid kernel memory after the timeout. I.e.,
the value found by dereferencing it may be stale, but the pointer itself
should at least not cause a page fault. (???)

The compressed core.txt is 27 KB, the compressed vmcore is 800 MB. I
might be able to find a place to upload the vmcore file to, but since
I'm currently on DSL with only 672 KBit/s upstream, it would take me
some 3 hours to upload it to a better connected server (and I'd like to
avoid doing that if it is not essential for debugging). The core.txt is
small enough to send by mail. Let me know if you think it helps you
understand the problem.
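
For illustration only, here is a small userland C sketch of the failure
mode suspected above: a pending timeout keeps a raw pointer to a node,
the node is released while the timeout is still armed, and the handler
later runs on the stale pointer. All toy_* names are hypothetical and
are not net80211 code; a fixed pool with an "alive" flag stands in for
kernel memory so that the stale access can be counted instead of
faulting. Whether the real panic has exactly this cause is an
assumption, not something the dump has confirmed. The stop_first path
is loosely analogous to stopping or draining a callout (see callout(9))
before the object it refers to goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

/* Hypothetical stand-in for a node; not struct ieee80211_node. */
struct toy_node {
	bool alive;     /* false once the node has been "freed" */
	int  timeouts;  /* handler runs that saw a live node */
};

/* Hypothetical stand-in for a pending timeout with its argument. */
struct toy_callout {
	bool  armed;
	void *arg;
};

static struct toy_node    toy_pool[TOY_SLOTS];
static struct toy_callout toy_wheel[TOY_SLOTS];
static int toy_stale_hits;  /* handler runs that saw a freed node */

static struct toy_node *toy_node_alloc(void)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_pool[i].alive) {
			toy_pool[i].alive = true;
			toy_pool[i].timeouts = 0;
			return &toy_pool[i];
		}
	}
	return NULL;
}

/* Arm a timeout that will later be handed ni as its argument. */
static bool toy_arm(struct toy_node *ni)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_wheel[i].armed) {
			toy_wheel[i].armed = true;
			toy_wheel[i].arg = ni;
			return true;
		}
	}
	return false;
}

/* Cancel every pending timeout that refers to ni. */
static int toy_stop(struct toy_node *ni)
{
	int cancelled = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (toy_wheel[i].armed && toy_wheel[i].arg == ni) {
			toy_wheel[i].armed = false;
			cancelled++;
		}
	}
	return cancelled;
}

/* Release a node; with stop_first == false the timeout stays armed. */
static void toy_node_free(struct toy_node *ni, bool stop_first)
{
	assert(ni != NULL && ni->alive);
	if (stop_first)
		toy_stop(ni);
	ni->alive = false;
}

/* Timeout handler: records whether its argument was still live. */
static void toy_mgt_timeout(void *arg)
{
	struct toy_node *ni = arg;

	if (!ni->alive) {
		toy_stale_hits++;
		return;
	}
	ni->timeouts++;
}

/* Fire all armed timeouts once, disarming each before it runs. */
static void toy_tick(void)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (toy_wheel[i].armed) {
			toy_wheel[i].armed = false;
			toy_mgt_timeout(toy_wheel[i].arg);
		}
	}
}
```

In this model, arming a timeout, freeing the node without stopping it,
and then calling toy_tick() increments toy_stale_hits; freeing with
stop_first set leaves the counter unchanged. In the kernel the stale
pointer is not caught by a flag, so the same ordering would show up as
a read of freed or unmapped memory instead.
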
I'm willing to support debugging, e.g. by placing printfs in my kernel
in the timeout handler and at the creation and destruction of vap
structures.

After removal of wlans_ath0="wlan0" the system will most probably be
stable, but I did not specifically test this case (i.e. ath0
configured, but no wlan0 created). I do know that an "ifconfig down" of
ath0 and wlan0 suffices to trigger the panic; probably an "ifconfig
wlan0 down" alone would be enough.

So I know how to avoid the panic, but I think it is still important to
find the cause.

Thank you for looking into this!

Best regards, STefan

Received on Wed Jun 29 2011 - 07:07:38 UTC