On Tue, 14 Sep 2004, Volker wrote:

> After the reboot, the system is panicing 3 to 8 times a day. To see the
> panic messages, I've set the PANIC_REBOOT_WAIT_TIME to -1 and this let
> me see a message like (not copied and pasted):

If I might suggest, and if possible, you might want to set up a serial
console for the box so that you can copy and paste debugger output.
You'll probably be asked for quite a bit of output from the debugger and
life is a lot easier if you can do that :-). It also reduces the chances
of typographical errors.

> fatal trap 12: page fault
> fault virtual address: 0xc
> fault code: supervisor read, page not present
> instr. ptr: 0x8:0xc0586e60
> stack ptr: 0x10:0xcee2cac8
> frame ptr: 0x10:0xcee2caf0
> cs: base 0x0 limit 0xffff type 0x1b DPL 0 pres 1 def32 1 gran 1
> cpu eflags: interrupt enabled, resume, IOPL=0
> process: 33767 imapd
> trap 12

This is a kernel NULL pointer dereference. To debug this, it would be
helpful if you could determine what line in the kernel source code
0xc0586e60 refers to. addr2line on the kernel.debug from your kernel
build is a good place to start. It would also be very helpful to have a
stack trace. When you drop to DDB due to the panic (assuming DDB is
compiled in), you can type in "trace" to generate the trace. Having the
names of the functions plus offsets would be very helpful. Also having
the arguments is good, but a lot more pain for you without a serial
console :-).

> While trying to get the system stable, I've tried a 6-current Kernel
> (+world) but the system still panics (only the current process and the
> pointer addresses are changing, the system mostly panics with a trap
> 12).
>
> Another time the system panic'ed with: 'panic: sbappendaddr_locked'

A stack trace here would be invaluable. This panic occurs as a result of
a violation of calling convention, in which a non-header mbuf (or maybe
a free'd mbuf) is appended to a socket incorrectly.
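
Since the panic text above was transcribed by hand, it may help to pull
the instruction pointer out mechanically before handing it to addr2line.
What follows is only a minimal sketch in Python, not an established tool:
the function names and the default "kernel.debug" path are made up for
illustration, and the regular expression assumes the instruction pointer
is written as in the quoted messages ("instr. ptr:", "ip:") or in the
usual console form ("instruction pointer ="). The only addr2line options
used are -e (select the object file) and -f (also print the function
name).

```python
import re

# Instruction-pointer line of a trap message, e.g.
#   "instr. ptr: 0x8:0xc0586e60"  or  "ip: 0x8:0xc05488ed"
# The first hex number is the code segment selector; the second is the
# address we want to look up.
_IP_RE = re.compile(
    r"\b(?:instruction pointer|instr\. ptr|ip)\s*[:=]\s*"
    r"0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)"
)


def fault_address(panic_text):
    """Return the faulting instruction address as a hex string, or None."""
    match = _IP_RE.search(panic_text)
    return match.group(1).lower() if match else None


def addr2line_command(panic_text, kernel_debug="kernel.debug"):
    """Build the addr2line argument list for the address in panic_text.

    kernel_debug is a placeholder; point it at the kernel.debug produced
    by the build of the kernel that actually panicked.
    """
    addr = fault_address(panic_text)
    if addr is None:
        raise ValueError("no instruction pointer found in panic text")
    return ["addr2line", "-f", "-e", kernel_debug, addr]
```

The returned list can be passed to subprocess.run() on a machine that
has the matching kernel.debug; the address is only meaningful against
the exact kernel build that panicked.
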
A stack trace will tell us what calling code might be at fault.

> On 2004-09-13 I've cvsup'ed current and releng_5 sources and recompiled
> (releng_5) world + kernel. The system kept panicing.
>
> Well, since having boot problems using that mainboard (Slot-1, P-III
> 600, FIC VB-601V, which caused the BTX loader sometimes to a fatal
> exit... strange thing), I've plugged in another board which has been
> working stable over the last few weeks (Epox 51-MVP3G with AMD K6-2 500).
>
> This system is now up using that socket-7 board but has paniced a few
> minutes ago the second time:
>
> fatal trap 12: page fault
> fatal virtual address: 0x40
> trap 12: page fault while in kernel mode
> ip: 0x8:0xc05488ed
> sp: 0x10:0xca3f4c20
> fp: 0x10:0xca3f4c20
> process: 34 (swi6: task queue)
>
> A few minutes before it paniced with:
>
> in_cksum_skip: out of data by 184

A couple of bugs relating to this error were introduced and then fixed.
In particular, could you confirm that you have at least revision 1.165
of udp_usrreq.c, or 1.162.2.2 of udp_usrreq.c? The merge to RELENG_5
happened on 8/30 so you should have it, but it's worth confirming. A
stack trace here would also be extremely helpful, but this failure could
be explained by whatever causes the sbappendaddr_locked failure as well.

> Any additional tests you want me to drive?

Could you try booting and running the system with debug.mpsafenet=0 in
loader.conf? Is this an SMP box? Could you try compiling and running
without the PREEMPTION kernel option? Probably the most valuable
information would be the stack traces as indicated above, however.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org   Principal Research Scientist, McAfee Research

Received on Tue Sep 14 2004 - 15:00:49 UTC