On Mon, 3 Nov 2003, John Baldwin wrote:

> On 01-Nov-2003 Soren Schmidt wrote:
> > It seems Sean Chittenden wrote:
> >> Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
> >> for those interested (this is using ule 1.67, rebuilding world now),
> >> here's my stack.  I couldn't figure out where td was being set to
> >> NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
> >> anything.  -sc
> >
> > Its not ULE, I'm running 4BSD and has gotten this on boot for over a
> > week now, rendering -current totally useless...
>
> Having a kernel panic with INVARIANTS on would really help narrow down
> where the bug is.

I found something that causes this bug fairly reliably:
- configure ddb so that db_print_backtrace() is called on panics.
- break the fd driver so that the panic() in fdstrategy() is called on
  floppy accesses.
- attempt to access a floppy so that fdstrategy() is called.
- db_print_backtrace() then does bad things.  It never completes here,
  though it works in other contexts.  Usually it prints only the first
  line or two.  Then quite often ddb is called for a null pointer panic
  in propagate_priority().

More details about the null pointer panic:

This seems to have nothing to do with scheduling.  propagate_priority()
is not called with a null td of course, but it sometimes follows a null m:

%%%
		/*
		 * Pick up the mutex that td is blocked on.
		 */
		m = td->td_blocked;
		MPASS(m != NULL);

		/*
		 * Check if the thread needs to be moved up on
		 * the blocked chain
		 */
		if (td == TAILQ_FIRST(&m->mtx_blocked)) {
			continue;
		}
%%%

I don't have invariants enabled, so MPASS(m != NULL) doesn't do anything,
but m is null so attempting to load m->mtx_blocked causes a panic.

For the backtrace context, propagate_priority() gets called on an attempt
to acquire a lock in softclock().  Tasks like the softclock task get
scheduled despite the system being in panic().  ps seemed to show that
the user process doing the floppy access no longer existed.
I don't know how that could happen, since the panic() is done in the
context of that process.

More details about bugs in db_print_backtrace():

Maybe the stack is messed up.  Attempting to access invalid stack
offsets can cause problems.  My version of db_print_backtrace() has
extra code to attempt not to access invalid offsets, but there is
normally no problem since ddb's trap handler fixes it up.  But
backtrace() bogusly calls db_print_backtrace() in non-ddb context and
then the longjmp in the trap handler goes to hyperspace if anywhere.

Bugs tripped over while debugging this:

Putting a breakpoint in fdopen() didn't work, because fd.c:fdopen()
conflicts with kern_descrip.c:fdopen().  This was broken in fd.c 1.259.
There are hundreds of similar conflicts in GENERIC, some for obviously
broken things like the same malloc type being static in several files.

Bruce

Received on Mon Nov 03 2003 - 23:51:10 UTC
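
For illustration only, here is a userland sketch of the null pointer
failure described above: an assertion that compiles to nothing without
INVARIANTS lets a null m through to the next dereference.  Everything
prefixed toy_/TOY_ is a hypothetical stand-in, not the kernel's struct
mtx, struct thread or MPASS(), and the explicit NULL check is an
assumption about one way to fail softly, not what propagate_priority()
does.

```c
#include <assert.h>
#include <stddef.h>

struct toy_thread;

/* Hypothetical stand-in for a mutex with a chain of blocked threads. */
struct toy_mtx {
	struct toy_thread *first_blocked;	/* plays TAILQ_FIRST(&m->mtx_blocked) */
};

/* Hypothetical stand-in for a thread. */
struct toy_thread {
	struct toy_mtx *td_blocked;	/* mutex td is blocked on, or NULL */
};

/* Assumed shape of an invariants-only assertion: a no-op when disabled. */
#ifdef INVARIANTS
#define	TOY_MPASS(e)	assert(e)
#else
#define	TOY_MPASS(e)	((void)0)
#endif

/*
 * Returns 1 if td is first on its mutex's blocked chain, 0 if it is
 * not, and -1 if td is not blocked on any mutex.  Without the explicit
 * check, the m->first_blocked load below would fault on a null m.
 */
int
toy_is_first_blocked(const struct toy_thread *td)
{
	const struct toy_mtx *m = td->td_blocked;

	TOY_MPASS(m != NULL);	/* does nothing unless INVARIANTS is defined */
	if (m == NULL)
		return (-1);
	return (td == m->first_blocked);
}
```

Built with -DINVARIANTS, the null case stops at the assertion instead of
returning -1, which is the kind of early failure that helps narrow down
where the bug is.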