Re: T40 panics at usb_get_next_event() when ACPI is disabled

From: Tai-hwa Liang <avatar_at_mmlab.cse.yzu.edu.tw> Date: Tue, 8 Jun 2004 10:42:37 +0800 (CST)

On Mon, 7 Jun 2004, Brian Buchanan wrote:
> Yes, I see this too on my T40p, but only when booting with the mouse
> plugged into the laptop through a USB hub connected to the docking
> station.  If the mouse is plugged in directly to the laptop (I haven't
> tried plugging the USB hub directly into the laptop) or not plugged in,

The problem always occurs on my T40 when the USB mouse is directly plugged
into the laptop.

> the problem does not occur.  My hypothesis is that because a certain
> event list entry is being overwritten, the USB event list only grows long
> enough to use this area of memory in this configuration.

Interesting hypothesis. What really bothers me is that the extra
"if (ueq != NULL)" checks didn't catch the NULL ueq case. According to the
backtrace, it crashed at "*ue = ueq->ue", where ueq was NULL at that moment.
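
To make it clear what kind of guard I mean, here is a rough userland sketch
of a counted event queue with that NULL check in the dequeue path. The names
(struct event, struct event_q, add_event, get_next_event) are simplified
stand-ins I made up for illustration; this is not the actual usb.c code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the kernel's event types. */
struct event {
	int type;
};

struct event_q {
	struct event ue;
	TAILQ_ENTRY(event_q) next;
};

static TAILQ_HEAD(, event_q) events = TAILQ_HEAD_INITIALIZER(events);
static int nevents = 0;

/* Queue a copy of *ue; returns 0 on success, -1 if malloc fails. */
static int
add_event(const struct event *ue)
{
	struct event_q *ueq = malloc(sizeof(*ueq));

	if (ueq == NULL)
		return (-1);
	ueq->ue = *ue;
	TAILQ_INSERT_TAIL(&events, ueq, next);
	nevents++;
	return (0);
}

/* Returns 1 and fills *ue if an event was dequeued, 0 otherwise. */
static int
get_next_event(struct event *ue)
{
	struct event_q *ueq;

	if (nevents <= 0)
		return (0);
	ueq = TAILQ_FIRST(&events);
	if (ueq == NULL) {
		/* Counter and list disagree: resync instead of crashing. */
		fprintf(stderr, "nevents out of sync: %d\n", nevents);
		nevents = 0;
		return (0);
	}
	*ue = ueq->ue;
	TAILQ_REMOVE(&events, ueq, next);
	free(ueq);
	nevents--;
	return (1);
}
```

A check like this only helps when the list head itself reads as NULL at the
time of the test. It cannot help if the head or the entry is modified between
the test and the dereference, or if the pointer is non-NULL but points at
overwritten memory.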

> I wrote a function to perform a sanity check on the event list and
> determined that the list is not corrupt after all the USB boot-time events
> have been queued.  The list becomes corrupted some time between then and

I'm curious about the sanity check function you've written. Would you mind
posting it?
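
I imagine it is something along the lines of the sketch below, which walks a
TAILQ, verifies every back-link, and compares the length against the expected
event count. This is only my guess at the approach, not your function; the
names (struct event_q, struct event_list, check_event_list) and the return
codes are hypothetical.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical queue entry; the real one carries a struct usb_event. */
struct event_q {
	int type;
	TAILQ_ENTRY(event_q) next;
};

TAILQ_HEAD(event_list, event_q);

/*
 * Returns 0 if the list looks consistent, otherwise:
 *   -1  an entry's back-link does not point at its predecessor
 *   -2  more entries than expected (or a cycle)
 *   -3  the head's tail pointer does not match the last entry
 *   -4  fewer entries than expected
 * Note that following a wild pointer can itself fault, so this only
 * detects corruption that leaves the links dereferenceable.
 */
static int
check_event_list(struct event_list *head, int expected)
{
	struct event_q *ueq;
	struct event_q **prevp = &head->tqh_first;
	int n = 0;

	for (ueq = head->tqh_first; ueq != NULL; ueq = ueq->next.tqe_next) {
		if (ueq->next.tqe_prev != prevp)
			return (-1);
		if (++n > expected)
			return (-2);
		prevp = &ueq->next.tqe_next;
	}
	if (head->tqh_last != prevp)
		return (-3);
	if (n != expected)
		return (-4);
	return (0);
}
```

Calling something like this before and after each suspect code path would
narrow down the window in which the entry gets overwritten.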

> when usbd attempts to read the event queue.  One of the events, the same
> one every time, is overwritten with something like 0x01000010 (I don't
> have a log of the actual bit pattern).  Since it's happening to the same
> object every time, the next step would be to set a watch point in the
> debugger.  I'll probably give this a try once I have a chance to consult
> with someone who knows more about kernel debugging.

Did you try extracting a backtrace from the core file? It would help with
further analysis:

	http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

Or you could use DDB for online kernel debugging:

	http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-ddb.html

> I did experiment with rolling back some usb commits, but it does not
> appear that a change to the usb subsystem is what caused this breakage.  I
> think something else in the system is misbehaving and overwriting memory.

Perhaps. Since the enqueuing/dequeuing of usb_event is supposed to be
protected by splusb(), there shouldn't be a race here unless there's something
wrong in the interrupt priority (shared with splnet/splbio?) settings.