Re: sio / puc wedging on both -current and -stable

From: Bruce Evans <bde_at_zeta.org.au>
Date: Tue, 18 May 2004 13:50:02 +1000 (EST)

On Mon, 17 May 2004, Mike Tancsa wrote:

> We are building a box that needs many serial ports to talk to some legacy
> low speed (9600) serial devices.  Our application (a small daemon written
> in C) happily talks to the devices and all works well.  However, if one of
> the external devices dies or is unplugged, the FreeBSD box will at seemingly
> irregular intervals lockup hard.  The only way to unlock the machine is to
> either hit the reset button (the keyboard is locked solid-- not even num
> lock works) *or* if I jiggle the DB9 connector enough so that enough noise
> shorts across the serial port *or* plug the serial port into a working
> device that I imagine sends some data on the serial port.  The machine then
> returns to a normal state and all is well. This does NOT happen with the
> onboard serial ports.  Only with a PUC device (we have tried several and
> it's the same result).
>
> Does this jog anyone's memory as to what the problem might be ?

It's an interrupt storm of some sort.  PCI interrupts are more likely to
cause one than ISA interrupts because they are more likely to be level
triggered.
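
To illustrate why level triggering matters here, the following is a toy
model in C, not code from the sio or puc drivers.  All names in it
(struct device, handler, count_handler_calls, the poll budget) are
hypothetical.  It assumes a device whose interrupt condition stays
asserted until a handler clears it, and counts how often the handler
runs under each trigger mode.

```c
#include <stdbool.h>

/* Hypothetical device model: while 'pending' is set, the IRQ line is high. */
struct device {
    bool pending;
};

/*
 * Hypothetical handler.  If it does not recognise the cause of the
 * interrupt, it returns without clearing it and the line stays high.
 */
void handler(struct device *dev, bool recognises_cause)
{
    if (recognises_cause)
        dev->pending = false;   /* condition serviced, line drops */
}

/*
 * Simulate 'budget' polls of the interrupt line by a controller and
 * return how many times the handler was invoked.
 *   level_triggered: fire whenever the line is high.
 *   edge triggered:  fire only on a low-to-high transition.
 */
int count_handler_calls(bool level_triggered, bool recognises_cause, int budget)
{
    struct device dev = { .pending = true };
    bool prev_line = false;
    int calls = 0;

    for (int i = 0; i < budget; i++) {
        bool line = dev.pending;
        bool fire = level_triggered ? line : (line && !prev_line);

        prev_line = line;
        if (fire) {
            handler(&dev, recognises_cause);
            calls++;
        }
    }
    return calls;
}
```

In this model, an edge-triggered line with an uncleared condition invokes
the handler once, while a level-triggered line with the same uncleared
condition invokes it on every poll, which is the storm.  If the handler
does clear the condition, both modes invoke it once.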

> I have a remote debugger setup and I can send a break and drop the unit
> into debugger, but kernel debugging is a little beyond our skillset.

Does this break into the locked machine?  If so...

> db> trace
> siointr1(c11d0000,d56dacb0,c02b49e6,c11d0000,10) at siointr1+0xc5
> siointr(c11d0000,10,a005,c,10060) at siointr+0xc
> Xfastintr4(c11d0c00,d56dacd8,c02a741a,c11d0c00,c0a3f240) at Xfastintr4+0x16
> siointr(c11d0c00) at siointr+0xc

... Type "s", then hold down the Enter key to repeat the "s" command until
control returns here, then keep holding down the Enter key until something
loops (may take many hundreds of commands).  Record all the output using
a serial console (don't type it in) and send it to me.

> puc_intr(c11af000,63103a,c11d0c00,0,d56dad68) at puc_intr+0x4e

If control returns here, then siointr hasn't looped internally; keep
going.

> intr_mux(c0a3f240,0,630010,c1360010,c0170010) at intr_mux+0x1f

If control returns here, then the loop is external so it is harder to
debug (but this is the most likely case).

Going through intr_mux() means that the interrupt is not a fast
interrupt.  Try building the kernel with "options PUC_FASTINTR".

> Xresume12() at Xresume12+0x2b

Stop if it gets back here.

> --- interrupt, eip = 0xc02b5b2a, esp = 0xd56dad38, ebp = 0xd56dad68 ---
> vec12(c11ce980,3,2000,cbf03a00,d56634c0) at vec12+0x2
> cnopen(c11ce980,3,2000,cbf03a00,0) at cnopen+0x6a

It may be significant that the hang seems to occur while opening the console
device.  Do you have a serial console on the puc device?  I thought that
this doesn't work.

> Any pointers on how to track this down ?  It happens both in RELENG_4 from
> May 12th and 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Thu May 13

Did it work before then?  The driver hasn't changed since long before then.

Bruce
Received on Mon May 17 2004 - 18:50:12 UTC
