I think we have been bouncing around this issue for the past few months, first on RELENG_4 and now on RELENG_5. In the past it has been somewhat difficult to reproduce, but now we can do it reliably. I don't think it's a hardware issue, as I can take the exact same 2 boxes with the exact same IRQ assignments, boot OpenBSD, and not run into an interrupt storm or freeze up the box. Swap back the RELENG_4 or RELENG_5 HD and again I can produce an interrupt storm at will. I can also reproduce it on 2 different chipsets (VIA and Intel).

The problem seems to revolve around how a PUC device (either a PCI modem or a PCI serial card) shares an interrupt with another device (usually a USB controller, but not always). On RELENG_4, the box just locks up in a race trying to service an interrupt on IRQ 12 that remains unhandled. On RELENG_5, I actually catch an interrupt storm. E.g. I attach to sio4 (PUC modem) and get

    Interrupt storm detected on "irq12: uhci1"; throttling interrupt source

Looking at vmstat -i does indeed show the rate getting throttled:

    releng-5-pioneer# vmstat -i
    interrupt                          total       rate
    irq0: clk                         596719         99
    irq1: atkbd0                           2          0
    irq4: sio0                          1079          0
    irq6: fdc0                             1          0
    irq8: rtc                         763812        127
    irq12: uhci1                        5825          0
    irq13: npx0                            1          0
    irq14: ata0                        38727          6
    irq15: vr0 ata1                     1984          0
    Total                            1408150        235
    releng-5-pioneer#

where irq12 is the IRQ shared by the modem and the USB port. However, because everything on IRQ 12 gets throttled, the modem is unusable. E.g. after cu -l /dev/cuaa4, typing atz takes about 5 seconds.

Is there some way to safely tell the kernel that the PUC device's IRQ is shareable? We applied this perhaps very ugly hack to /usr/src/sys/isa/sio.c on RELENG_4:

    @@ -1431,15 +1431,19 @@
         rid = 0;
         com->irqres = bus_alloc_resource(dev, SYS_RES_IRQ, &rid, 0ul, ~0ul, 1,
    -        RF_ACTIVE);
    +/*      RF_ACTIVE); */
    +        RF_SHAREABLE);

and with it at least we can talk to the sio device. However, on RELENG_5 there does not seem to be an equivalent fix.
My question is this: is the root cause the same on RELENG_4 and RELENG_5? Are we going about fixing the problem the best way? Or is the underlying problem something else?

Attached are a dmesg and an acpidump.

        ---Mike

--------------------------------------------------------------------
Mike Tancsa,                          tel +1 519 651 3400
Sentex Communications,                mike_at_sentex.net
Providing Internet since 1994         www.sentex.net
Cambridge, Ontario Canada             www.sentex.net/mike