Re: hard lock-up writing to tape

From: Doug White <dwhite_at_gumbysoft.com>
Date: Mon, 17 Nov 2003 09:50:58 -0800 (PST)
On Sun, 16 Nov 2003, Mike Durian wrote:

> I'm using -current cvsup'd as of Nov 15, 2003.  When I try to do a
> dump or run the btape (fill command) program from bacula, my machine
> will lock up hard.  Doesn't respond to ping.  No access to kernel
> debugger.  Num lock doesn't come on.

Sounds like a Giant deadlock.

dwhite's Form Letter on Debugging Giant Deadlocks

If you are experiencing problems with CURRENT locking up hard, it may be
due to a deadlock against the Giant mutex, which protects large parts of
the kernel.  Symptoms include:

. No response to any input
	. System video console
	. Network (ping)

To debug this, you will need to set up a serial console and build a kernel
with some debugging options.  Instructions for booting with a serial console
are in the Handbook; the kernel must be compiled with the following options:

options DDB
options BREAK_TO_DEBUGGER
options WITNESS
options INVARIANTS
options INVARIANT_SUPPORT

Make sure your serial console is capable of sending a Break signal. If
not, use "ALT_BREAK_TO_DEBUGGER" instead of "BREAK_TO_DEBUGGER".
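
Before rebuilding, it can help to confirm that the kernel config really
contains each option. The following is only a sketch: check_kernel_opts is
a hypothetical helper name, not a FreeBSD tool, and it only looks for plain
"options NAME" lines in the file you give it.

```shell
# check_kernel_opts CONFIG OPTION...
# Print each OPTION that has no active "options OPTION" line in CONFIG.
# Returns 0 when every option is present, 1 otherwise.
check_kernel_opts() {
    conf=$1
    shift
    status=0
    for opt in "$@"; do
        # Match "options NAME" followed by whitespace, a comment,
        # "=value", or end of line. Commented-out lines do not match.
        if ! grep -Eq "^[[:space:]]*options[[:space:]]+${opt}([[:space:]#=]|\$)" "$conf"; then
            echo "missing: options $opt"
            status=1
        fi
    done
    return $status
}
```

Call it with the path to your kernel config followed by each option name
from the list above; no output means every option was found.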

Enable the serial console and boot the system with terminal logging turned
on.  At the loader, interrupt the boot and type "boot -v" at the OK prompt
to get additional information during the boot process.
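
If you would rather not type this at every boot, the same settings can
usually be made persistent in /boot/loader.conf. This is a sketch that
assumes the serial port is already set up as described in the Handbook,
which remains the authoritative reference:

```shell
# /boot/loader.conf (sketch; the lines also happen to be valid sh assignments)
console="comconsole"    # use the serial console instead of the video console
boot_verbose="YES"      # same effect as typing "boot -v" at the loader prompt
```

Remove both lines again once you have finished debugging.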

Once the system is up, trigger the hang. When the system hangs, send the
Break signal (or, if you used ALT_BREAK_TO_DEBUGGER, press Enter, then ~,
then Ctrl-B).

If you get the db> prompt, then your hang is probably due to a Giant
deadlock. If not, then something else may be at fault.

Once in db>, run the following two commands and capture their output using
your terminal's logging capability:

show locks
tr
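
A terminal log usually contains the whole boot as well as the debugger
session. To pull out just the debugger part, something like the sketch
below may help. extract_ddb is a hypothetical helper name, and it assumes
the log is plain text in which the debugger prompt starts a line.

```shell
# extract_ddb LOGFILE
# Print the captured log from the first "db>" prompt to the end of the
# file, with carriage returns removed. (The tr used here is the Unix
# utility, not the debugger's "tr" command.)
extract_ddb() {
    tr -d '\r' < "$1" | sed -n '/^db>/,$p'
}
```

For example, "extract_ddb console.log > ddb-output.txt", where both file
names are placeholders. Keep the full log as well, since the boot -v output
comes before the first prompt.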

Take this output and the boot -v output, put them on a web page, and send a
message to current@freebsd.org explaining carefully what you did to
trigger the hang.

Good luck!

>
> I can perform a dump or run the btape fill program when in single
> user mode, but in multi-user the machine will only stay up for
> a short while before locking.
>
> This has been happening since I got the tape system (Sparcstorage
> Library) about 3-4 weeks ago.  I don't know how long the problem
> existed before then as I didn't have a tape system to use.
>
> I've tried two types of SCSI cards: Adaptec 2930 and ASUS PCI-SC200
> (sym(4) device).  Both behave the same.
>
> I wonder if it could be network or interrupt related.  In single
> user mode, the network interface is not up.
>
> Dmesg from my system follows:
> Copyright (c) 1992-2003 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD 5.1-CURRENT #57: Sat Nov 15 15:50:50 MST 2003
>     root_at_man.boogie.com:/disk2/obj/disk2/src/sys/BOOGIE
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a93000.
> Preloaded elf module "/boot/kernel/linux.ko" at 0xc0a931f4.
> Preloaded elf module "/boot/kernel/snd_pcm.ko" at 0xc0a932a0.
> Preloaded elf module "/boot/kernel/snd_via82c686.ko" at 0xc0a9334c.
> Preloaded elf module "/boot/kernel/sym.ko" at 0xc0a93400.
> Preloaded elf module "/boot/kernel/nvidia.ko" at 0xc0a934a8.
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Athlon(tm) processor (1002.28-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x642  Stepping = 2
>   Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
>   AMD Features=0xc0440000<RSVD,AMIE,DSP,3DNow!>
> real memory  = 1073676288 (1023 MB)
> avail memory = 1033502720 (985 MB)
> Pentium Pro MTRR support enabled
> npx0: [FAST]
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> acpi0: <VIA694 AWRDACPI> on motherboard
> pcibios: BIOS version 2.10
> Using $PIR table, 8 entries at 0xc00fde30
> acpi0: Power Button (fixed)
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
> acpi_cpu0: <CPU> on acpi0
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port
> 0x6000-0x607f,0x5000-0x500f,0x4080-0x40ff,0x4000-0x407f,0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib0: slot 7 INTD is routed to irq 11
> pcib0: slot 7 INTD is routed to irq 11
> pcib0: slot 7 INTC is routed to irq 10
> pcib0: slot 9 INTA is routed to irq 9
> pcib0: slot 9 INTA is routed to irq 9
> pcib0: slot 9 INTA is routed to irq 9
> pcib0: slot 9 INTA is routed to irq 9
> pcib0: slot 10 INTA is routed to irq 10
> pcib0: slot 11 INTA is routed to irq 11
> pcib0: slot 12 INTA is routed to irq 10
> pcib0: slot 13 INTA is routed to irq 11
> agp0: <VIA 82C8363 (Apollo KT133A) host to PCI bridge> mem
> 0xd0000000-0xd7ffffff at device 0.0 on pci0
> pcib1: <PCI-PCI bridge> at device 1.0 on pci0
> pci1: <PCI bus> on pcib1
> pcib0: slot 1 INTA is routed to irq 5
> pcib1: slot 0 INTA is routed to irq 5
> nvidia0: <GeForce4 MX 440 with AGP8X> mem
> 0xd8000000-0xdfffffff,0xe0000000-0xe0ffffff irq 5 at device 0.0 on pci1
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <VIA 82C686B UDMA100 controller> port 0xa000-0xa00f at device 7.1 on
> pci0
> atapci0: Correcting VIA config for southbridge data corruption bug
> ata0: at 0x1f0 irq 14 on atapci0
> ata0: [MPSAFE]
> ata1: at 0x170 irq 15 on atapci0
> ata1: [MPSAFE]
> uhci0: <VIA 83C572 USB controller> port 0xa400-0xa41f irq 11 at device 7.2 on
> pci0
> usb0: <VIA 83C572 USB controller> on uhci0
> usb0: USB revision 1.0
> uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhci1: <VIA 83C572 USB controller> port 0xa800-0xa81f irq 11 at device 7.3 on
> pci0
> usb1: <VIA 83C572 USB controller> on uhci1
> usb1: USB revision 1.0
> uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> viapropm0: SMBus I/O base at 0x5000
> viapropm0: <VIA VT82C686A Power Management Unit> port 0x5000-0x500f at device
> 7.4 on pci0
> viapropm0: SMBus revision code 0x40
> smbus0: <System Management Bus> on viapropm0
> smb0: <SMBus generic I/O> on smbus0
> pcm0: <VIA VT82C686A> port 0xb400-0xb403,0xb000-0xb003,0xac00-0xacff irq 10 at
> device 7.5 on pci0
> pcm0: <ICEnsemble ICE1232 AC97 Codec>
> ohci0: <OHCI (generic) USB controller> mem 0xe3006000-0xe3006fff irq 9 at
> device 9.0 on pci0
> usb2: OHCI version 1.0, legacy support
> usb2: <OHCI (generic) USB controller> on ohci0
> usb2: USB revision 1.0
> uhub2: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub2: 1 port with 1 removable, self powered
> ohci1: <OHCI (generic) USB controller> mem 0xe3007000-0xe3007fff irq 9 at
> device 9.1 on pci0
> usb3: OHCI version 1.0, legacy support
> usb3: <OHCI (generic) USB controller> on ohci1
> usb3: USB revision 1.0
> uhub3: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub3: 1 port with 1 removable, self powered
> ohci2: <OHCI (generic) USB controller> mem 0xe3004000-0xe3004fff irq 9 at
> device 9.2 on pci0
> usb4: OHCI version 1.0, legacy support
> usb4: <OHCI (generic) USB controller> on ohci2
> usb4: USB revision 1.0
> uhub4: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub4: 1 port with 1 removable, self powered
> ohci3: <OHCI (generic) USB controller> mem 0xe3005000-0xe3005fff irq 9 at
> device 9.3 on pci0
> usb5: OHCI version 1.0, legacy support
> usb5: <OHCI (generic) USB controller> on ohci3
> usb5: USB revision 1.0
> uhub5: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub5: 1 port with 1 removable, self powered
> puc0: <NetMos NM9835 Dual UART and 1284 Printer port> port
> 0xcc00-0xcc0f,0xc800-0xc807,0xc400-0xc407,0xc000-0xc007,0xbc00-0xbc07,0xb800-0xb807
> irq 10 at device 10.0 on pci0
> sio4: <NetMos NM9835 Dual UART and 1284 Printer port> on puc0
> sio4: type 16550A
> sio4: unable to activate interrupt in fast mode - using normal mode
> sio5: <NetMos NM9835 Dual UART and 1284 Printer port> on puc0
> sio5: type 16550A
> sio5: unable to activate interrupt in fast mode - using normal mode
> ppc0: <Parallel port> on puc0
> ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
> ppc0: FIFO with 16/16/12 bytes threshold
> ppbus0: <Parallel port bus> on ppc0
> plip0: <PLIP network interface> on ppbus0
> lpt0: <Printer> on ppbus0
> lpt0: Interrupt-driven port
> ppi0: <Parallel I/O> on ppbus0
> atapci1: <Promise PDC20268 UDMA100 controller> port
> 0xe000-0xe00f,0xdc00-0xdc03,0xd800-0xd807,0xd400-0xd403,0xd000-0xd007 mem
> 0xe3000000-0xe3003fff irq 11 at device 11.0 on pci0
> atapci1: [MPSAFE]
> ata2: at 0xd000 on atapci1
> ata2: [MPSAFE]
> ata3: at 0xd800 on atapci1
> ata3: [MPSAFE]
> dc0: <ADMtek AN985 10/100BaseTX> port 0xe400-0xe4ff mem 0xe3008000-0xe30083ff
> irq 10 at device 12.0 on pci0
> dc0: Ethernet address: 00:03:6d:1d:fa:e6
> miibus0: <MII bus> on dc0
> ukphy0: <Generic IEEE 802.3u media interface> on miibus0
> ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> ahc0: <Adaptec 2930CU SCSI adapter> port 0xe800-0xe8ff mem
> 0xe3009000-0xe3009fff irq 11 at device 13.0 on pci0
> aic7860: Ultra Single Channel A, SCSI Id=7, 3/253 SCBs
> fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> port
> 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> sio0 port 0x3f8-0x3ff irq 4 on acpi0
> sio0: type 16550A
> sio1 port 0x2e8-0x2ef irq 3 on acpi0
> sio1: type 16550A
> ppc1 port 0x378-0x37f irq 7 on acpi0
> ppc1: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
> ppbus1: <Parallel port bus> on ppc1
> ppbus1: IEEE1284 device found /NIBBLE/ECP
> Probing for PnP devices on ppbus1:
> ppbus1: <Hewlett-Packard OfficeJet G85> MLC,PCL,PML,SCL
> plip1: <PLIP network interface> on ppbus1
> lpt1: <Printer> on ppbus1
> lpt1: Interrupt-driven port
> ppi1: <Parallel I/O> on ppbus1
> atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: model IntelliMouse Explorer, device ID 4
> orm0: <Option ROMs> at iomem 0xd3000-0xd37ff,0xd0000-0xd27ff on isa0
> pmtimer0 on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounter "TSC" frequency 1002281268 Hz quality 800
> Timecounters tick every 10.000 msec
> acpi_cpu: throttling enabled, 2 steps (100% to 50.0%), currently 100.0%
> GEOM: create disk ad0 dp=0xc6987d60
> ad0: 19092MB <WDC WD200BB-00AUA1> [38792/16/63] at ata0-master UDMA100
> acd0: CDRW <OPTORITE CD-RW CW4802> at ata1-master PIO4
> GEOM: create disk ad4 dp=0xc6988060
> ad4: 95396MB <WDC WD1000BB-00CJA1> [193821/16/63] at ata2-master UDMA100
> Waiting 3 seconds for SCSI devices to settle
> umass0: SanDisk Corporation ImageMate CompactFlash USB, rev 1.10/0.09, addr 2
> umass0: SCSI over Bulk-Only; quirks = 0x0000
> umass0: Get Max Lun not supported (STALLED)
> umass0:5:0:-1: Attached to scbus5
> GEOM: create disk cd0 dp=0xc695d600
> GEOM: create disk da0 dp=0xc6b59850
> sa0 at ahc0 bus 0 target 4 lun 0
> sa0: <EXABYTE EXB-8505SMBANSH2 0793> Removable Sequential Access SCSI-2 device
> sa0: 5.000MB/s transfers (5.000MHz, offset 11)
> sa1 at ahc0 bus 0 target 5 lun 0
> sa1: <EXABYTE EXB-8505SMBANSH2 0793> Removable Sequential Access SCSI-2 device
> sa1: 5.000MB/s transfers (5.000MHz, offset 11)
> cd0 at ata1 bus 0 target 0 lun 0
> cd0: <OPTORITE CD-RW CW4802 120E> Removable CD-ROM SCSI-0 device
> cd0: 16.000MB/s transfers
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> da0 at umass-sim0 bus 0 target 0 lun 0
> da0: <SanDisk ImageMate II 1.30> Removable Direct Access SCSI-2 device
> da0: 1.000MB/s transfers
> da0: Attempt to query device size failed: NOT READY, Medium not present
> (da0:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
> (da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
> (da0:umass-sim0:0:0:0): SCSI Status: Check Condition
> (da0:umass-sim0:0:0:0): NOT READY asc:3a,0
> (da0:umass-sim0:0:0:0): Medium not present
> (da0:umass-sim0:0:0:0): Unretryable error
> Opened disk da0 -> 6
> ch0 at ahc0 bus 0 target 3 lun 0
> ch0: <EXABYTE EXB-210 3.11> Removable Changer SCSI-2 device
> ch0: 3.300MB/s transfers
> ch0: 11 slots, 2 drives, 1 picker, 0 portals
> (da0:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
> (da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
> (da0:umass-sim0:0:0:0): SCSI Status: Check Condition
> (da0:umass-sim0:0:0:0): NOT READY asc:3a,0
> (da0:umass-sim0:0:0:0): Medium not present
> (da0:umass-sim0:0:0:0): Unretryable error
> Opened disk da0 -> 6
> Mounting root from ufs:/dev/ad0s2a
> WARNING: / was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /disk2 was not properly dismounted

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite_at_gumbysoft.com          |  www.FreeBSD.org
Received on Mon Nov 17 2003 - 08:50:59 UTC
