Panic on shutdown -r now (RESOLVED?)

From: David Boyd <David.Boyd_at_insightbb.com>
Date: Tue, 30 Nov 2004 15:58:24 -0500

The following problem was reported (by me and others) from about 5.3-BETA4
through 5.3-RELEASE.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
This problem persisted into 5.3-RELEASE.  It may be related to panics
reported by others.

The problem appears to be related to SMP and HTT.  It doesn't occur (for me)
with GENERIC.

It has been very difficult to obtain a "usable" dump.  The system is usually
locked tight.

The kernel is built with options KDB, DDB, and BREAK_TO_DEBUGGER.  Even when
the system gets as far as indicating that a panic has occurred, it seldom
enters the debugger.  When it does enter the debugger, it usually ignores
keyboard input, echoing a colon or semicolon when the ENTER key is pressed.
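
For reference, a kernel configuration along these lines would look roughly
like the sketch below.  The actual config file was not posted, so this is a
hypothetical reconstruction: only KDB, DDB, and BREAK_TO_DEBUGGER are stated
above, the ident DEBUG is taken from the boot banner, and the include, SMP,
and makeoptions lines are assumptions.

```
# Hypothetical DEBUG kernel config (the real file was not posted).
include		GENERIC
ident		DEBUG

options 	SMP			# assumption: needed to see the SMP/HTT panic
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger backend
options 	BREAK_TO_DEBUGGER	# a BREAK on the serial console enters the debugger
makeoptions	DEBUG=-g		# assumption: debug symbols for kernel.debug
```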

Oh, yeah!  Once every fifty or so times the system will reboot normally.

This problem started during BETA testing ... back around BETA4 or BETA5 as I
recall.

Here's what I have for today (system is from RC2 ISO image).


from serial console:

The garbage in the display after "Shutting down ACPI" is "normal" for this
problem.
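
The garbage looks like two messages printed one character at a time in
alternation, which would be consistent with both logical CPUs writing to the
console at once.  That is an interpretation, not something confirmed
anywhere in this report.  A minimal Python sketch, assuming strict
character-by-character alternation (which holds for the two lines copied
below from the capture, but not for the short fragments around them):

```python
def deinterleave(mixed):
    """Split a string whose characters alternate between two writers.

    Assumption: the writers alternate strictly, one character each.
    """
    return mixed[0::2], mixed[1::2]


# Two garbled lines copied verbatim from the serial console capture.
garbled_lines = [
    "rFnaetla lt rdaopu b1l2e  wfiatuhl ti:n",
    "eerirpu p=t s0 xdci1s9aabcl4ebdc",
]

for line in garbled_lines:
    first, second = deinterleave(line)
    print(repr(first), "|", repr(second))
```

Run as written, the first line splits into 'rnel trap 12 with in' and
'Fatal double fault:', and the second into 'errupts disabled' and
'eip = 0xc19ac4bc'.  The eip value matches the one shown in the trap frame
of the ddb trace.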

============================================================================
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RC2 #0: Mon Nov  1 14:48:42 EST 2004
    root_at_comm-server.support.bsd1.net:/usr/src/sys/i386/compile/DEBUG
ACPI APIC Table: <INTEL  PRODUCT8>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2394.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 534970368 (510 MB)
avail memory = 513937408 (490 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <INTEL PRODUCT8> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu1: Failed to attach throttling P_CNT
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82865 host to AGP bridge> mem 0xf8000000-0xfbffffff at device
0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xcc00-0xcc1f irq 16
at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xd000-0xd01f irq 19
at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xd400-0xd41f irq 18
at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 82801EB (ICH5) USB controller USB-D> port 0xd800-0xd81f irq 16
at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <Intel 82801EB (ICH5) USB controller USB-D> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <Promise PDC20270 UDMA100 controller> port
0xac00-0xac0f,0xb000-0xb003,0xb400-0xb407,0xb800-0xb803,0xbc00-0xbc07 mem
0xfeaf0000-0xfeafffff irq 17 at device 2.0 on pci2
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
rl0: <D-Link DFE-530TX+ 10/100BaseTX> port 0xa800-0xa8ff mem
0xfeadfc00-0xfeadfcff irq 19 at device 3.0 on pci2
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:0d:88:35:39:a0
rl1: <D-Link DFE-530TX+ 10/100BaseTX> port 0xa400-0xa4ff mem
0xfeadf800-0xfeadf8ff irq 18 at device 4.0 on pci2
miibus1: <MII bus> on rl1
rlphy1: <RealTek internal media interface> on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: Ethernet address: 00:0d:88:37:d7:ba
fxp0: <Intel 82801BA (D865) Pro/100 VE Ethernet> port 0xa000-0xa03f mem
0xfeade000-0xfeadefff irq 20 at device 8.0 on pci2
miibus2: <MII bus> on fxp0
inphy0: <i82562ET 10/100 media interface> on miibus2
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:11:11:0a:46:7b
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH5 UDMA100 controller> port
0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
atapci2: <Intel ICH5 SATA150 controller> port
0xdc00-0xdc0f,0xe000-0xe003,0xe400-0xe407,0xe800-0xe803,0xec00-0xec07 irq 18
at device 31.2 on pci0
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
pci0: <multimedia, audio> at device 31.5 (no driver attached)
acpi_button0: <Sleep Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port
0x3f7,0x3f4-0x3f5,0x3f2-0x3f3,0x3f0-0x3f1 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio0: type 16550A, console
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROMs> at iomem
0xd6800-0xd77ff,0xd5800-0xd67ff,0xcc000-0xd57ff on isa0
pmtimer0 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 8 steps (100% to 12.5%), currently 100.0%
acd0: CDROM <CREATIVE CD5233E-N/0.20> at ata0-master UDMA33
ata1-slave: FAILURE - ATA_IDENTIFY timed out
ata1-slave: FAILURE - ATA_IDENTIFY timed out
ata1-master: FAILURE - SETFEATURES SET TRANSFER MODE status=1<ERROR>
error=4<ABORTED>
ata1-slave: FAILURE - ATA_IDENTIFY timed out
ata1-master: FAILURE - SETFEATURES SET TRANSFER MODE status=1<ERROR>
error=4<ABORTED>
afd0: REMOVABLE <IOMEGA ZIP 100 ATAPI/03.H> at ata1-master BIOSPIO
ad4: 76319MB <ST380011A/3.06> [155061/16/63] at ata2-master UDMA100
ad6: 76319MB <ST380011A/3.06> [155061/16/63] at ata3-master UDMA100
ar0: 76319MB <ATA RAID1 array> [9729/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad6 at ata3-master
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/ar0s1a
Pre-seeding PRNG: kickstart.
Loading configuration files.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
kernel dumps on /dev/ar0s1b
swapon: adding /dev/ar0s1b as swap device
Starting file system checks:
/dev/ar0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1a: clean, 2000849 free (585 frags, 250033 blocks, 0.0%
fragmentation)
/dev/ar0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1d: clean, 3419753 free (40945 frags, 422351 blocks, 1.0%
fragmentation)
/dev/ar0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1e: clean, 8121013 free (461 frags, 1015069 blocks, 0.0%
fragmentation)
/dev/ar0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1f: clean, 4061052 free (28 frags, 507628 blocks, 0.0%
fragmentation)
/dev/ar0s1g: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1g: clean, 2029028 free (28 frags, 253625 blocks, 0.0%
fragmentation)
/dev/ar0s1h: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1h: clean, 3045027 free (27 frags, 380625 blocks, 0.0%
fragmentation)
/dev/ar0s2d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s2d: clean, 13470149 free (21 frags, 1683766 blocks, 0.0%
fragmentation)
Setting hostname: comm-server.support.bsd1.net.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
	inet 127.0.0.1 netmask 0xff000000
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
Starting dhclient.
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	options=8<VLAN_MTU>
	inet6 fe80::211:11ff:fe0a:467b%fxp0 prefixlen 64 scopeid 0x3
	inet 192.168.210.51 netmask 0xffffff00 broadcast 192.168.210.255
	ether 00:11:11:0a:46:7b
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
Additional routing options: IP gateway=YES.
Starting devd.
Mounting NFS file systems:.
Starting syslogd.
Nov  1 15:56:44 comm-server syslogd: kernel boot file is /boot/kernel/kernel
Checking for core dump on /dev/ar0s1b ...
savecore: no dumps found
Setting date via ntp.
Looking for host 192.168.210.1 and service ntp
host found : free.bsd1.net
 1 Nov 15:56:45 ntpdate[312]: step time server 192.168.210.1 offset 1.115163
sec
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib
/usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting usbd.
Starting local daemons:.
Updating motd.
Configuring syscons: blanktime.
Starting sshd.
Initial i386 initialization:.
Additional ABI support:.
Starting cron.
Local package initialization:.
Additional TCP options:.
Starting background file system checks in 60 seconds.

Mon Nov  1 15:56:47 EST 2004
 FreeBSD/i386 (comm-server.support.bsd1.net) (ttyd0)  login: root
Password:
Nov  1 15:56:52 comm-server login: ROOT LOGIN (root) ON ttyd0 Last login:
Mon Nov  1 15:10:59 on ttyd0
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.

FreeBSD 5.3-RC2 (DEBUG) #0: Mon Nov  1 14:48:42 EST 2004

Welcome to FreeBSD!

Before seeking technical support, please use the following resources:

o  Security advisories and updated errata information for all releases are
   at http://www.FreeBSD.org/releases/ - always consult the ERRATA section
   for your release first as it's updated frequently.

o  The Handbook and FAQ documents are at http://www.FreeBSD.org/ and,
   along with the mailing lists, can be searched by going to
   http://www.FreeBSD.org/search/.  If the doc distribution has
   been installed, they're also available formatted in /usr/share/doc.

If you still have a question or problem, please take the output of
`uname -a', along with any relevant error messages, and email it
as a question to the questions_at_FreeBSD.org mailing list.  If you are
unfamiliar with FreeBSD's directory layout, please refer to the hier(7)
manual page.  If you are not familiar with manual pages, type `man man'.

You may also use sysinstall(8) to re-enter the installation and
configuration utility.  Edit /etc/motd to change this login announcement.

erase ^H, kill ^U, intr ^C status ^T
FreeBSD
cons25
ttyd0
[comm-server.support.bsd1.net:ttyd0:/root ]> shutdown -r now Shutdown NOW!
shutdown: [pid 497]

    *** FINAL System shutdown message from
root_at_comm-server.support.bsd1.net ***  System going down IMMEDIATELY
Nov  1 15:56:58 comm-server shutdown: reboot by root:
[comm-server.support.bsd1.net:ttyd0:/root ]>  System shutdown time has
arrived Shutting down daemon processes:.
Stopping cron.
Shutting down local daemons:.
Writing entropy file:.
.
Nov  1 15:57:00 comm-server syslogd: exiting on signal 15 boot() called on
cpu#1
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...4 4 2 2 0 0 0 done
No buffers busy after final sync
Uptime: 52s
Waiting (max 60 seconds) for system process `hpt_wt' to stop...done
Shutting down ACPI
kk
e
rFnaetla lt rdaopu b1l2e  wfiatuhl ti:n
t
eerirpu p=t s0 xdci1s9aabcl4ebdc
esp = 0x6460c19a
ebp = 0x0
cpuid = 1; apic id = 01
panic: double fault
cpuid = 1
KDB: enter: panic
[thread 100002]
Stopped at      kdb_enter+0x2b: nop
db> whre  ere
kdb_enter(c08291f5) at kdb_enter+0x2b
panic(c084267e,c08427ef,1,0,0) at panic+0x127
dblfault_handler() at dblfault_handler+0x7a
--- trap 0x17, eip = 0xc19ac4bc, esp = 0x6460c19a, ebp = 0 ---
_end() at 0xc19ac4bc
db> trace
kdb_enter(c08291f5) at kdb_enter+0x2b
panic(c084267e,c08427ef,1,0,0) at panic+0x127
dblfault_handler() at dblfault_handler+0x7a
--- trap 0x17, eip = 0xc19ac4bc, esp = 0x6460c19a, ebp = 0 ---
_end() at 0xc19ac4bc
db> call doae dump
Dumping 510 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320
336 352 368 384 400 416 432 448 464 480 496
Dump complete
0xf
db> reset
cpu_reset called on cpu#1
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1


from kgdb:
============================================================================
kgdb kernel.debug vmcore.0
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
doadump () at pcpu.h:159
(kgdb) whre  ere
#0  doadump () at pcpu.h:159
#1  0xc0460cd6 in db_fncall (dummy1=0, dummy2=0, dummy3=-1064198276,
    dummy4=0xc0919f64 "\230\237\221À\200%") at ../../../ddb/db_command.c:531
#2  0xc0460ae4 in db_command (last_cmdp=0xc08c7a44, cmd_table=0x0,
    aux_cmd_tablep=0xc0848104, aux_cmd_tablep_end=0xc0848120)
    at ../../../ddb/db_command.c:349
#3  0xc0460bac in db_command_loop () at ../../../ddb/db_command.c:455
#4  0xc0462725 in db_trap (type=3, code=0) at ../../../ddb/db_main.c:221
#5  0xc062adc7 in kdb_trap (type=3, code=0, tf=0x1)
    at ../../../kern/subr_kdb.c:418
#6  0xc07c2f74 in trap (frame=
      {tf_fs = -1064239080, tf_es = -1067319280, tf_ds = -1065222128, tf_edi
= -1065081218, tf_esi = 1, tf_ebp = -1064197916, tf_isp = -1064197936,
tf_ebx = -1064197872, tf_edx = 0, tf_ecx = -1056882688, tf_eax = 18,
tf_trapno = 3, tf_err = 0, tf_eip = -1067275477, tf_cs = 8, tf_eflags =
16534, tf_esp = -1064197884, tf_ss = -1067371761}) at
../../../i386/i386/trap.c:576
#7  0xc07b0d1a in calltrap () at ../../../i386/i386/exception.s:140
#8  0xc0910018 in sc_buffer.5 ()
#9  0xc0620010 in umtx_remove (uq=0xc091a110, td=0x0)
    at ../../../kern/kern_umtx.c:135
#10 0xc061330f in panic (fmt=0xc084267e "double fault")
    at ../../../kern/kern_shutdown.c:537
#11 0xc07c3566 in dblfault_handler () at ../../../i386/i386/trap.c:838
#12 0x00000000 in ?? ()
(kgdb) quit
[comm-server.support.bsd1.net:ttyd0:/var/crash ]>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The following commit seems to have cured the problem.

	Edit src/sys/kern/kern_shutdown.c
  		Add delta 1.163.2.3 2004.11.29.19.11.36 njl

If this fix does, in fact, address this problem, can I expect to see it in
an official patch to 5.3-RELEASE?

This is the only issue keeping us from upgrading/deploying 5.3-RELEASE on
all (twenty-two at last count) of our production servers.  I can't get
agreement from my management to deploy 5.3-STABLE, so it's either
5.3-RELEASE-px or waiting until 5.4-RELEASE.  I'd rather not wait.

Thanks for any information that you can supply.
Received on Tue Nov 30 2004 - 19:57:32 UTC
