The following problem was reported (by me and others) from about 5.3-BETA4 through 5.3-RELEASE.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This problem persisted into 5.3-RELEASE. It may be related to panics reported by others.

The problem appears to be related to SMP and HTT. It doesn't occur (for me) with GENERIC.

It has been very difficult to obtain a "usable" dump. The system is usually locked tight. The kernel is built with KDB, DDB, and BREAK_TO_DEBUGGER. Even when the system gets as far as indicating that the panic has occurred, it seldom enters the debugger. Usually, when it does enter the debugger, the system ignores any key input, echoing a colon or semicolon when the ENTER key is pressed.

Oh, yeah! Once every fifty or so times the system will reboot normally.

This problem started during BETA testing ... back around BETA4 or BETA5 as I recall.

Here's what I have for today (system is from RC2 ISO image).

from serial console:

The garbage in the display after "Shutting down ACPI" is "normal" to this problem.

============================================================================
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RC2 #0: Mon Nov 1 14:48:42 EST 2004
    root_at_comm-server.support.bsd1.net:/usr/src/sys/i386/compile/DEBUG
ACPI APIC Table: <INTEL PRODUCT8>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2394.01-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory = 534970368 (510 MB)
avail memory = 513937408 (490 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <INTEL PRODUCT8> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu1: Failed to attach throttling P_CNT
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82865 host to AGP bridge> mem 0xf8000000-0xfbffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xcc00-0xcc1f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xd000-0xd01f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xd400-0xd41f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 82801EB (ICH5) USB controller USB-D> port 0xd800-0xd81f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <Intel 82801EB (ICH5) USB controller USB-D> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <Promise PDC20270 UDMA100 controller> port 0xac00-0xac0f,0xb000-0xb003,0xb400-0xb407,0xb800-0xb803,0xbc00-0xbc07 mem 0xfeaf0000-0xfeafffff irq 17 at device 2.0 on pci2
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
rl0: <D-Link DFE-530TX+ 10/100BaseTX> port 0xa800-0xa8ff mem 0xfeadfc00-0xfeadfcff irq 19 at device 3.0 on pci2
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:0d:88:35:39:a0
rl1: <D-Link DFE-530TX+ 10/100BaseTX> port 0xa400-0xa4ff mem 0xfeadf800-0xfeadf8ff irq 18 at device 4.0 on pci2
miibus1: <MII bus> on rl1
rlphy1: <RealTek internal media interface> on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: Ethernet address: 00:0d:88:37:d7:ba
fxp0: <Intel 82801BA (D865) Pro/100 VE Ethernet> port 0xa000-0xa03f mem 0xfeade000-0xfeadefff irq 20 at device 8.0 on pci2
miibus2: <MII bus> on fxp0
inphy0: <i82562ET 10/100 media interface> on miibus2
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:11:11:0a:46:7b
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH5 UDMA100 controller> port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
atapci2: <Intel ICH5 SATA150 controller> port 0xdc00-0xdc0f,0xe000-0xe003,0xe400-0xe407,0xe800-0xe803,0xec00-0xec07 irq 18 at device 31.2 on pci0
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
pci0: <multimedia, audio> at device 31.5 (no driver attached)
acpi_button0: <Sleep Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f7,0x3f4-0x3f5,0x3f2-0x3f3,0x3f0-0x3f1 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROMs> at iomem 0xd6800-0xd77ff,0xd5800-0xd67ff,0xcc000-0xd57ff on isa0
pmtimer0 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 8 steps (100% to 12.5%), currently 100.0%
acd0: CDROM <CREATIVE CD5233E-N/0.20> at ata0-master UDMA33
ata1-slave: FAILURE - ATA_IDENTIFY timed out
ata1-slave: FAILURE - ATA_IDENTIFY timed out
ata1-master: FAILURE - SETFEATURES SET TRANSFER MODE status=1<ERROR> error=4<ABORTED>
ata1-slave: FAILURE - ATA_IDENTIFY timed out
ata1-master: FAILURE - SETFEATURES SET TRANSFER MODE status=1<ERROR> error=4<ABORTED>
afd0: REMOVABLE <IOMEGA ZIP 100 ATAPI/03.H> at ata1-master BIOSPIO
ad4: 76319MB <ST380011A/3.06> [155061/16/63] at ata2-master UDMA100
ad6: 76319MB <ST380011A/3.06> [155061/16/63] at ata3-master UDMA100
ar0: 76319MB <ATA RAID1 array> [9729/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad6 at ata3-master
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/ar0s1a
Pre-seeding PRNG: kickstart.
Loading configuration files.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
kernel dumps on /dev/ar0s1b
swapon: adding /dev/ar0s1b as swap device
Starting file system checks:
/dev/ar0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1a: clean, 2000849 free (585 frags, 250033 blocks, 0.0% fragmentation)
/dev/ar0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1d: clean, 3419753 free (40945 frags, 422351 blocks, 1.0% fragmentation)
/dev/ar0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1e: clean, 8121013 free (461 frags, 1015069 blocks, 0.0% fragmentation)
/dev/ar0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1f: clean, 4061052 free (28 frags, 507628 blocks, 0.0% fragmentation)
/dev/ar0s1g: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1g: clean, 2029028 free (28 frags, 253625 blocks, 0.0% fragmentation)
/dev/ar0s1h: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s1h: clean, 3045027 free (27 frags, 380625 blocks, 0.0% fragmentation)
/dev/ar0s2d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ar0s2d: clean, 13470149 free (21 frags, 1683766 blocks, 0.0% fragmentation)
Setting hostname: comm-server.support.bsd1.net.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
Starting dhclient.
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::211:11ff:fe0a:467b%fxp0 prefixlen 64 scopeid 0x3
        inet 192.168.210.51 netmask 0xffffff00 broadcast 192.168.210.255
        ether 00:11:11:0a:46:7b
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
Additional routing options: IP gateway=YES.
Starting devd.
Mounting NFS file systems:.
Starting syslogd.
Nov 1 15:56:44 comm-server syslogd: kernel boot file is /boot/kernel/kernel
Checking for core dump on /dev/ar0s1b ...
savecore: no dumps found
Setting date via ntp.
Looking for host 192.168.210.1 and service ntp
host found : free.bsd1.net
 1 Nov 15:56:45 ntpdate[312]: step time server 192.168.210.1 offset 1.115163 sec
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting usbd.
Starting local daemons:.
Updating motd.
Configuring syscons: blanktime.
Starting sshd.
Initial i386 initialization:.
Additional ABI support:.
Starting cron.
Local package initialization:.
Additional TCP options:.
Starting background file system checks in 60 seconds.

Mon Nov 1 15:56:47 EST 2004

FreeBSD/i386 (comm-server.support.bsd1.net) (ttyd0)

login: root
Password:
Nov 1 15:56:52 comm-server login: ROOT LOGIN (root) ON ttyd0
Last login: Mon Nov 1 15:10:59 on ttyd0
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.

FreeBSD 5.3-RC2 (DEBUG) #0: Mon Nov 1 14:48:42 EST 2004

Welcome to FreeBSD!

Before seeking technical support, please use the following resources:

o Security advisories and updated errata information for all releases are
  at http://www.FreeBSD.org/releases/ - always consult the ERRATA section
  for your release first as it's updated frequently.

o The Handbook and FAQ documents are at http://www.FreeBSD.org/ and,
  along with the mailing lists, can be searched by going to
  http://www.FreeBSD.org/search/. If the doc distribution has
  been installed, they're also available formatted in /usr/share/doc.

If you still have a question or problem, please take the output of
`uname -a', along with any relevant error messages, and email it
as a question to the questions_at_FreeBSD.org mailing list. If you are
unfamiliar with FreeBSD's directory layout, please refer to the hier(7)
manual page. If you are not familiar with manual pages, type `man man'.

You may also use sysinstall(8) to re-enter the installation and
configuration utility. Edit /etc/motd to change this login announcement.

erase ^H, kill ^U, intr ^C status ^T
FreeBSD cons25 ttyd0
[comm-server.support.bsd1.net:ttyd0:/root ]> shutdown -r now
Shutdown NOW!
shutdown: [pid 497]

*** FINAL System shutdown message from root_at_comm-server.support.bsd1.net ***
System going down IMMEDIATELY

Nov 1 15:56:58 comm-server shutdown: reboot by root:
[comm-server.support.bsd1.net:ttyd0:/root ]> System shutdown time has arrived
Shutting down daemon processes:.
Stopping cron.
Shutting down local daemons:.
Writing entropy file:.
.
Nov 1 15:57:00 comm-server syslogd: exiting on signal 15
boot() called on cpu#1
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...4 4 2 2 0 0 0 done
No buffers busy after final sync
Uptime: 52s
Waiting (max 60 seconds) for system process `hpt_wt' to stop...done
Shutting down ACPI
kk e rFnaetla lt rdaopu b1l2e wfiatuhl ti:n t eerirpu p=t s0 xdci1s9aabcl4ebdc
esp = 0x6460c19a
ebp = 0x0
cpuid = 1; apic id = 01
panic: double fault
cpuid = 1
KDB: enter: panic
[thread 100002]
Stopped at kdb_enter+0x2b: nop
db> whre ere
kdb_enter(c08291f5) at kdb_enter+0x2b
panic(c084267e,c08427ef,1,0,0) at panic+0x127
dblfault_handler() at dblfault_handler+0x7a
--- trap 0x17, eip = 0xc19ac4bc, esp = 0x6460c19a, ebp = 0 ---
_end() at 0xc19ac4bc
db> trace
kdb_enter(c08291f5) at kdb_enter+0x2b
panic(c084267e,c08427ef,1,0,0) at panic+0x127
dblfault_handler() at dblfault_handler+0x7a
--- trap 0x17, eip = 0xc19ac4bc, esp = 0x6460c19a, ebp = 0 ---
_end() at 0xc19ac4bc
db> call doae dump
Dumping 510 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496
Dump complete
0xf
db> reset
cpu_reset called on cpu#1
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1

from kgdb:

============================================================================
kgdb kernel.debug vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
doadump () at pcpu.h:159
(kgdb) whre ere
#0 doadump () at pcpu.h:159
#1 0xc0460cd6 in db_fncall (dummy1=0, dummy2=0, dummy3=-1064198276, dummy4=0xc0919f64 "\230\237\221À\200%") at ../../../ddb/db_command.c:531
#2 0xc0460ae4 in db_command (last_cmdp=0xc08c7a44, cmd_table=0x0, aux_cmd_tablep=0xc0848104, aux_cmd_tablep_end=0xc0848120) at ../../../ddb/db_command.c:349
#3 0xc0460bac in db_command_loop () at ../../../ddb/db_command.c:455
#4 0xc0462725 in db_trap (type=3, code=0) at ../../../ddb/db_main.c:221
#5 0xc062adc7 in kdb_trap (type=3, code=0, tf=0x1) at ../../../kern/subr_kdb.c:418
#6 0xc07c2f74 in trap (frame=
      {tf_fs = -1064239080, tf_es = -1067319280, tf_ds = -1065222128, tf_edi = -1065081218, tf_esi = 1, tf_ebp = -1064197916, tf_isp = -1064197936, tf_ebx = -1064197872, tf_edx = 0, tf_ecx = -1056882688, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1067275477, tf_cs = 8, tf_eflags = 16534, tf_esp = -1064197884, tf_ss = -1067371761}) at ../../../i386/i386/trap.c:576
#7 0xc07b0d1a in calltrap () at ../../../i386/i386/exception.s:140
#8 0xc0910018 in sc_buffer.5 ()
#9 0xc0620010 in umtx_remove (uq=0xc091a110, td=0x0) at ../../../kern/kern_umtx.c:135
#10 0xc061330f in panic (fmt=0xc084267e "double fault") at ../../../kern/kern_shutdown.c:537
#11 0xc07c3566 in dblfault_handler () at ../../../i386/i386/trap.c:838
#12 0x00000000 in ?? ()
(kgdb) quit
[comm-server.support.bsd1.net:ttyd0:/var/crash ]>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The following commit seems to have cured the problem.

Edit src/sys/kern/kern_shutdown.c
  Add delta 1.163.2.3 2004.11.29.19.11.36 njl

If this fix does, in fact, address this problem, can I expect to see it in an official patch to 5.3-RELEASE? This is the only issue keeping us from upgrading/deploying 5.3-RELEASE on all (twenty-two at last count) of our production servers. I can't get an agreement to deploy 5.3-STABLE from my management, so it's 5.3-RELEASE-px or wait until 5.4-RELEASE. I'd rather not wait.

Thanks for any information that you can supply.

Received on Tue Nov 30 2004 - 19:57:32 UTC