I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld. Please note that the server is "heavily loaded" (note the quotes); usually a load of around 0.50 to 1.00 at all times, with mysqld being the top process. The server runs the latest -CURRENT builds.

Many people over in freebsd-threads mentioned this problem, and recommended all sorts of different workarounds. I tried every one available to me, except mucking with PREEMPTION (as I did not feel comfortable tinkering with a random .h file on the box; seemed to be a kernel-related thing, so I'd rather have just an "options" line for it -- I'm conditionally lazy).

The locks are exactly as you describe: random hard-locks. No KDB/DDB/GDB. Just hard-locks with nothing in logs anywhere.

There's been (very recent) discussion here about lock-up problems seeming load-related. This is starting to sound very probable for a lot of reasons.

Here's a list of all the combinations of things I've tried to *no avail*. The solution for us was to move mysqld to a 4.x machine. Since then, the -CURRENT box has managed to stay up for 3.5 days without any trouble:

=====

SuperMicro SuperServer 5013C-T
P4, 2.6GHz (for HTT settings, see below)
1GB ECC DDR400

For many months this machine worked fine under heavy load, SMP enabled, ACPI enabled, APIC enabled. Sometime in early-to-mid July things became unstable; I update my kernel/world every 1-2 weeks. The only other difference between "then and now" is that the box runs MySQL (mysqld) 4.0.20; mysqld is not very heavily loaded (at least in comparison to some other posters' systems I've seen...). The system can usually stay up about 48-72 hours before dying.

Initial configuration

* KERNEL: SCHED_ULE
* KERNEL: Disabled INVARIANT* and WITNESS*
* KERNEL: SMP enabled, APIC enabled
* BIOS: HTT enabled, APIC enabled, ACPI enabled
* /etc/make.conf has CPUTYPE=p4 (seems to be required for mysqld to work, else sig11)

Now the problems begin. Here are my attempted changes...

* KERNEL: SCHED_4BSD --> SCHED_ULE
  KERNEL: Enabled KDB and DDB
  !! Random locks.

* KERNEL: Enabled INVARIANT* and WITNESS*
  !! Random locks.

* LOADER: Temporary ACPI disable (via loader(8) only; BIOS still has ACPI enabled).
  Kernel panic:

    pci0: <PCI bus> on pcib0
    panic: Multiple entries for PCI IRQ 18
    cpuid = 0;
    KDB: enter: panic
    [thread 0]
    Stopped at kdb_enter+0x30: movl %ebp,%esp

* BIOS: MPS 1.4 --> 1.1
  No idea if this worked, because we did the following after reading freebsd-threads:

* BIOS: Disabled HTT
  BIOS: MPS 1.1 --> 1.4
  KERNEL: SCHED_ULE --> SCHED_4BSD
  KERNEL: Disabled INVARIANT* and WITNESS*
  !! Random locks.

Thu Jul 29 04:16 PDT
* BIOS: Disabled APIC
  KERNEL: Disabled SMP, disabled APIC
  KERNEL: Enabled INVARIANT* and WITNESS*
  NOTE: Because of the latest gcc 3.4 import, I was forced to rebuild world too.
  NOTE: Prior to now, world was built WITHOUT CPUTYPE=p4. If this matters at all...
  !! Random locks.

Sat Jul 31 13:08 PDT
* MYSQL: Recompiled 4.0.20 with WITH_PROC_SCOPE_PTH=yes.
  MYSQL: The 4.0.20 rebuild obviously now included CPUTYPE=p4.
  !! Random locks.

Sun Aug 1 03:01:09 PDT 2004
* Ended up moving mysql server portion to a 4.x box, in attempt to see if the 5.x box still hard-locks without mysqld.

Wed Aug 4 13:28:35 PDT 2004
* -CURRENT box is still alive and well.

=====

Since our situation has shown that even a pure single CPU (i.e. no HTT and no SMP in the kernel) has exhibited lock-ups, as mentioned, I'm starting to think high load causes it.

--
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.                             |

On Wed, Aug 04, 2004 at 03:58:53PM -0400, Sven Willenberger wrote:
> FreeBSD 5.2.1-P8 running on dual Xeon supermicro system with vinum data
> drive and em network interfaces. I have been having a problem with the
> system simply locking up every couple days. No response from the
> keyboard, network, nothing. As if it is in some state of IRQ locking. I
> see nothing in the messages, even with DDB and DDB_UNATTENDED enabled in
> kernel. The system runs 4GB of ram with the following modifications to
> kernel:
>
> cpu I486_CPU
> cpu I586_CPU
> cpu I686_CPU
> <snip>
> options SHMMAXPGS=65536 # ********************
> options SEMMNI=40 # added for posgresql
> options SEMMNS=240 # allows for around
> options SEMUME=40 # 180 simultaneous connections
> options SEMMNU=120 # ********************
> <snip>
> # Debugging for use in -current
> options DDB #Enable the kernel debugger
> options DDB_UNATTENDED #Don't panic on DDB but log it
> #options INVARIANTS #Enable calls of extra sanity checking
> options INVARIANT_SUPPORT #Extra sanity checks of internal
> #options WITNESS #Enable checks to detect dead ..
> #options WITNESS_SKIPSPIN #Don't run witness on spinlocks
> # Deal with kmem issues
> options VM_KMEM_SIZE_SCALE="4"
> options VM_KMEM_SIZE_MAX="(512*1024*1024)"
> options KVA_PAGES=512
>
>
> /boot/loader.conf:
> vinum_load="YES"
> vinum.autostart="YES"
> #kern.maxdsiz="1073741824"
> #kern.dfldsiz="1073741824"
>
> I had experimented in loader.conf with the dsiz settings to no avail,
> still get lockups. Got lockups with and without the DDB settings. It
> would be helpful if I could see some type of error being generated, but
> nothing; the attached terminal has utterly no messages beyond normal
> system messages, everything just stops responding.
>
> After the last lockup and reboot, I sysctl machdep.hlt_logical_cpus=1 to
> see if that had any effect. Any other recommendations? adaptive_mutexes?
> Any ideas on how to actually find out what is happening?
>
> Sven
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

Received on Wed Aug 04 2004 - 18:34:57 UTC
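
Both posters describe boxes that hard-lock with nothing in the logs, and the working theory in this thread is that load has something to do with it. One cheap way to gather evidence for or against that is a heartbeat file: append the time and the load averages every few seconds and force each line to disk, so the last line surviving a reboot shows roughly when the box died and how busy it was. What follows is only a minimal Python sketch of that idea, not something either poster reported using; the function names, the log path and the interval are all made up for illustration.

```python
import os
import time


def format_heartbeat(timestamp, load):
    """Render one log line: UTC time plus the 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))
    return "%s load=%.2f,%.2f,%.2f\n" % ((stamp,) + tuple(load))


def write_heartbeat(path):
    """Append one heartbeat line and push it to disk before returning."""
    line = format_heartbeat(time.time(), os.getloadavg())
    with open(path, "a") as log:
        log.write(line)
        log.flush()
        # Anything still sitting in a buffer is lost on a hard lock,
        # so ask the OS to write the line out now.
        os.fsync(log.fileno())
    return line


def run(path="/var/tmp/heartbeat.log", interval=10):
    """Hypothetical entry point; path and interval are illustrative only."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Calling `run()` by hand (or from a small wrapper started at boot) leaves a file whose final line brackets the time of the lock-up to within one interval. It says nothing about the cause, but a consistently high or consistently idle last reading would help confirm or weaken the load theory.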