Re: Postgresql locks up server - no response at all

From: Jeremy Chadwick <freebsd_at_jdc.parodius.com>
Date: Wed, 4 Aug 2004 13:34:56 -0700

I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld.
Please note that the server is "heavily loaded" (note the quotes): the load
average usually sits around 0.50 to 1.00, with mysqld being the top
process.  The server runs the latest -CURRENT builds.

Many people over in freebsd-threads mentioned this problem, and recommended
all sorts of different workarounds.  I tried every one available to me,
except mucking with PREEMPTION (as I did not feel comfortable tinkering
with a random .h file on the box; seemed to be a kernel-related thing,
so I'd rather have just an "options" line for it -- I'm conditionally
lazy).

The locks are exactly as you describe: random hard locks.  No KDB/DDB/GDB.
Just hard locks with nothing in logs anywhere.

There's been (very recent) discussion here about lock-up problems seeming
load-related.  This is starting to sound very probable for a lot of reasons.

Here's a list of all the combinations of things I've tried to *no avail*.
The solution for us was to move mysqld to a 4.x machine.  Since then,
the -CURRENT box has managed to stay up for 3.5 days without any trouble:

=====
SuperMicro SuperServer 5013C-T
P4, 2.6GHz (for HTT settings, see below)
1GB ECC DDR400

For many months this machine worked fine under heavy load, SMP enabled, ACPI enabled,
APIC enabled.  Sometime in early-to-mid July things became unstable; I update my
kernel/world every 1-2 weeks.  The only other difference between "then and now" is
that the box runs MySQL (mysqld) 4.0.20; mysqld is not very heavily loaded (at least
in comparison to some other posters' systems I've seen...)

System can usually stay up about 48-72 hours before dying.

Initial configuration
* KERNEL: SCHED_4BSD
* KERNEL: Disabled INVARIANT* and WITNESS*
* KERNEL: SMP enabled, APIC enabled
* BIOS: HTT enabled, APIC enabled, ACPI enabled
* /etc/make.conf has CPUTYPE=p4  (seems to be required for mysqld to work, else sig11)

Now the problems begin.  Here are my attempted changes...

* KERNEL: SCHED_4BSD --> SCHED_ULE
  KERNEL: Enabled KDB and DDB
  !! Random locks.

* KERNEL: Enabled INVARIANT* and WITNESS*
  !! Random locks.

* LOADER: Temporary ACPI disable (via loader(8) only; BIOS still has ACPI enabled).
  Kernel panic:

pci0: <PCI bus> on pcib0
panic: Multiple entries for PCI IRQ 18
cpuid = 0;
KDB: enter: panic
[thread 0]
Stopped at      kdb_enter+0x30: movl    %ebp,%esp

* BIOS: MPS 1.4 --> 1.1
  No idea if this worked, because we did the following after reading freebsd-threads:

* BIOS: Disabled HTT
  BIOS: MPS 1.1 --> 1.4
  KERNEL: SCHED_ULE --> SCHED_4BSD
  KERNEL: Disabled INVARIANT* and WITNESS*
  !! Random locks.

Thu Jul 29 04:16 PDT
* BIOS: Disabled APIC
  KERNEL: Disabled SMP, disabled APIC
  KERNEL: Enabled INVARIANT* and WITNESS*
  NOTE: Because of the latest gcc 3.4 import, I was forced to rebuild world too.
  NOTE: Prior to now, world was built WITHOUT CPUTYPE=p4.  If this matters at all...
  !! Random locks.

Sat Jul 31 13:08 PDT
* MYSQL: Recompiled 4.0.20 with WITH_PROC_SCOPE_PTH=yes.
  MYSQL: The 4.0.20 rebuild obviously now included CPUTYPE=p4.
  !! Random locks.

Sun Aug  1 03:01:09 PDT 2004
* Ended up moving mysql server portion to a 4.x box, in an attempt to
  see if the 5.x box still hard-locks without mysqld.

Wed Aug  4 13:28:35 PDT 2004
* -CURRENT box is still alive and well.
=====
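
To put the "3.5 days" figure in context, here is a quick sketch of the
arithmetic using the two timestamps from the log above.  It assumes the
box has not been rebooted since mysqld was moved, and it compares the
result against the upper end of the usual 48-72 hour window; the variable
names are mine.

```python
from datetime import datetime

# Timestamps copied from the log above (both PDT, so no timezone math needed).
moved_mysqld = datetime(2004, 8, 1, 3, 1, 9)
still_alive = datetime(2004, 8, 4, 13, 28, 35)

uptime = still_alive - moved_mysqld
uptime_hours = uptime.total_seconds() / 3600

# Upper end of the usual time-to-lock-up quoted earlier (48-72 hours).
usual_max_hours = 72

print(f"time since mysqld was moved: {uptime_hours:.1f} hours "
      f"({uptime_hours / 24:.2f} days)")
print(f"past the usual failure window: {uptime_hours > usual_max_hours}")
```

That works out to a bit over 82 hours, so the box has outlived the usual
failure window, though not yet by a wide margin.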


Since our box has locked up even as a pure single-CPU system (i.e. no HTT
and no SMP in the kernel), I'm starting to think the lock-ups are
load-related, as mentioned above.
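
As a rough cross-check of that reasoning, the attempts that ended in
random locks can be tabulated and scanned for any setting that stayed the
same across all of them.  This is only a sketch: the field names are
hypothetical and the values are my reading of the list above, not dumps
of the actual kernel configs.

```python
# One entry per attempt that ended in "Random locks".  Values are a reading
# of the list in this post; the field names are made up for illustration.
BASE = {"mysqld": True, "pth_proc_scope": False}
locked_configs = [
    dict(BASE, sched="ULE", smp=True, htt=True, apic=True, witness=False),
    dict(BASE, sched="ULE", smp=True, htt=True, apic=True, witness=True),
    dict(BASE, sched="4BSD", smp=True, htt=False, apic=True, witness=False),
    dict(BASE, sched="4BSD", smp=False, htt=False, apic=False, witness=True),
    dict(BASE, sched="4BSD", smp=False, htt=False, apic=False, witness=True,
         pth_proc_scope=True),
]


def split_settings(configs):
    """Split setting names into (constant, varying) across all configs."""
    constant, varying = [], []
    for key in sorted(configs[0]):
        values = {cfg[key] for cfg in configs}
        (constant if len(values) == 1 else varying).append(key)
    return constant, varying


constant, varying = split_settings(locked_configs)
print("same in every locking config:", constant)
print("changed without curing the locks:", varying)
```

Every kernel and BIOS knob shows up in the "changed" list; the only thing
common to all the locking configurations is that mysqld was running on
the box.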

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.                             |

On Wed, Aug 04, 2004 at 03:58:53PM -0400, Sven Willenberger wrote:
> FreeBSD 5.2.1-P8 running on dual Xeon supermicro system with vinum data
> drive and em network interfaces. I have been having a problem with the
> system simply locking up every couple days. No response from the
> keyboard, network, nothing. As if it is in some state of IRQ locking. I
> see nothing in the messages, even with DDB and DDB_UNATTENDED enabled in
> kernel. The system runs 4GB of ram with the following modifications to
> kernel:
> 
> cpu             I486_CPU
> cpu             I586_CPU
> cpu             I686_CPU
> <snip>
> options         SHMMAXPGS=65536         # ********************
> options         SEMMNI=40               # added for postgresql
> options         SEMMNS=240              # allows for around
> options         SEMUME=40               # 180 simultaneous connections
> options         SEMMNU=120              # ********************
> <snip>
> # Debugging for use in -current
> options         DDB                     #Enable the kernel debugger
> options         DDB_UNATTENDED          #Don't panic on DDB but log it
> #options        INVARIANTS              #Enable calls of extra sanity checking
> options         INVARIANT_SUPPORT       #Extra sanity checks of internal
> #options        WITNESS                 #Enable checks to detect dead ..
> #options        WITNESS_SKIPSPIN        #Don't run witness on spinlocks 
> # Deal with kmem issues
> options                 VM_KMEM_SIZE_SCALE="4"
> options                 VM_KMEM_SIZE_MAX="(512*1024*1024)"
> options                 KVA_PAGES=512
> 
> 
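
The semaphore numbers in the config above can be sanity-checked against
the "180 simultaneous connections" comment.  The sketch below assumes the
sizing rule from the PostgreSQL kernel-resources documentation of that
era, as I recall it (semaphores are taken in sets of 16 plus one extra
per set), and a 4096-byte page on i386; neither assumption comes from the
quoted post.

```python
import math

# Values from the quoted kernel config.
SEMMNI = 40        # maximum number of semaphore sets
SEMMNS = 240       # maximum number of semaphores system-wide
SHMMAXPGS = 65536  # maximum shared memory segment size, in pages

# Assumptions (not from the post): PostgreSQL allocates semaphores in sets
# of 16 plus one extra per set, and i386 pages are 4096 bytes.
SEMS_PER_SET = 16
PAGE_SIZE = 4096


def sems_needed(max_connections):
    """Return (sets, semaphores) needed for max_connections backends."""
    sets = math.ceil(max_connections / SEMS_PER_SET)
    return sets, sets * (SEMS_PER_SET + 1)


sets, sems = sems_needed(180)
fits = sets <= SEMMNI and sems <= SEMMNS
shm_max_mib = SHMMAXPGS * PAGE_SIZE // (1024 * 1024)

print(f"180 connections: {sets} sets, {sems} semaphores, fits={fits}")
print(f"SHMMAXPGS allows segments up to {shm_max_mib} MiB")
```

Under those assumptions, 180 connections need 12 sets and 204 semaphores,
which fits inside SEMMNI=40 and SEMMNS=240 with a little room left for
other applications.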
> /boot/loader.conf:
> vinum_load="YES"
> vinum.autostart="YES"
> #kern.maxdsiz="1073741824"
> #kern.dfldsiz="1073741824"
> 
> I had experimented in loader.conf with the dsiz settings to no avail,
> still get lockups. Got lockups with and without the DDB settings. It
> would be helpful if I could see some type of error being generated, but
> nothing; the attached terminal has utterly no messages beyond normal
> system messages, everything just stops responding.
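
For reference, the memory-related values in the config and loader.conf
above translate into sizes as sketched below.  This assumes an i386
kernel without PAE, where each KVA_PAGES unit covers 4 MiB of a 4 GiB
address space; the post does not say whether PAE is in use, so treat the
KVA figures as an assumption.

```python
MIB = 1024 * 1024

# Values from the quoted kernel config and loader.conf.
vm_kmem_size_max = 512 * 1024 * 1024  # VM_KMEM_SIZE_MAX
kva_pages = 512                       # KVA_PAGES
maxdsiz = 1073741824                  # kern.maxdsiz (commented out in loader.conf)

# Assumption: i386 without PAE, so one KVA_PAGES unit maps 4 MiB and the
# whole virtual address space is 4 GiB.
KVA_UNIT = 4 * MIB
ADDRESS_SPACE = 4096 * MIB

kva_bytes = kva_pages * KVA_UNIT
user_bytes = ADDRESS_SPACE - kva_bytes

print(f"kernel address space: {kva_bytes // MIB} MiB")
print(f"left for each user process: {user_bytes // MIB} MiB")
print(f"kmem cap: {vm_kmem_size_max // MIB} MiB")
print(f"maxdsiz: {maxdsiz // MIB} MiB")
```

If the non-PAE assumption holds, KVA_PAGES=512 splits the address space
evenly between kernel and userland, and the 1 GiB dsiz setting fits
within the userland half.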
> 
> After the last lockup and reboot, I sysctl machdep.hlt_logical_cpus=1 to
> see if that had any effect. Any other recommendations? adaptive_mutexes?
> Any ideas on how to actually find out what is happening?
> 
> Sven
> 
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
Received on Wed Aug 04 2004 - 18:34:57 UTC
