Re: Postgresql locks up server - no response at all

From: Scott Long <scottl_at_freebsd.org>
Date: Wed, 04 Aug 2004 15:27:52 -0600
Sven Willenberger wrote:

> On Wed, 2004-08-04 at 13:49 -0700, Kevin Oberman wrote:
> 
>>>Date: Wed, 4 Aug 2004 13:34:56 -0700
>>>From: Jeremy Chadwick <freebsd_at_jdc.parodius.com>
>>>Sender: owner-freebsd-current_at_freebsd.org
>>>
>>>I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld.
>>>Please note that the server is "heavily loaded" (note the quotes); usually
>>>a load of around 0.50 to 1.00 at all times, with mysqld being the top
>>>process.  Server runs all latest -CURRENT builds.
>>>
>>>Many people over in freebsd-threads mentioned this problem, and recommended
>>>all sorts of different workarounds.  I tried every one available to me,
>>>except mucking with PREEMPTION (as I did not feel comfortable tinkering
>>>with a random .h file on the box; seemed to be a kernel-related thing,
>>>so I'd rather have just an "options" line for it -- I'm conditionally
>>>lazy).
>>
>>Please note that PREEMPTION is now NOT enabled in CURRENT. scottl
>>changed that a day or two ago because of all of these lock-ups. He and
>>Julian are listed as working to isolate the problem. Scott believes it's
>>in the scheduler. It's not specific to either ULE or 4BSD.
>>
>>So cvsup, rebuild the kernel and you should be fine. At least for a while.
> 
> 
> Based on this and Jeremy C.'s response it would appear that I should
> either try to upgrade my 5.2.1-P8 system to -CURRENT (which is scary
> because of the vinum array - root is not mounted on a vinum device, but
> the data directory is - will gvinum simply read this correctly? it is a
> stripe+mirror array of 4 drives) or start from scratch and go back to
> 4.10 (STABLE) for a while. I am assuming that the lockups I am seeing
> were exacerbated by the PREEMPTION episodes of the past couple weeks? If
> I choose the upgrade to -CURRENT, are there any caveats or
> recommendations? (besides reading "/usr/src/UPDATING" which I do
> religiously anyway)
> 

I'm a bit nervous about asking you to upgrade to -current.  PREEMPTION is
practically disabled in 5.2.1, so upgrading has a low chance of fixing
the problem except maybe by sheer luck.  The best action would be to
get a crashdump.  If your system has an NMI button, then there are some
trivial patches that will assist with this.  If not, then you might want
to look at backporting the ichwd watchdog driver and letting that do a
chip-assisted NMI.

In any case, finding out exactly what each CPU is doing at the time of
the lockup is going to be vital.  The lockups that I've been able to
reproduce happen when a TAILQ in the scheduler gets corrupted,
resulting in one CPU spinning on the list forever with the scheduler
lock held.  All other CPUs then quickly grind to a halt while they wait
for the sched lock to become free, which it never does.
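
As a rough userland illustration of that failure mode (this is not the
scheduler code; fake_thread, fake_runq, bounded_walk, build_queue and
corrupt_queue are all hypothetical names made up for this sketch), a
TAILQ whose last entry points back at the first never yields a NULL
next pointer, so an unbounded walk over it would spin forever.  The
sketch assumes a <sys/queue.h> that provides TAILQ_FIRST and TAILQ_NEXT,
and it caps the walk so the corruption shows up as an error return
instead of a hang:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a thread sitting on a run queue. */
struct fake_thread {
	int id;
	TAILQ_ENTRY(fake_thread) link;
};

TAILQ_HEAD(fake_runq, fake_thread);

/*
 * Walk the queue, giving up after `limit` steps.  Returns the number of
 * entries visited, or -1 if the walk did not reach the end within
 * `limit` steps (which suggests the list has been corrupted into a cycle).
 */
static int
bounded_walk(struct fake_runq *q, int limit)
{
	struct fake_thread *t;
	int steps = 0;

	assert(limit > 0);
	for (t = TAILQ_FIRST(q); t != NULL; t = TAILQ_NEXT(t, link)) {
		if (++steps > limit)
			return (-1);
	}
	return (steps);
}

/* Build a three-entry queue in caller-provided storage. */
static void
build_queue(struct fake_runq *q, struct fake_thread th[3])
{
	int i;

	TAILQ_INIT(q);
	for (i = 0; i < 3; i++) {
		th[i].id = i;
		TAILQ_INSERT_TAIL(q, &th[i], link);
	}
}

/* Simulate the corruption: the last entry's next points at the first. */
static void
corrupt_queue(struct fake_thread th[3])
{
	th[2].link.tqe_next = &th[0];
}
```

On a healthy queue built with build_queue(), bounded_walk() returns the
entry count; after corrupt_queue() it hits the step limit and returns -1.
In the kernel case there is no such limit, and the walker holds the
scheduler lock the whole time, which is why every other CPU stalls behind
it.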

Scott
Received on Wed Aug 04 2004 - 19:29:15 UTC
