Re: Postgresql locks up server - no response at all

From: Scott Long <scottl_at_freebsd.org>
Date: Wed, 04 Aug 2004 17:36:37 -0600

Sven Willenberger wrote:

> On Wed, 2004-08-04 at 15:27 -0600, Scott Long wrote:
> 
>>Sven Willenberger wrote:
>>
>>
>>>On Wed, 2004-08-04 at 13:49 -0700, Kevin Oberman wrote:
>>>
>>>
>>>>>Date: Wed, 4 Aug 2004 13:34:56 -0700
>>>>>From: Jeremy Chadwick <freebsd_at_jdc.parodius.com>
>>>>>Sender: owner-freebsd-current_at_freebsd.org
>>>>>
>>>>>I've seen this with our SuperMicro SuperServer 5013C-T, running mysqld.
>>>>>Please note that the server is "heavily loaded" (note the quotes); usually
>>>>>a load of around 0.50 to 1.00 at all times, with mysqld being the top
>>>>>process.  Server runs all latest -CURRENT builds.
>>>>>
>>>>>Many people over in freebsd-threads mentioned this problem, and recommended
>>>>>all sorts of different workarounds.  I tried every one available to me,
>>>>>except mucking with PREEMPTION (as I did not feel comfortable tinkering
>>>>>with a random .h file on the box; seemed to be a kernel-related thing,
>>>>>so I'd rather have just an "options" line for it -- I'm conditionally
>>>>>lazy).
>>>>
>>>>Please note that PREEMPTION is now NOT enabled in CURRENT. scottl
>>>>changed that a day or two ago because of all of these lock-ups. He and
>>>>Julian are listed as working to isolate the problem. Scott believes it's
>>>>in the scheduler. It's not specific to either ULE or 4BSD.
>>>>
>>>>So cvsup, rebuild the kernel and you should be fine. At least for a while.
>>>
>>>
>>>Based on this and Jeremy C.'s response it would appear that I should
>>>either try to upgrade my 5.2.1-P8 system to -CURRENT (which is scary
>>>because of the vinum array - root is not mounted on a vinum device, but
>>>the data directory is - will gvinum simply read this correctly? it is a
>>>stripe+mirror array of 4 drives) or start from scratch and go back to
>>>4.10 (STABLE) for a while. I am assuming that the lockups I am seeing
>>>were exacerbated by the PREEMPTION episodes of the past couple weeks? If
>>>I choose the upgrade to -CURRENT, are there any caveats or
>>>recommendations? (besides reading "/usr/src/UPDATING" which I do
>>>religiously anyway)
>>>
>>
>>I'm a bit nervous about asking you to upgrade to -current.  PREEMPTION is
>>practically disabled in 5.2.1 so upgrading has a low chance of fixing
>>the problem except maybe by sheer luck.  The best action would be to
>>get a crashdump.  If your system has an NMI button, then there are some
>>trivial patches that will assist with this.  If not, then you might want
>>to look at backporting the ichwd watchdog driver and letting that do a
>>chip-assisted NMI.
>>
>>In any case, finding out exactly what each CPU is doing at the time of
>>the lockup is going to be vital.  The lockups that I've been able to
>>reproduce happen when a TAILQ in the scheduler gets corrupted,
>>resulting in one CPU spinning on the list forever with the scheduler
>>lock held.  All other CPUs then quickly grind to a halt while they wait
>>for the sched lock to become free, which it never does.
>>
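To illustrate the failure mode described above: once a list's forward pointers form a cycle, a loop that walks the list until it reaches NULL never terminates, and if that loop runs with the scheduler lock held, the lock is never released. The sketch below is a userland illustration only. It assumes a hypothetical `struct rq_node` in place of the kernel's real TAILQ macros and run-queue structures, and it shows two ways such corruption could be detected: Floyd's tortoise-and-hare cycle detection, and checking that each successor's back pointer points at the current node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical run-queue entry; NOT the real FreeBSD scheduler structure.
 * It mimics a TAILQ entry: a forward pointer and a back pointer. */
struct rq_node {
    int id;
    struct rq_node *next;
    struct rq_node *prev;
};

/* Link nodes[0..n-1] into a well-formed doubly linked list; returns the head. */
static struct rq_node *rq_build(struct rq_node *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        nodes[i].id = (int)i;
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
        nodes[i].prev = (i > 0) ? &nodes[i - 1] : NULL;
    }
    return n > 0 ? &nodes[0] : NULL;
}

/* Floyd's tortoise-and-hare: returns true if following next pointers from
 * head never reaches NULL, i.e. a plain "while (p != NULL) p = p->next;"
 * walk would spin forever (with whatever lock it holds still held). */
static bool rq_has_cycle(const struct rq_node *head)
{
    const struct rq_node *slow = head, *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

/* Local consistency check: in a healthy doubly linked list every node's
 * successor points back at it. Returns the first node whose forward link
 * is not mirrored by a back link, or NULL if none is seen in max_steps. */
static const struct rq_node *rq_first_bad_link(const struct rq_node *head,
                                               size_t max_steps)
{
    for (size_t i = 0; head != NULL && i < max_steps; i++, head = head->next) {
        if (head->next != NULL && head->next->prev != head)
            return head;
    }
    return NULL;
}
```

For example, building a five-node list and then setting `nodes[3].next = &nodes[1]` (as if an entry were re-inserted without first being removed) makes `rq_has_cycle()` return true and `rq_first_bad_link()` return `&nodes[3]`. Checks like these are only a debugging aid: they show that a list is corrupt, not which code path corrupted it, which is why knowing what each CPU was doing at the time of the lockup remains the real goal.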
> 
> 
> The case unfortunately does not have a button (although the mobo does
> have an NMI header/jumper). Backporting the watchdog driver sounds
> doable; other than downloading the sys/dev/ichwd directory from a
> repository and adding "options ichwd" to my kernel config file, what
> else would be needed? I am willing to try to get at least one crashdump
> before I have to go back to a -STABLE setup or try something so I can
> get some uptime on this box.
> 

I believe that the ichwd driver depends on the watchdog infrastructure 
driver that was added back in the early spring.  I'm not 100% sure,
though.

Scott