Re: Postgresql locks up server - no response at all

From: Sven Willenberger <sven_at_dmv.com>
Date: Fri, 06 Aug 2004 16:36:19 -0400
On Wed, 2004-08-04 at 17:36 -0600, Scott Long wrote:
> Sven Willenberger wrote:
> 

<snip>

> >>>
> >>>Based on this and Jeremy C.'s response it would appear that I should
> >>>either try to upgrade my 5.2.1-P8 system to -CURRENT (which is scary
> >>>because of the vinum array - root is not mounted on a vinum device, but
> >>>the data directory is - will gvinum simply read this correctly? it is a
> >>>stripe+mirror array of 4 drives) or start from scratch and go back to
> >>>4.10 (STABLE) for a while. I am assuming that the lockups I am seeing
> >>>were exacerbated by the PREEMPTION episodes of the past couple weeks? If
> >>>I choose the upgrade to -CURRENT, are there any caveats or
> >>>recommendations? (besides reading "/usr/src/UPDATING" which I do
> >>>religiously anyway)
> >>>
> >>
> >>I'm a bit nervous with asking you to upgrade to -current.  PREEMPTION is
> >>practically disabled in 5.2.1 so upgrading has a low chance of fixing
> >>the problem except maybe by sheer luck.  The best action would be to
> >>get a crashdump.  If your system has an NMI button, then there are some
> >>trivial patches that will assist with this.  If not, then you might want
> >>to look at backporting the ichwd watchdog driver and letting that do a
> >>chip-assisted NMI.
> >>
> >>In any case, finding out exactly what each CPU is doing at the time of
> >>the lockup is going to be vital.  The lockups that I've been able to
> >>reproduce happen when a TAILQ in the scheduler gets corrupted,
> >>resulting in one CPU spinning on the list forever with the scheduler
> >>lock held.  All other cpus then quickly grind to a halt while they wait
> >>for the sched lock to become free, which it never does.
> >>
> > 
> > 
> > The case unfortunately does not have a button (although the mobo does
> > have an NMI header/jumper). Backporting the watchdog driver sounds
> > doable; other than downloading the sys/dev/ichwd directory from a
> > repository and adding "options ichwd" to my kernel config file, what
> > else would be needed? I am willing to try to get at least one crashdump
> > before I have to go back to a -STABLE setup or try something so I can
> > get some uptime on this box.
> > 
> 
> I believe that the ichwd driver depends on the watchdog infrastructure 
> driver that was added back in the early spring.  I'm not 100% sure,
> though.
> 
> Scott

The watchdog routines were incorporated into 5.2.1, as evidenced by the
NOTES for i386; however, a lot apparently changed in those files by the
time the ichwd driver was added. In essence, my kernel config additions
were:

options         HW_WDOG
options         WATCHDOG

(note that in -CURRENT the software watchdog option is different)

I also added ichwd to the files.i386 file and to
/usr/src/sys/modules/Makefile.

buildkernel then fails while building the ichwd module with:
syntax error in included file ichwd.h at eventhandler_tag ev_tag;

That in turn leaves ev_tag undefined, and the resulting undefined
reference makes the build fail.

For now I have set sysctl machdep.hlt_logical_cpus=1 to see whether it
helps at all.
Received on Fri Aug 06 2004 - 18:38:11 UTC
