Re: Non-responsive 8.0-RC1

From: Peter Jeremy <peterjeremy_at_acm.org>
Date: Mon, 30 Nov 2009 19:13:30 +1100
On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity_at_exscape.org> wrote:
>
>On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
>
>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
>> recently had a couple of long-duration hangs on it during which time
>> processes doing I/O will stop responding.

I forgot to mention that I checked SMART state on the disks and also
did a 'zpool scrub' after the first occurrence - no problems showed up.

It actually "hung" again just after I sent the original mail.  This
time I managed to get console access and could check the kernel state.
This showed that a number of processes were blocked on ZFS locks.
The most commonly reported state was 'tx->tx_quiesce_done_cv)'.
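
For anyone wanting to repeat this kind of check, here is a minimal sketch
of tallying wait channels from saved process-listing output. It assumes
a listing shaped like `ps -axo pid,state,wchan,comm` (header line, then
one process per line). The sample rows, PIDs, command names and the
function name `tally_wait_channels` are hypothetical illustrations, not
data from the hang described above.

```python
from collections import Counter

# Hypothetical sample listing; only the first wait-channel name is taken
# from the mail, everything else is made up for illustration.
SAMPLE = """\
  PID STAT WCHAN                    COMMAND
  101 D    tx->tx_quiesce_done_cv)  rsync
  102 D    tx->tx_quiesce_done_cv)  cp
  103 R    -                        boinc_client
  104 D    zio->io_cv)              find
"""


def tally_wait_channels(ps_output: str) -> Counter:
    """Count processes per wait channel, ignoring those not waiting ('-')."""
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        fields = line.split(None, 3)  # pid, state, wchan, command
        if len(fields) < 4:
            continue  # malformed or truncated row
        wchan = fields[2]
        if wchan != "-":
            counts[wchan] += 1
    return counts


if __name__ == "__main__":
    # Print wait channels from most to least common.
    for wchan, n in tally_wait_channels(SAMPLE).most_common():
        print(f"{n:3d}  {wchan}")
```

Sorting by count makes the dominant wait channel stand out when many
processes are blocked at once.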

The machine had been up for about 30 days before I noticed any problems,
and the hangs seem to have been getting more obvious, so it is also
possible that they are related to uptime - either a resource leak
somewhere (though there was nothing obvious) or memory fragmentation.

>Hmm, I know there was some fix to the scheduler re: thread priority,
>and it wouldn't surprise me if it was after your revision.

After looking around in the kernel, I'm now confident that it's not
a priority-inversion issue as the BOINC processes all appeared to be
running normally and not holding locks.

>My advice would be to upgrade to -RELEASE if possible. If not, at
>least check whether your build should be affected.

I have updated to a recent 8-stable and will see what happens.

-- 
Peter Jeremy

Received on Mon Nov 30 2009 - 07:13:45 UTC
