Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full

From: Garrett Cooper <yanegomi_at_gmail.com>
Date: Tue, 3 May 2011 23:58:49 -0700
On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper <yanegomi_at_gmail.com> wrote:
> On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick <mckusick_at_mckusick.com> wrote:
>>> Date: Tue, 3 May 2011 22:40:26 -0700
>>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
>>>  partition when filesystem full
>>> From: Garrett Cooper <yanegomi_at_gmail.com>
>>> To: Jeff Roberson <jeff_at_freebsd.org>,
>>>         Marshall Kirk McKusick <mckusick_at_mckusick.com>
>>> Cc: FreeBSD Current <freebsd-current_at_freebsd.org>
>>>
>>> Hi Jeff and Dr. McKusick,
>>>     Ran into this panic when /usr ran out of space doing a make
>>> universe on amd64/r221219 (it took ~15 minutes for the panic to occur
>>> after the filesystem ran out of space -- wasn't quite sure what it was
>>> doing at the time):
>>>
>>> ...
>>>
>>>     Let me know what other commands you would like for me to run in kgdb.
>>> Thanks,
>>> -Garrett
>>
>> You did not indicate whether you are running an 8.X system or a 9-current
>> system. It would be helpful to know that.
>
> I've actually been running CURRENT for a few years now, but you're right --
> I didn't mention that part.
>
>> Jeff thinks that there may be a potential race in the locking code for
>> softdep_request_cleanup. If so, this patch for 9-current should fix it:
>>
>> Index: ffs_softdep.c
>> ===================================================================
>> --- ffs_softdep.c       (revision 221385)
>> +++ ffs_softdep.c       (working copy)
>> @@ -11380,7 +11380,8 @@
>>                                continue;
>>                        }
>>                        MNT_IUNLOCK(mp);
>> -                       if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
>> +                       if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK,
>> +                           curthread)) {
>>                                MNT_ILOCK(mp);
>>                                continue;
>>                        }
>>
>> If you are running an 8.X system, hopefully you will be able to apply it.
>
>    I've applied it, rebuilt and installed the kernel, and am trying to
> repro the case again. Will let you know how things go!

    The panic happened again with the patch applied. It's really easy to repro:

1. Get a filesystem with UFS+SU
2. Execute something that does a large number of small writes to a partition.
3. 'dd if=/dev/zero of=FOO bs=10m' on the same partition
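
The steps above could be scripted roughly as follows. This is a minimal sketch, not the commands that were actually run: the mount point in TARGET, the small.N file names, NFILES, and the DRY_RUN switch are all assumptions introduced for illustration. It defaults to a dry run that only prints the dd commands, since a real run is meant to fill the filesystem and may panic the machine.

```shell
#!/bin/sh
# Hypothetical repro sketch for the steps above. TARGET, NFILES, DRY_RUN and
# the file names are illustrative assumptions, not taken from the report.

TARGET="${TARGET:-/mnt/sutest}"   # assumed mount point of a UFS+SU filesystem
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 2: a large number of small writes to the partition.
small_writes() {
    i=0
    while [ "$i" -lt "${NFILES:-1000}" ]; do
        run dd if=/dev/zero of="$TARGET/small.$i" bs=512 count=1
        i=$((i + 1))
    done
}

# Step 3: one large sequential write that runs until the partition is full.
# bs=10m is copied from the report (FreeBSD dd accepts the lowercase suffix).
big_write() {
    run dd if=/dev/zero of="$TARGET/FOO" bs=10m
}

# Run the small writes in the background while the large write fills the disk.
small_writes &
big_write
wait
```

To execute it for real on a scratch filesystem, something like `DRY_RUN=0 TARGET=/mnt/scratch sh repro.sh` would do, where repro.sh is whatever name the script was saved under.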

    The kernel will panic with the issue I discussed above.
Thanks!
-Garrett
Received on Wed May 04 2011 - 04:58:51 UTC
