2010/5/12 Jeff Roberson <jroberson_at_jroberson.net>:
> On Wed, 12 May 2010, Ulrich Spörlein wrote:
>
>> On Mon, 10.05.2010 at 22:53:32 +0200, Attilio Rao wrote:
>>>
>>> 2010/5/10 Peter Jeremy <peterjeremy_at_acm.org>:
>>>>
>>>> On 2010-May-08 12:20:05 +0200, Ulrich Spörlein <uqs_at_spoerlein.net>
>>>> wrote:
>>>>>
>>>>> This LOR also is not yet listed on the LOR page, so I guess it's rather
>>>>> new. I do use SUJ.
>>>>>
>>>>> lock order reversal:
>>>>> 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
>>>>> 2nd 0xec0fe304 bufwait (bufwait) @
>>>>> /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
>>>>> 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
>>>>
>>>> I'm seeing exactly the same LOR (and subsequent deadlock) on a recent
>>>> -current without SUJ.
>>>
>>> I think this LOR was reported since a long time.
>>> The deadlock may be new and someway related to the vm_page_lock work
>>> (if not SUJ).
>>
>> I was not able to reproduce this with a kernel prior to SUJ, a kernel
>> just after SUJ went in shows this "deadlock" or infinite loop ...
>>
>> Now it might be that the SUJ kernel only increases the pressure so it
>> happens during a system's uptime. It does not seem directly related to
>> actually using SUJ on a volume, as I could reproduce it with SU only,
>> too.
>>
>> I will try to get a hang not involving GELI and also re-do my tests when
>> the volumes have neither SUJ nor SU enabled, which led to 10-20s "hangs"
>> of the system IIRC. It seems SU/SUJ then only prolongs these hangs ad
>> infinitum.
>
> I think Peter Holm also saw this once while we were testing SUJ and
> reproduced ~30 second hangs with stock sources. At this point we need to
> brainstorm ideas for adding debugging instrumentation and come up with the
> quickest possible repro.
>
> It would probably be good to add some KTR tracing and log that when it
> wedges. The core I looked at was hung in bufwait. Is there any cpu
> activity or io activity when things hang?
> You'll probably have to keep
> iostat/vmstat in memory to find out so they don't try to fault in pages once
> things are hung.

I think I also have some reports of a deadlock on unmount -f (not
specific to UFS) that still seems to me to be the same buffer cache
async deadlock. I will forward you the traces in a separate e-mail
(Peter managed to reproduce it with KTR on).

Attilio

--
Peace can only be achieved by understanding - A. Einstein

Received on Wed May 12 2010 - 18:55:05 UTC