Re: Processes blocked on ufs or getblk

From: Andre Guibert de Bruet <andy_at_siliconlandmark.com>
Date: Mon, 26 Jan 2004 01:26:15 -0500 (EST)
On Thu, 15 Jan 2004, Andre Guibert de Bruet wrote:

> On Thu, 15 Jan 2004, Lachlan O'Dea wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> >
> > I found some discussion about this in December, but I don't think
> > anyone has been able to get to the bottom of it yet. The symptom is
> > that processes become permanently blocked in a state of ufs or getblk.
> > I can reproduce it with find at will:
> >
> > % ps axl | grep ufs
> >      0 13225 13215   1  -4  0  1300  804 ufs    D     ??    0:00.96 find
> > /var -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+
> >      0 28778 28765   0  -4  0  1300  804 ufs    D     ??    0:00.97 find
> > /var -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+
> >      0 33017 32933   2  -4  0  1304  788 ufs    D     p2-   0:10.69 find
> > / -name samba
> >
> > It has also happened several times in single user mode to makewhatis
> > running at the end of installworld.
> >
> > System details: 5.2-RC FreeBSD 5.2-RC #1: Fri Jan  9 04:45:51 EST 2004.
> > Dell PowerEdge 2500. All filesystems are on a single raid 5 volume
> > using the aac driver. The box has two CPUs, but I'm currently running
> > with kern.smp.disabled=1.
> >
> > % mount
> > /dev/aacd0s1a on / (ufs, local)
> > devfs on /dev (devfs, local)
> > /dev/aacd0s1e on /usr (ufs, local, with quotas, soft-updates)
> > /dev/aacd0s1d on /var (ufs, local, soft-updates)
> > procfs on /proc (procfs, local)
> > linprocfs on /usr/compat/linux/proc (linprocfs, local)
> >
> > I also have ACLs enabled on /usr, if that's at all relevant.
> >
> > The kernel has DDB and DEBUG_LOCKS. Please let me know if there's
> > anything I can do to help debug this.
> >
> > I don't know if this is related, but another problem is that when
> > shutting down, it always gives up on a bunch of buffers. I think I've
> > seen over 100, but usually it's 4-10 buffers.
>
> I'm seeing the same thing on my desktop machine. It usually occurs while
> scanning large directories and/or dealing with large collections of files
> rather quickly. I came across this bug while using gqview to go through my
> image collection and a second time while re-checking out my ports tree
> from local cvs. The programs appear to grab an exclusive lock and anything
> that tries to read or write to the directory (or get a directory listing)
> gets stuck in ufs state.
>
> My kernel config is rather simple, GENERIC without a lot of cruft except
> amr, ata, scsi, usb and pcm. I'll try to get the output of a ddb ps and a
> show lockedvnods.

I'm reviving this thread as I have more information that might help track
this problem down. The offending process in this case is gqview, but it
could have been 'find /' or any other process running when there's high
system load (such as the dailies).

From the emails that I've gotten, it appears that this bug affects users
who are using either ccd or hardware raid (amr driver in my case). I've
attached the output of a ddb ps and a 'show lockedvnods'.

Every time the getblk hang rears its ugly head, I've seen
"amr0: bad slot x completed" (where x is an integer between 0 and 4)
printed on the serial console.
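
For anyone who wants to tally these messages from a saved console log,
here is a rough Python sketch. It assumes the message appears exactly in
the form quoted above; the count_bad_slots name and the sample log text
are made up for illustration and are not real output from this machine.

```python
import re
from collections import Counter

# Matches the console message quoted above, e.g. "amr0: bad slot 3 completed".
BAD_SLOT = re.compile(r"\b(amr\d+): bad slot (\d+) completed\b")


def count_bad_slots(log_text):
    """Return a Counter mapping (controller, slot) to number of occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        m = BAD_SLOT.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


# Hypothetical sample log, invented for this example.
SAMPLE_LOG = """\
amr0: bad slot 2 completed
some unrelated kernel message
amr0: bad slot 2 completed
amr0: bad slot 4 completed
"""

if __name__ == "__main__":
    for (ctrl, slot), n in sorted(count_bad_slots(SAMPLE_LOG).items()):
        print(f"{ctrl} slot {slot}: {n}")
```

Comparing the timestamps of these lines against the times processes go
into ufs or getblk state would show whether the two really coincide.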

This makes me think that there's a failure mode or special state that
isn't being checked in the amr driver. Perusing the code shows that the
bad slot message is printed when the busy command for the completed slot
is NULL. I'm no storage driver expert and my VFS knowledge is somewhat
limited. Anyone out there want to have a look at this? I'm willing to try
out any patches on this system.

I'm currently running:
FreeBSD bling.home 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Thu Jan 22 11:38:46 EST 2004     andy_at_bling.home:/usr/src/sys/i386/compile/BLING  i386

The full kernel config file is up at:
http://bling.properkernel.com/BLING

I'll have a boot -v up shortly at:
http://bling.properkernel.com/boot-v.txt

Regards,

> Andre Guibert de Bruet | Enterprise Software Consultant >
> Silicon Landmark, LLC. | http://siliconlandmark.com/    >
Received on Sun Jan 25 2004 - 21:26:34 UTC
