Re: ZFS leaking vnodes (sort of)

From: Simon Dircks <enderbsd_at_gmail.com>
Date: Wed, 11 Jul 2007 20:24:41 -0400
On 7/9/07, Huang wen hui <huang_at_gddsn.org.cn> wrote:
>
> Pawel Jakub Dawidek wrote:
> > On Sat, Jul 07, 2007 at 02:26:17PM +0100, Doug Rabson wrote:
> >
> >> I've been testing ZFS recently and I noticed some performance issues
> >> while doing large-scale port builds on a ZFS mounted /usr/ports tree.
> >> Eventually I realised that virtually nothing ever ended up on the vnode
> >> free list. This meant that when the system reached its maximum vnode
> >> limit, it had to resort to reclaiming vnodes from the various
> >> filesystems' active vnode lists (via vlrureclaim). Since those lists
> >> are not sorted in LRU order, this led to pessimal cache performance
> >> after the system got into that state.
> >>
> >> I looked a bit closer at the ZFS code and poked around with DDB and I
> >> think the problem was caused by a couple of extraneous calls to vhold
> >> when creating a new ZFS vnode. On FreeBSD, getnewvnode returns a vnode
> >> which is already held (not on the free list) so there is no need to
> >> call vhold again.
> >>
> >
> > Whoa! Nice catch... The patch works here - I did some pretty heavy
> > tests, so please commit it ASAP.
> >
> > I also wonder if this can help with some of those 'kmem_map too small'
> > panics. I was observing that ARC cannot reclaim memory and this may be
> > because all vnodes and thus associated data are being held.
> >
> > To ZFS users having problems with performance and/or stability of ZFS:
> > Can you test the patch and see if it helps?
> >
> My T60p notebook, -CURRENT amd64:
> buildworld time before patch: 58xx seconds.
> buildworld time after patch: 28xx seconds.
>
> Thanks!
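The hold-count bug Doug describes can be illustrated with a toy model. This is a sketch under stated assumptions, not the FreeBSD kernel API: `ToyVnode`, `ToyVnodeCache`, and its `getnewvnode`, `vhold`, `vdrop` and `recycle_candidate` methods are hypothetical stand-ins that only mimic the behaviour described in the thread (a new vnode comes back already held, and a vnode joins the free list only when its hold count drops to zero). The real kernel routines do considerably more, such as locking, which is not modelled here.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ToyVnode:
    """Hypothetical stand-in for a kernel vnode; only the hold count is modelled."""
    name: str
    holdcnt: int = 0


class ToyVnodeCache:
    """Toy vnode cache with an LRU-ordered free list (assumed behaviour, not kernel code)."""

    def __init__(self):
        # Oldest-released vnode first, so the front of the dict is the LRU victim.
        self.free_list = OrderedDict()

    def getnewvnode(self, name):
        # Per the thread: the new vnode is returned already held (not on the free list).
        return ToyVnode(name, holdcnt=1)

    def vhold(self, vp):
        vp.holdcnt += 1
        self.free_list.pop(vp.name, None)  # a held vnode never sits on the free list

    def vdrop(self, vp):
        vp.holdcnt -= 1
        if vp.holdcnt == 0:
            self.free_list[vp.name] = vp  # appended at the most-recently-freed end

    def recycle_candidate(self):
        """Pop the least recently freed vnode, or return None if the free list is empty."""
        if not self.free_list:
            return None  # a real system would have to fall back to scanning active lists
        _, vp = self.free_list.popitem(last=False)
        return vp


cache = ToyVnodeCache()

# Correct pattern: one hold from creation, one drop when the last user is done.
a = cache.getnewvnode("a")
cache.vdrop(a)

# Buggy pattern: an extraneous vhold right after creation.
b = cache.getnewvnode("b")
cache.vhold(b)
cache.vdrop(b)

print(list(cache.free_list))  # ['a']  ('b' still has holdcnt 1 and never gets here)
victim = cache.recycle_candidate()
print(victim.name, b.holdcnt)  # a 1
```

The `OrderedDict` keeps vnodes in release order, so recycling from the front approximates LRU. Once leaked holds keep vnodes off that list, the free list runs dry and, as the thread notes, the kernel has to reclaim from the active lists via vlrureclaim, which are not kept in LRU order.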



With this patch I am still able to reproduce my ZFS crash.

controllera# uname -a
FreeBSD controllera.storage.ksdhost.com 7.0-CURRENT FreeBSD 7.0-CURRENT #0:
Thu Jul 12 02:28:52 UTC 2007
graff_at_controllera.storage.ksdhost.com:/usr/obj/usr/src/sys/CONTROLLERA
amd64


panic: ZFS: bad checksum (read on <unknown> off 0: zio 0xffffff001d729810
[LO SPA space map] 1000L/800P DVA[0]=<0:1600421800:800> DVA[1]=<0:2c000f7000:800>
DVA[2]=<0:4200013800:800> fletcher4 lzjb LE contiguous birth=566 fill=1
chsum=5d32767b98:635ff7022f8b:4251
cpuid = 0
KDB: enter: panic
[thread pid 802 tid 100066 ]
stopped at kdb_enter+0x31: leave
Received on Wed Jul 11 2007 - 22:53:27 UTC
