-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03.01.2012 10:18, Kostik Belousov wrote:
> On Tue, Jan 03, 2012 at 12:02:22AM -0800, Don Lewis wrote:
>> On 2 Jan, Don Lewis wrote:
>>> On 2 Jan, Don Lewis wrote:
>>>> On 2 Jan, Florian Smeets wrote:
>>>
>>>>> This does not make a difference. I tried on 32K/4K
>>>>> with/without journal and on 16K/2K; all exhibit the same
>>>>> problem. At some point during the cvs2svn conversion the
>>>>> syncer starts to use 100% CPU. The whole process hangs at
>>>>> that point, sometimes for hours. From time to time it does
>>>>> continue doing some work, but really, really slowly. It's
>>>>> usually between revision 210000 and 220000, when the
>>>>> resulting svn file gets bigger than about 11-12 GB. At that
>>>>> point an ls in the target dir hangs in state ufs.
>>>>>
>>>>> I broke into ddb and ran all commands which I thought
>>>>> could be useful. The output is at
>>>>> http://tb.smeets.im/~flo/giant-ape_syncer.txt
>>>>
>>>> Tracing command syncer pid 9 tid 100183 td 0xfffffe00120e9000
>>>> cpustop_handler() at cpustop_handler+0x2b
>>>> ipi_nmi_handler() at ipi_nmi_handler+0x50
>>>> trap() at trap+0x1a8
>>>> nmi_calltrap() at nmi_calltrap+0x8
>>>> --- trap 0x13, rip = 0xffffffff8082ba43, rsp = 0xffffff8000270fe0, rbp = 0xffffff88c97829a0 ---
>>>> _mtx_assert() at _mtx_assert+0x13
>>>> pmap_remove_write() at pmap_remove_write+0x38
>>>> vm_object_page_remove_write() at vm_object_page_remove_write+0x1f
>>>> vm_object_page_clean() at vm_object_page_clean+0x14d
>>>> vfs_msync() at vfs_msync+0xf1
>>>> sync_fsync() at sync_fsync+0x12a
>>>> sync_vnode() at sync_vnode+0x157
>>>> sched_sync() at sched_sync+0x1d1
>>>> fork_exit() at fork_exit+0x135
>>>> fork_trampoline() at fork_trampoline+0xe
>>>> --- trap 0, rip = 0, rsp = 0xffffff88c9782d00, rbp = 0 ---
>>>>
>>>> I think this explains why the r228838 patch seems to help
>>>> the problem. Instead of an application call to msync(),
>>>> you're getting bitten by the syncer doing the equivalent. I
>>>> don't know why the syncer is CPU bound, though. From my
>>>> understanding of the patch, it only optimizes the I/O.
>>>> Without the patch, I would expect that the syncer would just
>>>> spend a lot of time waiting on I/O. My guess is that this
>>>> is actually a vm problem. There are nested loops in
>>>> vm_object_page_clean() and vm_object_page_remove_write(), so
>>>> you could be doing something that's causing lots of looping
>>>> in that code.
>>>
>>> Does the machine recover if you suspend cvs2svn? I think what
>>> is happening is that cvs2svn is continuing to dirty pages
>>> while the syncer is trying to sync the file. From my limited
>>> understanding of this code, it looks to me like every time
>>> cvs2svn dirties a page, it will trigger a call to
>>> vm_object_set_writeable_dirty(), which will increment
>>> object->generation. Whenever vm_object_page_clean() detects a
>>> change in the generation count, it restarts its scan of the
>>> pages associated with the object. This is probably not
>>> optimal ...
>>
>> Since the syncer is only trying to flush out pages that have
>> been dirty for the last 30 seconds, I think that
>> vm_object_page_clean() should just make one pass through the
>> object, ignoring generation, and then return when it is called
>> from the syncer. That should keep vm_object_page_clean() from
>> looping over the object again and again if another process is
>> actively dirtying the object.
>>
> This sounds very plausible. I think that there is no sense in
> restarting the scan if it is requested in async mode at all. See
> below.
>
> Would be thrilled if this finally solves the cvs2svn issues.
>
> commit 41aaafe5e3be5387949f303b8766da64ee4a521f
> Author: Kostik Belousov <kostik_at_sirion>
> Date:   Tue Jan 3 11:16:30 2012 +0200
>
>     Do not restart the scan in vm_object_page_clean() if requested
>     mode is async.
>
>     Proposed by:	truckman
>
> diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
> index 716916f..52fc08b 100644
> --- a/sys/vm/vm_object.c
> +++ b/sys/vm/vm_object.c
> @@ -841,7 +841,8 @@ rescan:
>  		if (p->valid == 0)
>  			continue;
>  		if (vm_page_sleep_if_busy(p, TRUE, "vpcwai")) {
> -			if (object->generation != curgeneration)
> +			if ((flags & OBJPC_SYNC) != 0 &&
> +			    object->generation != curgeneration)
>  				goto rescan;
>  			np = vm_page_find_least(object, pi);
>  			continue;
> @@ -851,7 +852,8 @@ rescan:
>
>  		n = vm_object_page_collect_flush(object, p, pagerflags,
>  		    flags, &clearobjflags);
> -		if (object->generation != curgeneration)
> +		if ((flags & OBJPC_SYNC) != 0 &&
> +		    object->generation != curgeneration)
>  			goto rescan;
>
>  		/*

Yes, the patch fixes the problem. The cvs2svn run completed this time.

     9132.25 real      8387.05 user       403.86 sys

I did not see any significant syncer activity in top -S anymore.

Thanks a lot.
Florian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8C+KYACgkQapo8P8lCvwkc+QCeLY8+OkEQo1/wB3J2TyjfXyc0
b0IAn1OJo1XUlBYPZRoU5NFSO5dnNbne
=IGEW
-----END PGP SIGNATURE-----

Received on Tue Jan 03 2012 - 11:46:33 UTC
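
The rescan behaviour discussed in this thread can be illustrated with a small userland sketch in C. This is a toy model under stated assumptions, not the kernel code: struct toy_object, toy_writer(), toy_page_clean() and toy_run() are hypothetical names invented for this example. The only things taken from the thread are the generation counter that is bumped whenever a page is dirtied and the "goto rescan" check, which the patch restricts to synchronous requests. The model assumes a writer that re-dirties the last page of the object each time that page is flushed, for a bounded number of times.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 1000

/* Hypothetical stand-in for a VM object; NOT the kernel's struct vm_object. */
struct toy_object {
	unsigned generation;	/* bumped whenever a page is dirtied */
	bool dirty[NPAGES];	/* per-page dirty flag */
	int writes_left;	/* how many more times the writer dirties a page */
};

/* Models another process dirtying the last page while the scan is running. */
static void
toy_writer(struct toy_object *obj)
{
	if (obj->writes_left > 0) {
		obj->writes_left--;
		obj->dirty[NPAGES - 1] = true;
		obj->generation++;
	}
}

/*
 * Walk the object and "flush" dirty pages; return the number of pages visited.
 * sync == true:  restart the walk whenever the generation changed.
 * sync == false: make a single pass and leave newly dirtied pages for later.
 */
static long
toy_page_clean(struct toy_object *obj, bool sync)
{
	long visited = 0;
	unsigned curgeneration;
	int i;

rescan:
	curgeneration = obj->generation;
	for (i = 0; i < NPAGES; i++) {
		visited++;
		if (!obj->dirty[i])
			continue;
		obj->dirty[i] = false;	/* pretend the page was written out */
		toy_writer(obj);	/* the writer races with the flush */
		if (sync && obj->generation != curgeneration)
			goto rescan;
	}
	return (visited);
}

/* Set up an object with only its last page dirty and run one clean pass. */
static long
toy_run(bool sync, int writes)
{
	struct toy_object obj = { 0 };

	assert(writes >= 0);
	obj.dirty[NPAGES - 1] = true;
	obj.writes_left = writes;
	return (toy_page_clean(&obj, sync));
}
```

In this model, toy_run(true, 50) visits 51000 pages because every re-dirtying forces a full restart, while toy_run(false, 50) visits 1000 pages and leaves the re-dirtied page for the next pass. The real scan does more work per page, so the sketch only shows how the restart count scales with the writer's activity.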