I am currently running a stress test with about 30 postgres processes on a dual-Xeon box with an Adaptec RAID controller. I am trying to reproduce some kernel lockups, but in the process I keep getting into a state where no more I/O activity occurs and all the postgres processes appear to be stuck sleeping on a mutex, making no progress. Some of the time fsck_ufs is also running, because of an improper shutdown. The code is based on CURRENT from a couple of weeks ago. After enabling WITNESS, the following messages appear:

Jun 19 18:00:51 TPC-D7-23 lock order reversal
Jun 19 18:00:51 TPC-D7-23 1st 0xcab85294 vm object (vm object) @ /.amd_mnt/gnagelhout-pc3.sandvine.com/host/gerrit_bsd_5_main/fw-bsd/src/sys/vm/swap_pager.c:1313
Jun 19 18:00:51 TPC-D7-23 2nd 0xc0780ba0 swap_pager swhash (swap_pager swhash) @ /.amd_mnt/gnagelhout-pc3.sandvine.com/host/gerrit_bsd_5_main/fw-bsd/src/sys/vm/swap_pager.c:1799
Jun 19 18:00:51 TPC-D7-23 3rd 0xca966108 vm object (vm object) @ /.amd_mnt/gnagelhout-pc3.sandvine.com/host/gerrit_bsd_5_main/fw-bsd/src/sys/vm/uma_core.c:886
Jun 19 18:00:51 TPC-D7-23 Stack backtrace:
Jun 19 18:00:51 TPC-D7-23 backtrace(c06de7a0,ca966108,c06ef9dd,c06ef9dd,c06f05b8) at backtrace+0x17
Jun 19 18:00:51 TPC-D7-23 witness_checkorder(ca966108,9,c06f05b8,376,ca924e00) at witness_checkorder+0x5f3
Jun 19 18:00:51 TPC-D7-23 _mtx_lock_flags(ca966108,0,c06f05b8,376,ca924e14) at _mtx_lock_flags+0x32
Jun 19 18:00:51 TPC-D7-23 obj_alloc(ca924e00,1000,e6897a1b,101,e6897a30) at obj_alloc+0x3f
Jun 19 18:00:51 TPC-D7-23 slab_zalloc(ca924e00,1,ca924e14,8,c06f05b8) at slab_zalloc+0xb3
Jun 19 18:00:51 TPC-D7-23 uma_zone_slab(ca924e00,1,c06f05b8,68f,ca924eb0) at uma_zone_slab+0xda
Jun 19 18:00:51 TPC-D7-23 uma_zalloc_internal(ca924e00,0,1,5c4,1) at uma_zalloc_internal+0x3e
Jun 19 18:00:51 TPC-D7-23 uma_zalloc_arg(ca924e00,0,1,707,2) at uma_zalloc_arg+0x283
Jun 19 18:00:51 TPC-D7-23 swp_pager_meta_build(cab85294,5,0,2,0) at swp_pager_meta_build+0x12e
Jun 19 18:00:51 TPC-D7-23 swap_pager_putpages(cab85294,e6897be0,1,0,e6897b50) at swap_pager_putpages+0x306
Jun 19 18:00:51 TPC-D7-23 default_pager_putpages(cab85294,e6897be0,1,0,e6897b50) at default_pager_putpages+0x2e
Jun 19 18:00:51 TPC-D7-23 vm_pageout_flush(e6897be0,1,0,116,c073bda0) at vm_pageout_flush+0xdb
Jun 19 18:00:51 TPC-D7-23 vm_pageout_clean(c436cb30,0,c06f03a0,33b,0) at vm_pageout_clean+0x2a3
Jun 19 18:00:51 TPC-D7-23 vm_pageout_scan(0,0,c06f03a0,5b7,30d4) at vm_pageout_scan+0x5d5
Jun 19 18:00:51 TPC-D7-23 vm_pageout(0,e6897d48,c06d9172,328,0) at vm_pageout+0x31d
Jun 19 18:00:51 TPC-D7-23 fork_exit(c064ad69,0,e6897d48) at fork_exit+0x77
Jun 19 18:00:51 TPC-D7-23 fork_trampoline() at fork_trampoline+0x8
Jun 19 18:00:51 TPC-D7-23 --- trap 0x1, eip = 0, esp = 0xe6897d7c, ebp = 0 ---

What else can I do to further debug this problem? (The debug options I have in my kernel config are sketched in the P.S. below.)

A second problem I have noticed, with similar symptoms (no more I/O, everything blocked), is that all of my postgres processes end up in the "wdrain" sleep state. The code that is supposed to wake them up (runningbufwakeup) still gets called on occasion, but runningbufspace never drains back down to lorunningspace, and so the wakeup is never issued. I don't know whether this is due to a slow leak of runningbufspace or to some deadlock condition. Any ideas? (The wakeup logic as I read it is paraphrased in the P.P.S. below.)

Thanks,
Gerrit Nagelhout
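P.S. For reference, the debug-related options in the kernel config for these runs look roughly like the following (paraphrasing; the exact file has more in it):

    options INVARIANTS          # extra runtime consistency checks
    options INVARIANT_SUPPORT
    options WITNESS             # lock order checking (source of the LOR above)
    options WITNESS_SKIPSPIN    # skip spin mutexes to reduce overhead
    options DEBUG_LOCKS         # track lockmgr lock holders
    options DDB                 # kernel debugger, to poke at the machine once it wedges
    options BREAK_TO_DEBUGGER   # a serial break drops into DDB

With DDB compiled in it should at least be possible to break in after the hang and use ps, show locks, and show lockedvnods to see which thread holds whatever everyone else is sleeping on.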
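P.P.S. In case it helps, this is approximately the sleep/wakeup pair I am referring to, paraphrased from my reading of sys/kern/vfs_bio.c (rbreqlock, runningbufreq, and hirunningspace are the names as I read them in my checkout; the exact code may differ in yours):

    /*
     * Paraphrased from sys/kern/vfs_bio.c -- not a verbatim copy.
     * Writers block here once too much write I/O is in flight;
     * this is the "wdrain" sleep the postgres processes are stuck in.
     */
    static __inline void
    waitrunningbufspace(void)
    {
            mtx_lock(&rbreqlock);
            while (runningbufspace > hirunningspace) {
                    ++runningbufreq;
                    msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
            }
            mtx_unlock(&rbreqlock);
    }

    /*
     * Called on write completion to credit back the in-flight space.
     */
    static __inline void
    runningbufwakeup(struct buf *bp)
    {
            if (bp->b_runningbufspace) {
                    atomic_subtract_int(&runningbufspace,
                        bp->b_runningbufspace);
                    bp->b_runningbufspace = 0;
                    mtx_lock(&rbreqlock);
                    /*
                     * The "wdrain" sleepers are only woken once the
                     * in-flight total has drained down to lorunningspace.
                     * If runningbufspace is ever leaked (credited back
                     * less than was charged), this test never succeeds
                     * and the sleepers never wake up.
                     */
                    if (runningbufreq && runningbufspace <= lorunningspace) {
                            runningbufreq = 0;
                            wakeup(&runningbufreq);
                    }
                    mtx_unlock(&rbreqlock);
            }
    }

Given that, a slow leak of runningbufspace (some completion path that skips runningbufwakeup, or charges b_runningbufspace without crediting it back) would produce exactly this permanent wdrain sleep.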