On Fri, 13 Nov 2009 15:55:42 +0200, Andriy Gapon <avg_at_icyb.net.ua> wrote:

> on 13/11/2009 15:48 Kai Gallasch said the following:
> > On Fri, 13 Nov 2009 10:08:45 +0200,
> > Andriy Gapon <avg_at_icyb.net.ua> wrote:
> >> Kai,
> >> I have a hunch, could you please try the following _sledgehammer_
> >> patch (only kernel build/install is needed):
> >> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >> index 44b71f3..a456609 100644
> >> --- a/sys/amd64/amd64/pmap.c
> >> +++ b/sys/amd64/amd64/pmap.c
> >> @@ -2981,6 +2981,7 @@ setpte:
> >>          * Map the superpage.
> >>          */
> >>         pde_store(pde, PG_PS | newpde);
> >> +       pmap_invalidate_all(pmap);
> >>
> >>         pmap_pde_promotions++;
> >>         CTR2(KTR_PMAP, "pmap_promote_pde: success for va %#lx"
> >>
> >> This will slow down an act of promotion to a superpage, but should
> >> not have any visible impact on overall performance.
> >
> > Andriy,
> >
> > I tried the patch with c
> > hw.mca.enabled="1" , rebuilt the kernel (although normally I never
> > build kernels on Friday 13th :-) and ran buildworld -j8 for five
> > times in a row. No sign of a machine check exception, no reboot.
>
> I think that this is good news.
> This is not a fix, but the fact that it helps should help us find a
> proper solution.

Hi.

The patch did help the machine survive a makeworld. But now I have
another machine check exception with this server. It happened with your
patch active and vm.pmap.pg_ps_enabled="1".

I copied data from a remote server over an NFS mount to the unstable
server. The destination was a local ZFS filesystem.
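
For reference, the loader tunables in effect for this crash would look
roughly like the following in /boot/loader.conf. This is only a sketch
built from the two settings named in this thread. It assumes
hw.mca.enabled was still set as in the earlier buildworld test, and it
leaves out whatever else is configured on the machine.

    # Assumption: still enabled from the earlier test; the MCA lines
    # in the log below suggest machine check reporting was active.
    hw.mca.enabled="1"
    # Superpage promotion enabled; this is the setting the crash
    # was seen with.
    vm.pmap.pg_ps_enabled="1"
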
----------------
sonnenkraft:~ # MCA: CPU 7 UNCOR PCC OVER DTLB L1 error
MCA: Address 0xff800d860000

Fatal trap 28: machine check trap while in kernel mode
cpuid = 7; apic id = 07
instruction pointer     = 0x20:0xffffffff80e5f0b2
stack pointer           = 0x28:0xffffff8241f8d7d0
frame pointer           = 0x28:0xffffff8241f8da40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 0 (spa_zio_1)
[thread pid 0 tid 100193 ]
Stopped at      lzjb_compress+0x162:    leal    0x1(%rdx),%edi
db> bt
Tracing pid 0 tid 100193 td 0xffffff000732aab0
lzjb_compress() at lzjb_compress+0x162
zio_compress_data() at zio_compress_data+0xbe
zio_write_bp_init() at zio_write_bp_init+0xc2
zio_execute() at zio_execute+0x77
zio_ready() at zio_ready+0x124
zio_execute() at zio_execute+0x77
taskq_run() at taskq_run+0x13
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8241f8dd30, rbp = 0 ---
----------------

After this I tried copying to the local ZFS filesystem through NFS
again, and again got an exception.

With vm.pmap.pg_ps_enabled="0" set in loader.conf and after a reboot,
the server survives the NFS copying and stays stable.

--Kai.

Received on Sat Nov 14 2009 - 00:21:25 UTC