Re: Crash in base/head in abd_put() after r320156

From: Trond Endrestøl <Trond.Endrestol_at_fagskolen.gjovik.no>
Date: Thu, 22 Jun 2017 10:36:46 +0200 (CEST)
On Wed, 21 Jun 2017 11:18+0300, Andriy Gapon wrote:

> On 21/06/2017 00:45, Trond Endrestøl wrote:
> > On Tue, 20 Jun 2017 17:31-0400, Allan Jude wrote:
> > 
> >> On 2017-06-20 17:27, Trond Endrestøl wrote:
> >>> Has anyone else seen a crash in base/head in abd_put() after r320156?
> >>>
> >>> One of my experimental VMs at home crashed spectacularly after 
> >>> upgrading to r320156. I even wiped my /usr/obj, recompiled everything 
> >>> and got the same result. Everything's back to normal when I boot 
> >>> r320146.
> >>>
> >>> Here's the backtrace:
> >>>
> >>> Fatal trap 12: page fault while in kernel mode
> >>> cpuid = 3; apic id = 03
> >>>
> >>> fault virtual address	= 0x8
> >>>
> >>> Fatal trap 12: page fault while in kernel mode
> >>>
> >>> cpuid = 2; 
> >>> Fatal trap 12: page fault while in kernel mode
> >>> apic id = 02
> >>> fault virtual address	= 0x8
> >>> cpuid = 0; apic id = 00
> >>> fault virtual address	= 0x8
> >>> fault code		= supervisor read data, page not present
> >>> fault code		= supervisor read data, page not present
> >>> instruction pointer	= 0x20:0xffffffff803260fa
> >>> stack pointer	        = 0x28:0xfffffe01b0231860
> >>> frame pointer	        = 0x28:0xfffffe01b0231870
> >>> code segment		= base 0x0, limit 0xfffff, type 0x1b
> >>>
> >>> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> >>>
> >>> Fatal trap 12: page fault while in kernel mode
> >>> fault code		= supervisor read data, page not present
> >>> processor eflags	= interrupt enabled, resume, IOPL = 0
> >>> current process		= 0 (zio_free_issue_5_2)
> >>> trap number		= 12
> >>> instruction pointer	= 0x20:0xffffffff803260fa
> >>> stack pointer	        = 0x28:0xfffffe01b022c860
> >>> frame pointer	        = 0x28:0xfffffe01b022c870
> >>> panic: page fault
> >>> cpuid = 0
> >>> time = 4
> >>> KDB: stack backtrace:
> >>> db_trace_self_wrapper() at 0xffffffff8044f93b = db_trace_self_wrapper+0x2b/frame 0xfffffe01b0231440
> >>> vpanic() at 0xffffffff8067ec0c = vpanic+0x19c/frame 0xfffffe01b02314c0
> >>> panic() at 0xffffffff8067ea63 = panic+0x43/frame 0xfffffe01b0231520
> >>> trap_fatal() at 0xffffffff80983b32 = trap_fatal+0x322/frame 0xfffffe01b0231570
> >>> trap_pfault() at 0xffffffff80983b89 = trap_pfault+0x49/frame 0xfffffe01b02315d0
> >>> trap() at 0xffffffff809833c5 = trap+0x295/frame 0xfffffe01b0231790
> >>> calltrap() at 0xffffffff80968c21 = calltrap+0x8/frame 0xfffffe01b0231790
> >>> --- trap 0xc, rip = 0xffffffff803260fa, rsp = 0xfffffe01b0231860, rbp = 0xfffffe01b0231870 ---
> >>> abd_put() at 0xffffffff803260fa = abd_put+0xa/frame 0xfffffe01b0231870
> >>> vdev_raidz_map_free() at 0xffffffff803aa7c2 = vdev_raidz_map_free+0x82/frame 0xfffffe01b02318a0
> >>> zio_vdev_io_assess() at 0xffffffff803ecc04 = zio_vdev_io_assess+0x74/frame 0xfffffe01b02318e0
> >>> zio_execute() at 0xffffffff803e913c = zio_execute+0xac/frame 0xfffffe01b0231930
> >>> zio_vdev_io_start() at 0xffffffff803ec894 = zio_vdev_io_start+0x2b4/frame 0xfffffe01b0231990
> >>> zio_execute() at 0xffffffff803e913c = zio_execute+0xac/frame 0xfffffe01b02319e0
> >>> zio_nowait() at 0xffffffff803e8a8b = zio_nowait+0xcb/frame 0xfffffe01b0231a20
> >>> vdev_mirror_io_start() at 0xffffffff803a744c = vdev_mirror_io_start+0x35c/frame 0xfffffe01b0231a70
> >>> zio_vdev_io_start() at 0xffffffff803ec86c = zio_vdev_io_start+0x28c/frame 0xfffffe01b0231ad0
> >>> zio_execute() at 0xffffffff803e913c = zio_execute+0xac/frame 0xfffffe01b0231b20
> >>> taskqueue_run_locked() at 0xffffffff806d3d27 = taskqueue_run_locked+0x127/frame 0xfffffe01b0231b80
> >>> taskqueue_thread_loop() at 0xffffffff806d4ee8 = taskqueue_thread_loop+0xc8/frame 0xfffffe01b0231bb0
> >>> fork_exit() at 0xffffffff80640df5 = fork_exit+0x85/frame 0xfffffe01b0231bf0
> >>> fork_trampoline() at 0xffffffff8096915e = fork_trampoline+0xe/frame 0xfffffe01b0231bf0
> >>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> >>> Uptime: 4s
> >>>
> >>
> >> This seems to be an unintended consequence of some code that was pulled
> >> in from upstream today.
> >>
> >> Try adding: vfs.zfs.trim.enabled=0
> >> to /boot/loader.conf
> >>
> >> (you can set it manually from the boot loader menu with the set command
> >> to get the system to boot)
> > 
> > That worked. Thanks.
> > 
> > BTW, the call to abd_put() was given a NULL pointer.
> > 
> 
> Could you please re-enable ZFS TRIM support and test r320186 or later?
> ZFS ABD is a rather large upstream change and our TRIM support is sprinkled over
> a non-trivial amount of code as well.
> Thank you.

r320186 works without disabling ZFS TRIM support. Tested on both 
XenServer at work and VirtualBox at home.
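
For anyone reading the trace later: a fault virtual address of 0x8 combined 
with a NULL argument to abd_put() is what you would expect if the function 
reads a structure member that sits 8 bytes into the object. Below is a 
minimal user-space sketch of that arithmetic. The struct demo_abd layout and 
both function names are hypothetical stand-ins for illustration, not the 
real abd_t definition or the actual fix that went into r320186.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer descriptor; NOT the real abd_t. */
struct demo_abd {
	uint32_t	 flags;		/* offset 0 */
	uint32_t	 size;		/* offset 4 */
	struct demo_abd	*parent;	/* offset 8 */
};

/*
 * Address the CPU would touch when reading the 'parent' member of an
 * object located at 'base'. With base == 0 (a NULL pointer) the result
 * is simply the member's offset inside the structure.
 */
uintptr_t
demo_parent_field_address(uintptr_t base)
{
	return (base + offsetof(struct demo_abd, parent));
}

/*
 * Defensive variant of a "put" routine: refuse a NULL descriptor instead
 * of dereferencing it. Returns -1 for NULL, 1 if the descriptor has a
 * parent, 0 otherwise.
 */
int
demo_abd_put_checked(const struct demo_abd *abd)
{
	if (abd == NULL)
		return (-1);
	return (abd->parent != NULL ? 1 : 0);
}

/* Sanity check of the assumed layout; aborts if the offset differs. */
void
demo_selfcheck(void)
{
	assert(offsetof(struct demo_abd, parent) == 8);
}
```

Calling demo_parent_field_address(0) gives the address a NULL dereference of 
that member would fault on, which is how a small non-zero fault address can 
still point to a NULL pointer bug.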

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob.   952 62 567,       | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+
Received on Thu Jun 22 2017 - 06:36:58 UTC