Re: Crash in base/head in abd_put() after r320156

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Wed, 21 Jun 2017 11:18:01 +0300
On 21/06/2017 00:45, Trond Endrestøl wrote:
> On Tue, 20 Jun 2017 17:31-0400, Allan Jude wrote:
> 
>> On 2017-06-20 17:27, Trond Endrestøl wrote:
>>> Has anyone else seen a crash in base/head in abd_put() after r320156?
>>>
>>> One of my experimental VMs at home crashed spectacularly after 
>>> upgrading to r320156. I even wiped my /usr/obj, recompiled everything 
>>> and got the same result. Everything's back to normal when I boot 
>>> r320146.
>>>
>>> Here's the backtrace:
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 3; apic id = 03
>>>
>>> fault virtual address	= 0x8
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>>
>>> cpuid = 2; 
>>> Fatal trap 12: page fault while in kernel mode
>>> apic id = 02
>>> fault virtual address	= 0x8
>>> cpuid = 0; apic id = 00
>>> fault virtual address	= 0x8
>>> fault code		= supervisor read data, page not present
>>> fault code		= supervisor read data, page not present
>>> instruction pointer	= 0x20:0xffffffff803260fa
>>> stack pointer	        = 0x28:0xfffffe01b0231860
>>> frame pointer	        = 0x28:0xfffffe01b0231870
>>> code segment		= base 0x0, limit 0xfffff, type 0x1b
>>>
>>> 			= DPL 0, pres 1, long 1, def32 0, gran 1
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>> fault code		= supervisor read data, page not present
>>> processor eflags	= interrupt enabled, resume, IOPL = 0
>>> current process		= 0 (zio_free_issue_5_2)
>>> trap number		= 12
>>> instruction pointer	= 0x20:0xffffffff803260fa
>>> stack pointer	        = 0x28:0xfffffe01b022c860
>>> frame pointer	        = 0x28:0xfffffe01b022c870
>>> panic: page fault
>>> cpuid = 0
>>> time = 4
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at 0xffffffff8044f93b = db_trace_self_wrapper+0x2b/frame 0xfffffe01b0231440
>>> vpanic() at 0xffffffff8067ec0c = vpanic+0x19c/frame 0xfffffe01b02314c0
>>> panic() at 0xffffffff8067ea63 = panic+0x43/frame 0xfffffe01b0231520
>>> trap_fatal() at 0xffffffff80983b32 = trap_fatal+0x322/frame 0xfffffe01b0231570
>>> trap_pfault() at 0xffffffff80983b89 = trap_pfault+0x49/frame 0xfffffe01b02315d0
>>> trap() at 0xffffffff809833c5 = trap+0x295/frame 0xfffffe01b0231790
>>> calltrap() at 0xffffffff80968c21 = calltrap+0x8/frame 0xfffffe01b0231790
>>> --- trap 0xc, rip = 0xffffffff803260fa, rsp = 0xfffffe01b0231860, rbp = 0xfffffe01b0231870 ---
>>> abd_put() at 0xffffffff803260fa = abd_put+0xa/frame 0xfffffe01b0231870
>>> vdev_raidz_map_free() at 0xffffffff803aa7c2 = vdev_raidz_map_free+0x82/frame 0xfffffe01b02318a0
>>> zio_vdev_io_assess() at 0xffffffff803ecc04 = zio_vdev_io_assess+0x74/frame 0xfffffe01b02318e0
>>> zio_execute() at 0xffffffff803e913c = zio_execute+0xac/frame 0xfffffe01b0231930
>>> zio_vdev_io_start() at 0xffffffff803ec894 = zio_vdev_io_start+0x2b4/frame 0xfffffe01b0231990
>>> zio_execute() at 0xffffffff803e913c = zio_execute+0xac/frame 0xfffffe01b02319e0
>>> zio_nowait() at 0xffffffff803e8a8b = zio_nowait+0xcb/frame 0xfffffe01b0231a20
>>> vdev_mirror_io_start() at 0xffffffff803a744c = vdev_mirror_io_start+0x35c/frame 0xfffffe01b0231a70
>>> zio_vdev_io_start() at 0xffffffff803ec86c = zio_vdev_io_start+0x28c/frame 0xfffffe01b0231ad0
>>> zio_execute() at 0xffffffff803e913c = zio_execute+0xac/frame 0xfffffe01b0231b20
>>> taskqueue_run_locked() at 0xffffffff806d3d27 = taskqueue_run_locked+0x127/frame 0xfffffe01b0231b80
>>> taskqueue_thread_loop() at 0xffffffff806d4ee8 = taskqueue_thread_loop+0xc8/frame 0xfffffe01b0231bb0
>>> fork_exit() at 0xffffffff80640df5 = fork_exit+0x85/frame 0xfffffe01b0231bf0
>>> fork_trampoline() at 0xffffffff8096915e = fork_trampoline+0xe/frame 0xfffffe01b0231bf0
>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>> Uptime: 4s
>>>
>>
>> This seems to be an unintended consequence of some code that was pulled
>> in from upstream today.
>>
>> Try adding: vfs.zfs.trim.enabled=0
>> to /boot/loader.conf
>>
>> (you can set it manually from the boot loader menu with the set command
>> to get the system to boot)
> 
> That worked. Thanks.
> 
> BTW, the call to abd_put() was given a NULL pointer.
> 

Could you please re-enable ZFS TRIM support and test r320186 or later?
ZFS ABD is a rather large upstream change, and our TRIM support is sprinkled over
a non-trivial amount of code as well.
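
In case it helps, here is a minimal sketch of re-enabling TRIM, assuming it was
disabled through /boot/loader.conf as Allan suggested (if it was only set from
the boot loader prompt, a normal reboot is enough):

```shell
# /boot/loader.conf
# Either delete the vfs.zfs.trim.enabled=0 line added earlier,
# or set the tunable back explicitly:
vfs.zfs.trim.enabled=1
```

After rebooting into the new kernel, "sysctl vfs.zfs.trim.enabled" should
report 1 if the setting took effect.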
Thank you.

-- 
Andriy Gapon
Received on Wed Jun 21 2017 - 06:18:40 UTC
