Re: zfs: using, then destroying a snapshot sometimes panics zfs

From: Stefan Bethke <stb_at_lassitu.de>
Date: Wed, 18 Feb 2009 07:55:02 +0100
I didn't get any responses on -fs; maybe someone here has seen similar
behaviour?


Stefan

On 15.02.2009 at 12:08, Stefan Bethke wrote:

>
> On 15.02.2009 at 11:39, Stefan Bethke wrote:
>
>> On 08.02.2009 at 14:37, Stefan Bethke wrote:
>>
>>> Sorry I can't be more precise at the moment, but while creating a  
>>> script that mirrors some zfs filesystems to another machine, I've  
>>> now twice gotten weird behaviour and then a panic.
>>>
>>> The script iterates over a couple of zfs file systems:
>>> - creates a snapshot with zfs snapshot tank/foo@mirror
>>> - uses rsync to copy the contents of the snapshot with
>>>   rsync /tank/foo/.zfs/snapshot/mirror/ dest:...
>>> - destroys the snapshot with zfs destroy tank/foo@mirror
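
The three steps above can be sketched roughly as the loop below. This is
only an illustration, not the actual script: the names DATASETS, DEST,
SNAP, RUN and mirror_one are hypothetical, the rsync flag -a and the
destination layout are assumptions, and each dataset is assumed to be
mounted at its default mountpoint (tank/foo at /tank/foo). By default it
only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the snapshot / rsync / destroy loop. All variable and
# function names here are hypothetical, chosen for illustration.

SNAP="${SNAP-mirror}"              # snapshot name, as in the description
DEST="${DEST-example:/backup}"     # placeholder rsync destination (assumption)
RUN="${RUN-echo}"                  # dry run by default; set RUN= to execute

mirror_one() {
    fs="$1"
    # 1. create the snapshot
    $RUN zfs snapshot "$fs@$SNAP" || return 1
    # 2. copy the snapshot contents via the .zfs control directory
    #    (assumes the dataset is mounted at /$fs; -a is an assumed flag)
    $RUN rsync -a "/$fs/.zfs/snapshot/$SNAP/" "$DEST/$fs/"
    rc=$?
    # 3. destroy the snapshot even if rsync failed
    $RUN zfs destroy "$fs@$SNAP" || return 1
    return $rc
}

# DATASETS is a space-separated list such as "tank/foo tank/bar"
for fs in ${DATASETS-}; do
    mirror_one "$fs" || echo "mirror of $fs failed" >&2
done
```

Invoked as DATASETS="tank/foo tank/bar" sh mirror.sh it prints the three
commands per dataset; adding RUN= on the command line would run them.
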
>>>
>>> While testing the script, I twice got to a point where, after the
>>> snapshot was created without an error message, rsync dropped out
>>> with an error message similar to "invalid file handle" on
>>> /tank/foo/.zfs/snapshot.
>>>
>>> At that point, I could cd to /tank/foo/.zfs, but ls produced the  
>>> same error message.
>>>
>>> I then tried to unmount the snapshot with zfs umount, and got a  
>>> panic (which I also didn't manage to capture).
>>>
>>> Is this a generally known issue, or should I try to capture more  
>>> information when this happens again?
>>
>>
>> # cd /tank/foo/.zfs
>> # ls -l
>> ls: snapshot: Bad file descriptor
>> total 0
>> # cd snapshot
>> -su: cd: snapshot: Not a directory
>>
>> I currently have no snapshots:
>> # zfs list -t snapshot
>> no datasets available
>>
>> However, on a different file system, I can list and cd into snapshot:
>> # cd /tank/bar/.zfs
>> # ls -l
>> total 0
>> dr-xr-xr-x  2 root  wheel  2 Feb  8 00:43 snapshot/
>> # cd snapshot
>>
>> Trying to umount produces a panic:
>> # zfs umount /jail/foo
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 01
>> fault virtual address	= 0xa8
>> fault code		= supervisor write data, page not present
>> instruction pointer	= 0x8:0xffffffff802ee565
>> stack pointer	        = 0x10:0xfffffffea29c39e0
>> frame pointer	        = 0x10:0xfffffffea29c39f0
>> code segment		= base 0x0, limit 0xfffff, type 0x1b
>> 			= DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags	= interrupt enabled, resume, IOPL = 0
>> current process		= 51383 (zfs)
>> [thread pid 51383 tid 100298 ]
>> Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
>> db> bt
>> Tracing pid 51383 tid 100298 td 0xffffff00a598e720
>> _sx_xlock() at _sx_xlock+0x15
>> zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
>> zfs_umount() at zfs_umount+0xdd
>> dounmount() at dounmount+0x2b4
>> unmount() at unmount+0x24b
>> syscall() at syscall+0x1a5
>> Xfast_syscall() at Xfast_syscall+0xab
>> --- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f412fc, rsp =  
>> 0x7fffffffd1a8, rbp = 0x801202300 ---
>> db> call doadump
>> Physical memory: 3314 MB
>> Dumping 1272 MB: 1257 1241 1225 1209 1193 1177 1161 1145 1129 1113  
>> 1097 1081 1065 1049 1033 1017 1001 985 969 953 937 921 905 889 873  
>> 857 841 825 809 793 777 761 745 729 713 697 681 665 649 633 617 601  
>> 585 569 553 537 521 505 489 473 457 441 425 409 393 377 361 345 329  
>> 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57 41  
>> 25 9
>> Dump complete
>> = 0
>>
>> I've got the crashdump saved, if there's any information in there  
>> that can be helpful.
>>
>> This is -current from a week ago on amd64.
>>
>> At the current rate, this happens every couple of days, so  
>> gathering more information on the live system probably won't be a  
>> problem.
>
> Different machine, identical configuration, I just got this panic on  
> reboot:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address	= 0xa8
> fault code		= supervisor write data, page not present
> instruction pointer	= 0x8:0xffffffff802ee3b5
> stack pointer	        = 0x10:0xfffffffe40016980
> frame pointer	        = 0x10:0xfffffffe40016990
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 1 (init)
> [thread pid 1 tid 100002 ]
> Stopped at      _sx_xlock+0x15: lock cmpxchgq   %rsi,0x18(%rdi)
> db> bt
> Tracing pid 1 tid 100002 td 0xffffff000141fab0
> _sx_xlock() at _sx_xlock+0x15
> zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5
> zfs_umount() at zfs_umount+0xdd
> dounmount() at dounmount+0x2b4
> vfs_unmountall() at vfs_unmountall+0x42
> boot() at boot+0x655
> reboot() at reboot+0x42
> syscall() at syscall+0x1a5
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (55, FreeBSD ELF64, reboot), rip = 0x40897c, rsp =  
> 0x7fffffffe7b8, rbp = 0x402420 ---
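
Both traps report the same fault address, 0xa8, at the same offset in
_sx_xlock, on the instruction lock cmpxchgq %rsi,0x18(%rdi). That
instruction writes to %rdi + 0x18, so the fault address implies a value
of 0x90 in %rdi. This would be consistent with the sx lock being reached
through a NULL structure pointer, with the lock at offset 0x90 in that
structure, but that reading is a guess from the register arithmetic
alone and is not confirmed by anything in the dump output shown here.
The arithmetic, as a quick check:

```shell
#!/bin/sh
# Recover the %rdi value implied by the trap: fault address minus the
# displacement in "lock cmpxchgq %rsi,0x18(%rdi)".
fault_addr=0xa8    # "fault virtual address" from the trap message
disp=0x18          # displacement taken from the faulting instruction
base=$(printf '0x%x' $(($fault_addr - $disp)))
echo "$base"
```
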
>
>
> -- 
> Stefan Bethke <stb_at_lassitu.de>   Fon +49 151 14070811

-- 
Stefan Bethke <stb_at_lassitu.de>   Fon +49 151 14070811
Received on Wed Feb 18 2009 - 06:12:04 UTC
