Re: ZFS panic in zone_dataset_visible

From: Scott Burns <scott_at_bqinternet.com>
Date: Mon, 22 Sep 2008 13:12:50 -0400
Scott Burns wrote:
> Hello,
> 
> I am running several servers using Pawel's July 27 ZFS patchset, applied 
> against 8-current source from the same day.  I have seen a similar panic 
> on two different servers:
...
> Stopped at      _mtx_lock_flags+0x15:   lock cmpxchgq   %rsi,0x18(%rdi)
> db> bt
> Tracing pid 95276 tid 100432 td 0xffffff010b3cc000
> _mtx_lock_flags() at _mtx_lock_flags+0x15
> zone_dataset_visible() at zone_dataset_visible+0x94
> zfs_mount() at zfs_mount+0x3e5
...

With a bit of testing, I found that this panic is easily reproducible. 
Listing the contents of a snapshot from within a jail panics the system, 
provided the snapshot isn't already mounted.  If I mount the snapshot 
from outside of the jail first, and then list it inside the jail, it 
does not panic.

I spent a bit of time debugging this weekend.  Trying to list an 
unmounted snapshot triggers a zfs_mount() for the snapshot, which calls 
zone_dataset_visible() to determine if the snapshot should be visible in 
the current zone.  When it is run outside of a jail, it returns true 
early on because INGLOBALZONE(curproc) is true.  Inside a jail, it 
continues on to a per-jail check instead.

The panic is happening after that check, at mtx_lock(&pr->pr_mtx), 
because (pr = curthread->td_ucred->cr_prison) is NULL.  Interestingly, 
it's not NULL if zone_dataset_visible() is triggered by a "zfs list" 
command, but it is NULL if zone_dataset_visible() is called from 
zfs_mount().
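
To make the failing path easier to follow, here is a minimal user-space 
sketch of the control flow described above.  It is not the kernel code: 
the struct layouts, the in_global_zone flag (standing in for 
INGLOBALZONE(curproc)), and the function dataset_visible_model() are all 
hypothetical names chosen for illustration.  It only shows which of the 
three outcomes is reached for a given caller.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct prison_model { int pr_id; };
struct ucred_model  { struct prison_model *cr_prison; };

enum visible_result {
	VISIBLE_GLOBAL,      /* caller is outside any jail: early return */
	VISIBLE_CHECK_JAIL,  /* prison pointer is valid: per-jail check can run */
	VISIBLE_WOULD_PANIC  /* prison pointer is NULL: locking it would fault */
};

/*
 * Models the order of checks: the global-zone test comes first, and only
 * a jailed caller reaches the point where the prison pointer is used.
 */
enum visible_result
dataset_visible_model(int in_global_zone, const struct ucred_model *cred)
{
	if (in_global_zone)
		return (VISIBLE_GLOBAL);
	if (cred->cr_prison == NULL)
		return (VISIBLE_WOULD_PANIC);
	return (VISIBLE_CHECK_JAIL);
}
```

In this model, the "zfs list" case corresponds to VISIBLE_CHECK_JAIL and 
the zfs_mount() case to VISIBLE_WOULD_PANIC.  The sketch says nothing 
about why the pointer is NULL in the second case.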

As a temporary workaround, I modified my copy of 
cddl/compat/opensolaris/kern/opensolaris_zone.c to have 
zone_dataset_visible() return true if it is being called for a snapshot. 
I modified it as below:

-if (INGLOBALZONE(curproc))
+if (INGLOBALZONE(curproc) || strchr(dataset, '@'))

This is obviously not ideal, since it allows the manipulation of the 
snapshot from another jail if the caller knows that it exists.  Since I 
am the only one with root access to any of the jails, I am not concerned 
with that. "zfs list" continues to behave normally.
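
For illustration, the effect of the modified condition can be modelled 
in user space as follows.  This is a sketch, not the patched kernel 
function: is_snapshot_name() and bypasses_jail_check() are hypothetical 
helpers, in_global_zone stands in for INGLOBALZONE(curproc), and the 
only assumption about ZFS is that snapshot names contain an '@' 
separating the dataset from the snapshot name.

```c
#include <string.h>

/* Hypothetical helper: a name containing '@' is treated as a snapshot. */
static int
is_snapshot_name(const char *dataset)
{
	return (strchr(dataset, '@') != NULL);
}

/*
 * Models the patched early-return condition: the per-jail check is
 * skipped for global-zone callers and for every snapshot name,
 * regardless of which jail the snapshot belongs to.
 */
int
bypasses_jail_check(int in_global_zone, const char *dataset)
{
	return (in_global_zone || is_snapshot_name(dataset));
}
```

With a made-up name such as "tank/jails/a@daily", a jailed caller gets a 
nonzero result and so skips the per-jail check, while "tank/jails/a" 
still goes through it.  That is the trade-off described above.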

I will continue looking at this, but since my main goal of working 
around the panic has been taken care of, I am not sure how long my 
attention span will last.  If the cause of 
curthread->td_ucred->cr_prison being NULL under these conditions is 
obvious to anyone, please let me know.

--
Scott Burns
System Administrator
BQ Internet Corporation
Received on Mon Sep 22 2008 - 15:21:18 UTC
