Re: ZFS panic space_map.c line 110

From: Richard Todd <rmtodd_at_ichotolot.servalan.com>
Date: Thu, 07 May 2009 22:06:21 -0500

Martin <nakal_at_web.de> writes:
> Hi,
>
> I have a file server running ZFS on -CURRENT. Someone has tried to
> transfer a file with several gigabytes onto the system. The kernel
> crashed with a panic and freezed up during spewing the panic. I've only
> written down the most important messages:
>
> solaris assert ss==NULL
> zfs/space_map.c line 110
>
> process: 160 spa_zio
>
> I've heard that I can try to move the zpool cache away and import the
> zpool with force once again. Will this help? 

I kinda doubt it. 

> I am asking because I
> don't know if the panic is caused by a corrupt cache or corrupt
> file system metadata. Maybe someone can explain it. (I had to switch the

This panic wouldn't have anything to do with zpool.cache (that's just a file
to help the system find which devices it should expect to find zpools on 
during boot).   This is a problem with the free space map, which is part
of the filesystem metadata.  If you're lucky, it's just the in-core copy
of the free space map that was bogus and there's a valid map on disk.  If 
you're unlucky, the map on disk is trashed, and there's no really easy way
to recover that pool. 
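
To make the kind of invariant involved a bit more concrete, here is a toy model in Python. It is only a sketch and not the kernel code: the class name FreeSpaceMap and its add method are made up for illustration, and the real space_map.c is C code working on an AVL tree. The assumption being illustrated is that a free space map is a set of non-overlapping free ranges, so an attempt to add a range that overlaps one already in the map means the map is inconsistent.

```python
import bisect


class FreeSpaceMap:
    """Toy model: a sorted list of non-overlapping [start, end) free ranges."""

    def __init__(self):
        self.starts = []  # sorted range starts
        self.ends = []    # ends[i] is the end of the range starting at starts[i]

    def add(self, start, size):
        """Mark [start, start + size) as free; refuse overlapping ranges."""
        end = start + size
        i = bisect.bisect_left(self.starts, start)
        # Does the neighbour on the left reach into the new range?
        if i > 0 and self.ends[i - 1] > start:
            raise AssertionError("range overlaps an existing free segment")
        # Does the new range reach into the neighbour on the right?
        if i < len(self.starts) and self.starts[i] < end:
            raise AssertionError("range overlaps an existing free segment")
        self.starts.insert(i, start)
        self.ends.insert(i, end)


sm = FreeSpaceMap()
sm.add(0, 100)
sm.add(200, 50)
try:
    sm.add(90, 20)  # [90, 110) overlaps [0, 100)
    overlap_detected = False
except AssertionError:
    overlap_detected = True
```

In this toy, the third add is rejected and the map is left unchanged. In a kernel, a failed check of this sort turns into a panic instead of an exception.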

> Is this issue with inconsistent zpools well known? I've seen some posts
> from 2007 and January 2009 that reported similar problems. Apparently
> some people have lost their entire zpools multiple times already, as
> far as I understood it.

Mine was probably one of those messages; I managed to get an error like that
once, through Seriously Provoking the system (repeatedly unmounting and 
mounting the main filesystem on one pool) while attempting to debug a 
different, unrelated problem.  It's not something I've ever seen
in any sort of "normal" usage, and just copying a few gig to the FS shouldn't
cause this sort of problem.  I managed to recover the data without having to
resort to backups, by hacking the kernel to disable some of the asserts
in space_map.c, iterating until I got a kernel that could import the pool
without panicking.  Once I did that, I managed to mount the fs read-only and
copy everything off to a different device.  Like I said, not an *easy* way
to recover that data.
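
As a rough picture of what "disabling the asserts" buys you, here is another toy sketch in Python. It is not the patch I used and not ZFS code: the replay function, the record format and the strict flag are all hypothetical. It assumes the map is rebuilt by replaying a log of alloc/free records, and shows a strict mode that stops at the first inconsistent record next to a tolerant mode that skips such records and counts them.

```python
def replay(records, strict=True):
    """Rebuild the set of free blocks from (op, block) log records.

    op is "free" or "alloc". In strict mode an inconsistent record
    raises, much like an assert; otherwise it is skipped and counted.
    """
    free = set()
    skipped = 0
    for op, block in records:
        # Freeing a block that is already free, or allocating one
        # that is not free, means the log is inconsistent.
        bad = (op == "free" and block in free) or \
              (op == "alloc" and block not in free)
        if bad:
            if strict:
                raise AssertionError(f"inconsistent record: {op} {block}")
            skipped += 1
            continue
        if op == "free":
            free.add(block)
        else:
            free.remove(block)
    return free, skipped


log = [("free", 1), ("free", 2), ("free", 2), ("alloc", 1)]
try:
    replay(log)
    strict_ok = True
except AssertionError:
    strict_ok = False
free, skipped = replay(log, strict=False)
```

The tolerant replay finishes, but the resulting map cannot really be trusted for new allocations, which fits with only mounting the pool read-only and copying the data off.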

> One more piece of information I can give is that every hour the ZFS file
> systems create snapshots. Maybe it triggered some inconsistency between
> the writes to a file system and the snapshot, I cannot tell, because I
> don't understand the condition.

I doubt this had anything to do with the problem.