Re: ZFS panic under extreme circumstances (2/3 disks corrupted)

From: Thomas Backman <serenity_at_exscape.org>
Date: Sun, 24 May 2009 21:33:53 +0200
On May 24, 2009, at 09:02 PM, Thomas Backman wrote:

> So, I was playing around with RAID-Z and self-healing, when I  
> decided to take it another step and corrupt the data on *two* disks  
> (well, files via ggate) and see what happened. I obviously expected  
> the pool to go offline, but I didn't expect a kernel panic to follow!
>
> What I did was something resembling:
> 1) create three 100MB files, ggatel create to create GEOM providers  
> from them
> 2) zpool create test raidz ggate{1..3}
> 3) create a 100MB file inside the pool, md5 the file
> 4) overwrite 10~20MB (IIRC) of disk2 with /dev/random, with
> dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40, or so (I now
> know that I wanted *seek*, not *skip*, but it still shouldn't panic!)
> 5) Check the md5 of the file: everything OK; zpool status shows a
> degraded pool.
> 6) Repeat step #4, but with disk 3.
> 7) zpool scrub test
> 8) Panic!
> [...]
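
In case anyone wants to try this at home, the steps above correspond
roughly to the following (a sketch from memory, not a copy-paste of my
shell history; file names, sizes and ggate unit numbers are
illustrative):

    # 1) backing files and GEOM gate providers
    truncate -s 100m disk1 disk2 disk3
    ggatel create -u 1 ./disk1
    ggatel create -u 2 ./disk2
    ggatel create -u 3 ./disk3

    # 2) raidz pool on top of them
    zpool create test raidz ggate1 ggate2 ggate3

    # 3) test file inside the pool, plus a checksum to compare against
    dd if=/dev/random of=/test/bigfile bs=1m count=100
    md5 /test/bigfile

    # 4) corrupt part of disk2's backing file. This is the corrected
    #    seek= form; the run that panicked used skip=40 instead, which
    #    writes at the start of the file (see below).
    dd if=/dev/random of=./disk2 bs=1000k count=20 seek=40 conv=notrunc

    # 5) verify the file and check pool health
    md5 /test/bigfile
    zpool status test

    # 6-8) same treatment for disk3, then scrub
    dd if=/dev/random of=./disk3 bs=1000k count=20 seek=40 conv=notrunc
    zpool scrub test
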
FWIW, I couldn't replicate this when using seek (i.e. corrupting the
middle of the "disk" rather than the beginning):

[root_at_clone ~/zfscrash]# zpool status test
   pool: test
  state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: scrub in progress for 0h0m, 7.72% done, 0h6m to go
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0    18
	  raidz1    ONLINE       0     0   161
	    ggate0  ONLINE       0     0     0  512 repaired  ## note that I did *not* touch this "disk" at all, so why "512 repaired"?
	    ggate1  ONLINE       0     0   702  73K repaired
	    ggate2  ONLINE       0     0    62  64.5K repaired

errors: 9 data errors, use '-v' for a list
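
To be explicit about the skip/seek mixup above: skip= only tells dd how
many blocks of the *input* to skip, so the garbage still lands at the
start of the output file (and, without conv=notrunc, dd truncates the
backing file on top of that), whereas seek= offsets the write into the
output. Roughly (illustrative, not my exact commands):

    # what I meant to do: corrupt the middle of the backing file
    dd if=/dev/random of=./disk2 bs=1000k count=20 seek=40 conv=notrunc

    # what the panicking run effectively did: overwrite (and shrink)
    # the beginning of the backing file
    dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40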

After overwriting the *beginning* of disk2 and disk3 as well, "zpool
scrub" appears to hang. Two vdev failure messages showed up on the
console, and zpool status hangs as well. No panic this time around
(I've waited 5 minutes and nothing appears to happen, but the machine
is still usable on other ttys). The failmode property was set to the
default, i.e. wait, in both cases.
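
For reference, failmode is a per-pool property and can be checked or
changed with zpool get/set; per zpool(8) the valid values are wait,
continue and panic (I didn't change it for these tests):

    zpool get failmode test
    zpool set failmode=continue test   # return EIO on new writes instead of blocking
    zpool set failmode=panic test      # panic the machine on catastrophic failure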

Regards,
Thomas