On Mon, 16 Mar 2009, kevin wrote:

> My laptop is T61. RAM is also tested by memtest86+ and return no error.

Same here. Memtest fine.

> "zfs send tank/usr/home/kevin_at_2009-03-15-16:51:21|zfs receive backup/kevin"
> hangs system and i have to power off the machine.when the system up,i find
> file error in snapshot tank/usr/home/kevin_at_2009-03-15-16:51:21.when i destroy
> tank/usr/home/kevin_at_2009-03-15-16:51:21,then reboot system, i find more
> errors.

I've moved a box that has been running FreeBSD 7 with a 7x1TB drive RAIDZ2
array. I've created the same RAIDZ2 with 8-CURRENT and am restoring data from
tape to the new array (I wanted to rejig the zfs setup).

All appears well for a while, i.e. no CRC errors, and I can scrub and rescrub
the data while it is restoring without problem. I restored the entire 3.5TB
from tape without error. All data still scrubbed fine. Then suddenly I got CRC
errors on every disk. Repeated scrubs showed different numbers of errors. I
just couldn't stop them.

So I've started again, this time checking everything and moving drives onto
different controllers to isolate problems.

I have a Gigabyte GA-P35-DS4 MB which has 8xSATA; 6xICH9R & 2xJMB363. It also
has a Sil3132 in there which in previous incarnations had the odd drive on it.
There's been mention of Sil problems, and even though the ICH9, JMB363 and
Sil3132 had been perfect with 7, I moved the drives off the Sil3132:

1. Rebuilt kernel and world from last night; Thu Mar 19 18:27:18 GMT 2009.
2. 6x1TB drives on ICH9R
3. 2x500GB on JMB363, striped into 1TB
4. / is ufs on USB key
5. created RAIDZ2 again
6. recreated zfs filesystems
7. started restore from tape.

Same again. I can restore data and perform a scrub after each tape (LTO2,
~200GB each) is restored. No errors. Got up to ~350GB, still no errors. Then
the last scrub I did threw up:

-----
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
        An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h51m with 0 errors on Fri Mar 20 10:57:18 2009
config:

        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz2         ONLINE       0     0    23
            stripe/str0  ONLINE       0     0   489  12.3M repaired
            ad14         ONLINE       0     0   786  19.7M repaired
            ad16         ONLINE       0     0   804  20.1M repaired
            ad18         ONLINE       0     0   754  18.8M repaired
            ad20         ONLINE       0     0   771  19.3M repaired
            ad22         ONLINE       0     0   808  20.2M repaired
            ad24         ONLINE       0     0   848  21.2M repaired

errors: No known data errors
-----

So it happens on both controllers, on plain drives and on the stripe. There
just seems to be no way to get rid of these errors once they appear.

As I said, last time I got the whole 3.5TB restored without error and was
using it for a few days without error, constantly scrubbing to check
reliability. Then, once the errors appeared, there was no way to remove them.

As this same hardware worked well with 7 for a long time, and can work
perfectly with 8 for several days until the errors strike, this seems like
some curious 8 problem?

Any help would be appreciated. I'll be happy to provide any further info to
help debug this. I didn't want to unnecessarily make this any longer than it
already is.

Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843  Fax: +44 161 295 5888  www.pgp.com for PGP key

Received on Fri Mar 20 2009 - 10:01:19 UTC