On 19.12.2011 16:42, Peter Maloney wrote:
> On 12/19/2011 03:22 PM, Stefan Esser wrote:
>> So: Can anybody reproduce this distribution of requests?
> I don't have a raidz1 machine, and no time to make you a special raidz1
> pool out of spare disks, but on my raidz2 I can only ever see unevenness
> when a disk is bad, or between different vdevs. But you only have one vdev.

Thanks for replying. In my previous raidz1 pool consisting of 3*1TB, one
of the drives had to be replaced because it showed lots of recoverable
errors when I initially created the pool. The effects were much more
drastic than what I see now: Given identical request rates, the failed
drive was 100% busy when the other drives had busy percentages in the
single-digit range.

But the differences I observe now seem to be caused by a different rate
of read requests issued towards the drives (the first two receive 30% of
the reads each, while the last two receive 20% each). And this ratio has
been stable over months (I had already noticed this in summer, but did
not have time to start a thread at that time).

> Check that your disks are identical (are they? we can only assume so
> since you didn't say so).

Yes, all 4 are identical.
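To make the 30/30/20/20 observation easy to re-check, the per-drive share of reads can be computed from per-device read counters (for example, numbers copied by hand from gstat or iostat output). The following is only a minimal sketch: the counter values are hypothetical and chosen to reproduce the ratio described above, not measured data, and `read_shares` and `imbalance` are illustrative helper names.

```python
def read_shares(reads_per_device):
    """Return each device's fraction of the total read requests."""
    total = sum(reads_per_device.values())
    if total == 0:
        raise ValueError("no read requests recorded")
    return {dev: count / total for dev, count in reads_per_device.items()}


def imbalance(shares):
    """Ratio of the largest per-device share to the smallest one."""
    smallest = min(shares.values())
    if smallest == 0:
        return float("inf")
    return max(shares.values()) / smallest


# Hypothetical read counters (NOT measured values), picked so that the
# first two drives see 30% of the reads each and the last two 20% each.
sample = {"ada0": 3000, "ada1": 3000, "ada2": 2000, "ada3": 2000}

shares = read_shares(sample)
for dev, share in sorted(shares.items()):
    print(f"{dev}: {share:.0%} of reads")
print(f"largest/smallest share: {imbalance(shares):.2f}")
```

With evenly distributed reads the ratio would be close to 1.0; a value that stays near 1.5 over long sampling intervals would match the stable skew described above.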
> Show us output from:
> smartctl -i /dev/ada0

Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B116957
LU WWN Device Id: 5 0024e9 0049bee63
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Dec 19 19:23:36 2011 CET

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   067   025    Pre-fail  Always       -       10127
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       254
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2300
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       228
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       621067
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   055   000    Old_age   Always       -       28 (Min/Max 15/48)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       2
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       1
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       264

> smartctl -i /dev/ada1

Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B116947
LU WWN Device Id: 5 0024e9 0049bee49
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Dec 19 19:23:22 2011 CET

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   067   025    Pre-fail  Always       -       10096
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       255
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2316
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       231
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       2175909
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   055   000    Old_age   Always       -       26 (Min/Max 16/47)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       1
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       1
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       264

> smartctl -i /dev/ada2

Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B116956
LU WWN Device Id: 5 0024e9 0049bee60
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Dec 19 19:24:24 2011 CET

  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   066   025    Pre-fail  Always       -       10254
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       246
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2300
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       227
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       105259
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   056   000    Old_age   Always       -       28 (Min/Max 16/45)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       256

> smartctl -i /dev/ada3

Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B116946
LU WWN Device Id: 5 0024e9 0049bee47
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Dec 19 19:24:55 2011 CET

  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   066   066   025    Pre-fail  Always       -       10472
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       250
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2302
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       227
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       239254
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   055   000    Old_age   Always       -       27 (Min/Max 16/47)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       2
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       259

> Since your tests show read ms/r to be pretty even, I guess your disks
> are not broken. But the ms/w is slightly different.
> So I think it seems that the first 2 disks are slower for writing
> (someone once said that

My interpretation is that the first two have higher write latencies
since they receive more read requests.

> refurbished disks are like this, even if identical), or the hard disk
> controller ports they use are slower. For example, maybe your
> motherboard has 6 ports, and you plugged disks 1,2,3 into port 1,2,3 and
> disk 4 into port 5. Disk 3 and 4 would have their own channel, but disk
> 1 and 2 share one.

This is an ICH10 and the drives are connected to SATA II channels (the
SATA III channels are reserved for a planned SSD cache).

> So if the disks are identical, I would guess your hard disk controller
> is to blame. To test this, first back it up. Then *fix your setup by
> using labels*. ie. use gpt/somelabel0 or gptid/....... rather than
> ada0p2. Check "ls /dev/gpt*" output for options on what labels you have
> already. Then try swapping disks around to see if the load changes. Make
> sure to back up...

The drives are already labelled and I can easily modify the pool to
refer to GPT labels. But swapping drives should not cause any harm in
ZFS, whether labels or device names are used (the drives in the pool are
identified by their GUID).

> Swapping disks (or even removing one depending on controller, etc. when
> it fails) without labels can be bad.

Yes, I know (having seen my first Unix system more than 30 years ago).
I'll re-import the drives with "zpool import -d /dev/gpt ..." but need to
boot from an alternate boot device first.

> eg.
> You have ada1 ada2 ada3 ada4.
> Someone spills coffee on ada2; it fries and cannot be detected anymore,
> and you reboot.
> Now you have ada1 ada2 ada3.
> Then things are usually still fine (even though ada3 is now ada2 and
> ada4 is now ada3, because there is some zfs superblock stuff to keep
> track of things), but if you also had an ada5 that was not part of the
> pool, or was a spare or a log or something other than another disk in
> the same vdev as ada1, etc., bad things happen when it becomes ada4.
> Unfortunately, I don't know exactly what people do to cause the "bad
> things" that happen. When this happened to me, it just said my pool was
> faulted or degraded or something, and set a disk or two to UNAVAIL or
> FAULTED. I don't remember it automatically resilvering them, but when I
> read about these problems, I think it seems like some disks were
> resilvered afterwards.

The recovery from partial pool failures and the collection of drives to
form a pool has been modified several times in the last two years and
should be quite robust by now. One thing to look out for is to not copy
a pool to new disk drives (I used to have 3*1TB, copied to 4*2TB) and
later connect a drive from the original pool with its ZFS metadata
intact at the end of the drive (I had cleared the first 1MB, but not the
last 1MB). This causes confusion, if the name of the pool has not
changed. But other than that, I do not see much risk in ZFS pools built
from /dev nodes.

> And last thing I can think of is to make sure your partitions are
> aligned, and identical. Show us output from:
> gpart show

They have all been created by a script that takes the device node name
as parameter and thus are identical.
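The alignment part of the question can be checked mechanically from the partition start sectors that gpart show reports (64, 256 and 3565158656 on all drives, plus 3565160448 for the swap partition on ada3). A minimal sketch, assuming the 512-byte logical sector size reported by smartctl and a 4096-byte physical sector; the latter is an assumption based only on the "(AFT)" model family string, and `is_aligned` is an illustrative helper name:

```python
SECTOR_BYTES = 512   # logical sector size, as reported by smartctl
ALIGN_BYTES = 4096   # ASSUMED physical sector size of an Advanced Format drive


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, align_bytes=ALIGN_BYTES):
    """True if a partition starting at start_sector begins on an align_bytes boundary."""
    return (start_sector * sector_bytes) % align_bytes == 0


# Partition start sectors taken from the gpart show output in this message.
starts = {
    "freebsd-boot (all drives)": 64,
    "freebsd-zfs (all drives)": 256,
    "freebsd (ada0-ada2)": 3565158656,
    "freebsd-swap (ada3)": 3565160448,
}

for name, start in starts.items():
    state = "aligned" if is_aligned(start) else "NOT aligned"
    print(f"{name}: start sector {start} is {state} to {ALIGN_BYTES} bytes")
```

Only the partition start offsets matter for this check; the first usable sector of the GPT itself (34) is not a multiple of 8 sectors, which is why the partitions start at sector 64 or later.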
=>        34  3907029101  ada0  GPT  (1.8T)
          34          30        - free -  (15k)
          64         192     1  freebsd-boot  (96k)
         256  3565158400     2  freebsd-zfs  (1.7T)
  3565158656   341870479     3  freebsd  (163G)

=>        34  3907029101  ada1  GPT  (1.8T)
          34          30        - free -  (15k)
          64         192     1  freebsd-boot  (96k)
         256  3565158400     2  freebsd-zfs  (1.7T)
  3565158656   341870479     3  freebsd  (163G)

=>        34  3907029101  ada2  GPT  (1.8T)
          34          30        - free -  (15k)
          64         192     1  freebsd-boot  (96k)
         256  3565158400     2  freebsd-zfs  (1.7T)
  3565158656   341870479     3  freebsd  (163G)

=>        34  3907029101  ada3  GPT  (1.8T)
          34          30        - free -  (15k)
          64         192     1  freebsd-boot  (96k)
         256  3565158400     2  freebsd-zfs  (1.7T)
  3565158656        1792        - free -  (896k)
  3565160448   341868544     3  freebsd-swap  (163G)
  3907028992         143        - free -  (71k)

There is an unused 10% at the end of each device, and I have recently
made ada3p3 a swap device, just to be able to collect kernel dumps (no
swap is actually used; this is an 8GB RAM machine with 6GB assigned to
ARC and mostly low load).

Best regards, Stefan

Received on Mon Dec 19 2011 - 18:09:03 UTC