Initially, I noticed a problem where reading a file on this machine seemed to stall--something like a video would just stop playing. At first, I thought it was the machine, but a new motherboard, CPU, and RAM later, the problem persists. The network card uses a different chipset now, too. The files are on ZFS, but scrubs are fine, and zpool status lists no errors of any kind.

Trying to reproduce the problem, I set up a script that reads a random 1M block every 60 seconds directly off each drive backing the ZFS pool. That's when I noticed something: one disk seems to be causing the problems. I logged the dd times, and some of them were huge--more than a minute. The times on the other disk in the mirrored vdev were low.

I've only seen the problem when I have a VM's disk image hosted on the machine. That said, the network interface is configured at 100 Mbps, and I can read a file off the ZFS pool at over 50MB/s, so the VM traffic shouldn't come close to saturating the disk's throughput. top reports that almost 20% of the CPU is going towards interrupts.

One thing I'm wondering is why the disk read doesn't time out quickly. At least that way ZFS could try to use the other drive in the mirrored vdev. Any ideas?

One thing I should try is swapping the drives, to see if the problem follows the disk or stays with the lowest /dev/adX device. I'm using geli, but the read problems happen with both /dev/adX and /dev/adX.eli, so I don't think that's it. I've seen the problem with Samba, NFS, and dd.

Thanks in advance.

Received on Tue Apr 20 2010 - 04:57:24 UTC
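
For anyone who wants to run the same kind of probe, here is a minimal sketch of the idea in Python rather than the dd-based script mentioned above. It is not the original script. The function names, the log format, and the command-line arguments (device path and device size in bytes) are assumptions made for illustration; the size is passed in by hand because the sketch does not try to query the device for it.

```python
import os
import random
import sys
import time

BLOCK = 1024 * 1024  # 1 MiB, matching the probe size described in the post


def random_offset(device_size, block=BLOCK):
    """Pick a block-aligned offset that leaves room for one full block."""
    blocks = device_size // block
    if blocks < 1:
        raise ValueError("device is smaller than one block")
    return random.randrange(blocks) * block


def timed_read(path, offset, block=BLOCK):
    """Read one block at the given offset; return (seconds, bytes read)."""
    start = time.monotonic()
    # buffering=0 avoids Python-level buffering so the read goes to the OS.
    with open(path, "rb", buffering=0) as dev:
        dev.seek(offset)
        data = dev.read(block)
    return time.monotonic() - start, len(data)


def probe_forever(path, device_size, interval=60):
    """Log one timed random read every `interval` seconds."""
    while True:
        offset = random_offset(device_size)
        elapsed, nbytes = timed_read(path, offset)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {path} offset={offset} bytes={nbytes} "
              f"seconds={elapsed:.3f}", flush=True)
        time.sleep(interval)


# Hypothetical usage: python probe.py /dev/adX SIZE_IN_BYTES
if __name__ == "__main__" and len(sys.argv) == 3:
    probe_forever(sys.argv[1], int(sys.argv[2]))
```

Running one copy per disk in the mirror and comparing the logged times side by side should show whether the long reads stay with one device.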