Alexander Motin wrote:
> Ivan Voras wrote:
>> If you have a drive to play with, could you also check UFS vs ZFS on
>> both ATA & AHCI? To try and see if the IO scheduling of ZFS plays nicely.
>>
>> For benchmarks I suggest blogbench and bonnie++ (in ports) and if you
>> want to bother, randomio, http://arctic.org/~dean/randomio .

> gstat showed that most of the time only one request at a time was
> running on the disk. It looks like read or read-modify-write operations
> (due to the many short writes in the test pattern) are heavily
> serialized in UFS, even when several processes are working with the
> same file. This has almost eliminated the effect of NCQ in this test.
>
> Test 2: Same as before, but without the O_DIRECT flag:
> ata(4), 1 process, first tps: 78
> ata(4), 1 process, second tps: 469
> ata(4), 32 processes, first tps: 83
> ata(4), 32 processes, second tps: 475
> ahci(4), 1 process, first tps: 79
> ahci(4), 1 process, second tps: 476
> ahci(4), 32 processes, first tps: 93
> ahci(4), 32 processes, second tps: 488

Ok, so this is UFS, normal caching.

> The data doesn't fit into the cache. Multiple parallel requests give
> some effect even with the legacy driver, but with NCQ enabled they give
> much more, almost doubling performance!

Have you seen queueing in gstat for ZFS+NCQ? (Something like the gstat
invocation appended below is what I have in mind.)

> Test 4: Same as 3, but with kmem_size=1900M and arc_max=1700M.
> ata(4), 1 process, first tps: 90
> ata(4), 1 process, second tps: ~160-300
> ata(4), 32 processes, first tps: 112
> ata(4), 32 processes, second tps: ~190-322
> ahci(4), 1 process, first tps: 90
> ahci(4), 1 process, second tps: ~140-300
> ahci(4), 32 processes, first tps: 180
> ahci(4), 32 processes, second tps: ~280-550

And this is ZFS with some tuning. I've also seen high deviation in
performance on ZFS, so this seems normal.

> As a conclusion:
> - In this particular test ZFS scaled well with parallel requests,
> effectively using multiple disks. NCQ showed great benefits. But i386
> constraints significantly limit ZFS's caching abilities.
> - UFS behaves very poorly in this test. Even with a parallel workload
> it often serializes device accesses. Maybe the results would be
> different if

I wouldn't say UFS behaves poorly based on your results. It looks like
only the multi-process case is bad on UFS. For single-process access the
difference in favour of ZFS is ~10 TPS in the first run, and UFS is
apparently much better in all cases but the last in the second run. This
may be explained by a large variation between runs.

Also, did you use the whole drive for the file system? In cases like this
it would be interesting to create a special partition (in all cases, on
all drives) covering only a small segment of the disk (thinking of the
drive as rotational media, made of cylinders). For example, a 30 GB
partition covering only the outer tracks. (A rough sketch of such a
layout is appended at the end of this message.)

> there were a separate file for each process, or with some other
> options, but I think the pattern I have used is also possible in some
> applications. The only benefit UFS showed here is more effective memory
> management on i386, leading to higher cache effectiveness.
>
> It would be nice if somebody explained that UFS behavior.

Possibly, read-only access to memory cache structures is protected by
read-only locks, which are efficient, and ARC is more complicated than
it's worth? But others should have better guesses :)
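
Regarding the gstat question: a minimal sketch of what I mean, assuming
the AHCI disk shows up as ada0 (adjust the filter to ad0 for the ata(4)
runs):

    # run in a second terminal while the benchmark is going;
    # the L(q) column is the number of requests queued at the device
    gstat -f ada0

An L(q) that sits at 1 would match the serialization you describe on UFS,
while values approaching the NCQ limit (up to 32 outstanding commands)
would explain why AHCI nearly doubles the ZFS numbers.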
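
For anyone wanting to reproduce Test 4: I assume the kmem_size / arc_max
values were set as loader tunables. A rough /boot/loader.conf sketch with
Alexander's values (everything beyond the two values themselves is just
the usual i386 ZFS tuning advice, not something from his mail):

    # /boot/loader.conf - ZFS tuning used for Test 4 (i386)
    vm.kmem_size="1900M"       # size of the kernel memory map the ARC lives in
    vfs.zfs.arc_max="1700M"    # cap the ARC safely below kmem_size

A reboot is needed for these to take effect, and on i386 the kernel
probably also has to be built with a larger KVA_PAGES for a ~1.9 GB kmem
map to fit.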
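
And by "a special partition covering only the outer tracks" I mean
something like this untested sketch; ada1 and the 30 GB size are only
placeholders, and the idea is to reuse the same slice for both the UFS
and the ZFS runs:

    # GPT-label the test disk and carve out the first (outermost) 30 GB
    gpart destroy -F ada1
    gpart create -s gpt ada1
    gpart add -t freebsd-ufs -s 30g ada1
    newfs -U /dev/ada1p1             # soft updates, for the UFS runs

    # for the ZFS runs, recreate the same slice with a zfs type:
    # gpart delete -i 1 ada1
    # gpart add -t freebsd-zfs -s 30g ada1
    # zpool create testpool /dev/ada1p1

That way every file system sees the same fast region of the platter and
the short-stroking effect is identical across runs.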