Damian Gerow <dgerow_at_afflictions.org> writes:

> 1) Reverting the extended attribute locking change (r189967) does not change
> the situation for me. I still experience checksum issues and data loss.
> (Unsurprisingly.)
>
> 2) Without umass loaded, I have been completely unable to trigger the issue.
>
> 3) Once umass is loaded, and the symptoms start cropping up, unloading umass
> does not make them go away (again, unsurprisingly). What I haven't yet
> tested, but am currently working towards, is whether removing umass stops
> further checksum errors from occurring.
>
> 4) r189967 does remove some LORs for me, even though I don't use (that I
> know of) extended attributes.
>
> 5) It seems that so long as umass is used at all, the symptoms will
> eventually show up. I've been able to trigger the symptoms by inserting
> then removing a umass device immediately after boot, then ramping up the
> workload.
>
> 6) The only difference made by vfs.zfs.debug=1 is that zfs reclaims are
> logged.
>
> I'm at a bit of a loss as to what to test next, other than checking for an
> increased number of checksum errors after unloading umass. However, I'm not
> convinced this is going to highlight the actual problem. I'm all ears as to
> what to test for at this point, as I'm running out of ideas.

I have a question or two, and an idea.

The questions:

1) How much RAM do you have? Is it 4G or more? (I'm guessing the answer is
"yes".)

2) What does "sysctl -a | grep bounced" say? Check this both before and
after loading umass and seeing the bug triggered.

My idea: I suspect a bug in the bounce-buffer code, which handles I/O to
memory beyond the range a given piece of hardware can reach directly through
DMA. I've had some similar issues with checksum errors, and they seem to have
gone away since I lowered hw.physmem to 3400M in loader.conf, which keeps
memory below the point where anything needs to use bounce buffers.
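Question 2 amounts to comparing two snapshots of the bounce counters. Below is a minimal sketch in Python, assuming each snapshot is the saved text output of "sysctl -a | grep bounced" taken before and after loading umass. The helper names parse_bounced and bounce_delta are hypothetical, not part of any FreeBSD tool, and the "after" numbers in the usage block are made up purely for illustration. A positive delta in any zone means bounce buffers were used between the two snapshots.

```python
import re
from typing import Dict

# Matches lines such as "hw.busdma.zone0.total_bounced: 0"
BOUNCED_RE = re.compile(r"^hw\.busdma\.(zone\d+)\.total_bounced:\s*(\d+)$")


def parse_bounced(sysctl_output: str) -> Dict[str, int]:
    """Map each busdma zone name to its total_bounced counter."""
    counts: Dict[str, int] = {}
    for line in sysctl_output.splitlines():
        m = BOUNCED_RE.match(line.strip())
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def bounce_delta(before: str, after: str) -> Dict[str, int]:
    """Return only the zones whose total_bounced grew between snapshots."""
    b = parse_bounced(before)
    a = parse_bounced(after)
    return {zone: a[zone] - b.get(zone, 0)
            for zone in a
            if a[zone] - b.get(zone, 0) > 0}


if __name__ == "__main__":
    before = """hw.busdma.zone0.total_bounced: 0
hw.busdma.zone1.total_bounced: 0
hw.busdma.zone2.total_bounced: 0"""
    # Hypothetical snapshot after loading umass and reproducing the bug.
    after = """hw.busdma.zone0.total_bounced: 0
hw.busdma.zone1.total_bounced: 1234
hw.busdma.zone2.total_bounced: 0"""
    print(bounce_delta(before, after))  # {'zone1': 1234}
```

An empty result means no zone bounced anything in between, which is what you would hope to see after lowering hw.physmem.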
You might try lowering hw.physmem and seeing if that helps. Check with the
"sysctl -a | grep bounced" command; if no bounce-buffer usage is going on,
you should see something like:

    hw.busdma.zone0.total_bounced: 0
    hw.busdma.zone1.total_bounced: 0
    hw.busdma.zone2.total_bounced: 0

(The number of zones may be different on your system.)

Received on Thu Apr 16 2009 - 16:46:29 UTC