On Thu, Apr 04, 2013 at 02:19:16AM +0200, Matthias Andree wrote:
> On 04.04.2013 01:38, Jeremy Chadwick wrote:
> > ...
> >
> > While skimming Linux libata code and commits in the past, the only
> > glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the
> > hardware revision apparently matters) and port multiplier (PMP) support
> > and soft resets.
> >
> > Are you using a port multiplier?  I doubt it, but I have to ask.
>
> I am not using a PMP as far as I know (unless one is buried on my Asus
> M4A78T-E main board).  It would seem the drives are directly attached to
> the south bridge's SATA ports.

Then the answer is nope, you're not using a PM.  Details:

http://www.serialata.org/technology/port_multipliers.asp
http://en.wikipedia.org/wiki/Port_multiplier

> >> Why only my Samsung HDD drive triggers this but not the WD drive, I do
> >> not know yet.
> >
> > Please provide "gpart show -p ada1" output, both here and in the PR,
> > if you could.
>
> =>        63  1953525105    ada1  MBR  (931G)
>           63   209714337  ada1s1  freebsd  [active]  (100G)
>    209714400         800          - free -  (400k)
>    209715200    71680000  ada1s2  ntfs  (34G)
>    281395200       15405          - free -  (7.5M)
>    281410605   488263545  ada1s3  linux-data  (232G)
>    769674150  1183851018          - free -  (564G)

This is what I was worried about.  Referring to your "camcontrol
identify" output:

> device model          SAMSUNG HD103SI
> sector size           logical 512, physical 512, offset 0

Hear me out entirely on this one.

My theory is that your hard disk actually uses 4096-byte sectors but is
too old to provide the ATA IDENTIFY fields that distinguish logical from
physical sector size.  In other words, only the logical size is
provided, thus logical=physical in the eyes of all software; smartctl
will show you the exact same thing too.

There are drives like this in the wild, both SSDs and mechanical HDDs
(MHDDs).  For example, the Intel 320-series SSD behaves this way too
(providing only logical size).
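For reference, the logical-versus-physical distinction discussed above
is carried in word 106 of the ATA IDENTIFY DEVICE data.  What follows is
a minimal sketch of decoding that word in sh.  The helper name
describe_word106 and the three sample values are illustrative
assumptions; none of them were read from the HD103SI.

```shell
#!/bin/sh
# Decode ATA IDENTIFY DEVICE word 106 (physical/logical sector size info).
# Bit 15 = 0 and bit 14 = 1 mark the word as valid; bit 13 set means the
# device has multiple logical sectors per physical sector; bits 3:0 hold
# the power-of-two exponent for that ratio.
# describe_word106 is a hypothetical helper used only for this sketch.
describe_word106() {
    w=$(( $1 ))
    # Bits 15:14 must read 01 for the word to carry valid information.
    if [ $(( (w >> 14) & 3 )) -ne 1 ]; then
        echo "not reported"
        return 0
    fi
    if [ $(( (w >> 13) & 1 )) -eq 1 ]; then
        echo "$(( 1 << (w & 15) )) logical sectors per physical sector"
    else
        echo "1 logical sector per physical sector"
    fi
}

# Sample values (assumptions, chosen to show the three cases).
describe_word106 0x0000   # older drive: word not populated
describe_word106 0x4000   # valid, physical size equals logical size
describe_word106 0x6003   # valid, 2^3 = 8 logical sectors per physical
```

A drive that returns 0x0000 or 0x4000 in this word gives software
nothing to suggest that its physical sectors are larger than 512 bytes,
which is the situation the theory above describes.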
Do not let the capacity of the drive be the deciding factor; your drive
is 1TB, but I also have many 1TB MHDDs that use 4096-byte sectors.

Seagate/Samsung's specification** for the HD103SI states, and I quote:
"Byte per Sensor: 512 bytes".  Yes, it says "Sensor".  Whether this
documentation is accurate is unknown, and when vendors have typos in
their own specification docs, I cannot help but honour the possibility
of the information being wrong.

So I'm unsure whether this drive uses 512-byte sectors or 4096-byte
sectors.

That said: in your "gpart show ada1" output, your FreeBSD and Linux
partitions (starting at sectors 63 and 281410605) are not aligned to
4096-byte boundaries; only the NTFS partition (starting at sector
209715200) is.  Ideally you'd want all of them aligned to 1MByte or
2MByte boundaries in case you ever move to an SSD.  You're also using
the MBR scheme, which does not tend to play well with alignment.

Comparatively, your WD5002ABYS drive **does** use 512-byte sectors (I
know this for a fact).

The problem here is that I cannot guarantee you that alignment is the
problem.  The performance impact of writes to non-aligned partitions is
quite high, and NCQ just exacerbates it.

I would love to tell you "switch to GPT and follow Warren Block's
document***", but if your NTFS partition holds a Windows installation
older than Windows 7, GPT is not supported.

One piece of evidence against my theory: if the Windows and/or Linux
partitions are something you boot into and use often, I would imagine
NCQ is used in both of those environments and they would suffer from the
same issue.  Although Windows tends to hide all sorts of transient
errors from the user (sigh), Linux tends to be like FreeBSD with regards
to such issues (on the console anyway; you wouldn't normally see such
messages inside of X).
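The alignment question can be checked with simple arithmetic on the
start sectors.  A minimal sketch in sh, assuming 512-byte logical
sectors as reported by "camcontrol identify"; check_start is a
hypothetical helper name, and the start sectors are copied from the
"gpart show -p ada1" output quoted earlier:

```shell
#!/bin/sh
# Report how far a partition start (given in 512-byte logical sectors)
# sits past the previous 4096-byte and 1 MiB boundaries.  An offset of 0
# means the start is aligned to that boundary.
# check_start is a hypothetical helper used only for this sketch.
check_start() {
    name=$1
    start=$2
    # 8 sectors = 4096 bytes, 2048 sectors = 1 MiB.  Working in sectors
    # keeps the intermediate values small.
    off4k=$(( (start % 8) * 512 ))
    off1m=$(( (start % 2048) * 512 ))
    echo "$name start=$start 4k_offset=$off4k 1m_offset=$off1m"
}

check_start ada1s1 63
check_start ada1s2 209715200
check_start ada1s3 281410605
```

Only ada1s2 comes out with both offsets at 0; ada1s1 and ada1s3 start
3584 and 2560 bytes past a 4096-byte boundary respectively.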
If you have the time and want to put forth the effort, I would recommend
backing up all your data on ada1, zeroing the first and last 1MByte of
the drive, and then following Warren Block's guide.  I'd just recommend
doing this:

  gpart create -s gpt ada1
  gpart add -t freebsd-ufs -b 2m ada1
  newfs -U -j /dev/ada1p1

(or remove -j if you don't want to use SUJ)

I picked an alignment value of 2MBytes since it's both 4K-aligned and
generally safe for things like newer SSDs that have a larger NAND erase
block size (I am not going to get into a discussion about that here, so
please stay focused. :-) )

If the problem is gone after that (it should be easy to induce by
writing tons and tons of data to the drive), then we can safely say that
the drive uses 4096-byte sectors and needs to be added to the quirks
list in ata_da.c.

If the problem remains after that, then we can safely rule out
alignment, and further investigation is needed.

Welcome to all the pain/effort one has to go through when
troubleshooting things like this. :-)

Another thing: in your PR you state:

> - I am running with kern.cam.ada.default_timeout=5 which makes the
>   computer recover faster

I can definitely imagine cases where a drive using NCQ but doing writes
to a non-aligned partition could take longer than 5 seconds to respond
to an ATA CDB (this is different than a SATA or AHCI layer timeout).  I
am not telling you "change this back to 30", but it might not be helping
your situation at all given my above theory.

Finally: could you please provide output from "smartctl -x /dev/ada1"?
I would like to rule out any possibility of your drive having some other
kind of issue that might cause it to go catatonic.

Thanks.
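For the wipe step mentioned above, the two regions to zero can be worked
out in 512-byte sectors.  The sketch below is a dry run that only prints
the dd commands and never executes them.  print_wipe_cmds is a
hypothetical helper, and the total of 1953525168 sectors is an
assumption derived from the gpart output (63 + 1953525105); it would be
worth confirming against "diskinfo -v ada1" before running anything
destructive.

```shell
#!/bin/sh
# Dry run: print (do not execute) dd commands that would zero the first
# and last 1 MiB of a disk, counted in 512-byte sectors.
# print_wipe_cmds is a hypothetical helper used only for this sketch.
print_wipe_cmds() {
    dev=$1
    total_sectors=$2
    mib_sectors=$(( 1048576 / 512 ))    # 2048 sectors per MiB
    tail_start=$(( total_sectors - mib_sectors ))
    echo "dd if=/dev/zero of=/dev/$dev bs=512 count=$mib_sectors"
    echo "dd if=/dev/zero of=/dev/$dev bs=512 seek=$tail_start count=$mib_sectors"
}

# Sector total assumed from the gpart output: 63 + 1953525105.
print_wipe_cmds ada1 1953525168
```

Counting in sectors rather than in 1 MiB blocks matters here because the
assumed disk size is not a whole number of MiB, so a seek expressed in
MiB blocks would miss the final partial MiB.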
** -- http://www.seagate.com/files/www-content/support-content/documentation/samsung/tech-specs/eco_greenf2.pdf
*** -- http://www.wonkity.com/~wblock/docs/html/ssd.html

-- 
| Jeremy Chadwick                                  jdc_at_koitsu.org |
| UNIX Systems Administrator                  http://jdc.koitsu.org/ |
| Mountain View, CA, US                                              |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

Received on Wed Apr 03 2013 - 23:05:27 UTC