Re: AHCI timeout when using ZFS + AIO + NCQ

From: Steven Hartland <killing_at_multiplay.co.uk>
Date: Thu, 24 Jan 2013 12:50:30 -0000
Is it always the same disk? If so, replace it. SMART helps identify issues,
but it can't tell you with 100% certainty that there's no problem.
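
One rough way to check whether the timeouts always land on the same disk is to count the "Timeout on slot" lines per AHCI channel in the system log. The sketch below is only an illustration: the function names are made up for this example, the regular expression is modelled on the log lines quoted below, and /var/log/messages is an assumed log location.

```python
import re
from collections import Counter

# Pattern modelled on the kernel lines quoted in this thread, e.g.
# "ahcich2: Timeout on slot 6 port 0". Other drivers may log differently.
TIMEOUT_RE = re.compile(r"\b(ahcich\d+): Timeout on slot \d+ port \d+")


def count_timeouts(lines):
    """Return a Counter mapping AHCI channel name -> number of timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def report(path="/var/log/messages"):
    """Print per-channel timeout counts for a log file (path is an assumption)."""
    with open(path, errors="replace") as fh:
        for channel, n in count_timeouts(fh).most_common():
            print(f"{channel}: {n} timeout(s)")
```

Calling report() on the live log, or on rotated copies once they are decompressed, gives a per-channel tally. If one channel dominates, match it to a disk through the "adaN at ahcichN" probe lines from boot.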
----- Original Message ----- 
From: "Vladislav Prodan" <universite_at_ukr.net>
To: <fs_at_freebsd.org>
Cc: <current_at_freebsd.org>
Sent: Thursday, January 24, 2013 12:19 PM
Subject: AHCI timeout when using ZFS + AIO + NCQ


> I have the following server:
>
> FreeBSD 9.1-PRERELEASE #0: Wed Jul 25 01:40:56 EEST 2012
>
> Jan 24 12:53:01 vesuvius kernel: atapci0: <JMicron ATA controller> port 0xc040-0xc047,0xc030-0xc033,0xc020-0xc027,0xc010-0xc013,0xc000-0xc00f mem 0xfe210000-0xfe2101ff irq 51 at device 0.0 on pci3
> ...
> Jan 24 12:53:01 vesuvius kernel: ahci0: <ATI IXP700 AHCI SATA controller> port 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem 0xfe307000-0xfe3073ff irq 19 at device 17.0 on pci0
> Jan 24 12:53:01 vesuvius kernel: ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
> ...
> Jan 24 12:53:01 vesuvius kernel: ada2 at ahcich2 bus 0 scbus4 target 0 lun 0
> Jan 24 12:53:01 vesuvius kernel: ada2: <ST3000DM001-9YN166 CC4C> ATA-8 SATA 3.x device
> Jan 24 12:53:01 vesuvius kernel: ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> Jan 24 12:53:01 vesuvius kernel: ada2: Command Queueing enabled
> Jan 24 12:53:01 vesuvius kernel: ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> Jan 24 12:53:01 vesuvius kernel: ada2: Previously was known as ad12
> ...
> I use 4 HDDs in RAID10 via ZFS.
>
> At very irregular intervals, the HDDs drop off the bus. As a result, the server stops.
>
> Jan 24 06:48:06 vesuvius kernel: ahcich2: Timeout on slot 6 port 0
> Jan 24 06:48:06 vesuvius kernel: ahcich2: is 00000000 cs 00000000 ss 000000c0 rs 000000c0 tfd 40 serr 00000000 cmd 0000e817
> Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 4c 4e 1e 40 68 00 00 01 00 00
> Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
> Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): Retrying command
> Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
> Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
> Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
> Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
> Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
> Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
> Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
> Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
> Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
> Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
> Jan 24 06:51:11 vesuvius kernel: swap_pager: I/O error - pagein failed; blkno 4227133,size 8192, error 6
> Jan 24 06:51:11 vesuvius kernel: (ada2:(pass2:vm_fault: pager read error, pid 1943 (named)
> Jan 24 06:51:11 vesuvius kernel: ahcich2:0:ahcich2:0:0:0:0): lost device
> Jan 24 06:51:11 vesuvius kernel: 0): passdevgonecb: devfs entry is gone
> Jan 24 06:51:11 vesuvius kernel: pid 1943 (named), uid 53: exited on signal 11
> ...
>
> Only a restart via the Power button helps.
> Judging by their SMART state, the HDDs have no problems. The SATA data cable has been replaced.
>
>
> I found a similar problem:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> PR: amd64/165547: NVIDIA MCP67 AHCI SATA controller timeout
>
> -- 
> Vladislav V. Prodan
> System & Network Administrator
> http://support.od.ua
> +380 67 4584408, +380 99 4060508
> VVP88-RIPE


Received on Thu Jan 24 2013 - 11:49:59 UTC
