Re: [mfi] command timeouts

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Mon, 19 Feb 2007 13:55:47 +0000 (UTC)
On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:

> Hi,
>
> I am testing mfi on a Dell 2950 with 6 physical drives (PD) and
> 2 logical drives (LD): the 1st LD is RAID1, the 2nd LD is RAID5,
> plus 1 hot spare.
> (The somewhat sucky) megacli "works".
>
> While most information-gathering commands work fine, as does pulling
> disks out hard, setting a disk offline or running some other commands
> hangs 'something', which might be the controller.
>
> For example:
>
> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>
> EnclId-1 SlotId-3 state changed to OffLine.
> foo# foo# ls -l
> <hangs forever>
>
> It's not only this process that hangs but all processes doing disk I/O.
>
>
> On the serial console I get:
>
> ...
> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
> ...
>
>
> I can still break to ddb. Without disk I/O, the only
> possible thing I can really do is type reset.
>
> I'll build a debugging kernel so I can do show alllocks, etc
> but if someone with more experience with this driver/hw could
> contact me I can run further tests.

This time with the debugging kernel:

foo# megacli -PDOffline -PhysDrv'[1:3]' -a0

EnclId-1 SlotId-3 state changed to OffLine.
foo# 
foo# 
foo# 
foo#


I was able to hit <enter> several times ("uh, it still lives"),
but then ...

command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
panic: command not in queue
cpuid = 2
Uptime: 1m17s
Physical memory: 4084 MB
Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
Dump complete

telnet> send brk
KDB: enter: Line break on console
[thread pid 15 tid 100009 ]
Stopped at      kdb_enter+0x2f: nop
db> where
Tracing pid 15 tid 100009 td 0xffffff012f5c4000
kdb_enter() at kdb_enter+0x2f
siointr1() at siointr1+0x400
siointr() at siointr+0x2e
intr_execute_handlers() at intr_execute_handlers+0x124
Xapic_isr1() at Xapic_isr1+0x7f
--- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp = 0xffffffffac06eb60 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x137
_mtx_lock_flags() at _mtx_lock_flags+0xe1
mfi_timeout() at mfi_timeout+0x32
softclock() at softclock+0x1c8
ithread_loop() at ithread_loop+0xfe
fork_exit() at fork_exit+0xaa
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
db> show alllocks
Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @ /u1/src/HEAD/sys/dev/mfi/mfi.c:775


After the reboot it looks as if the command was never executed,
as the disk still shows as online (at least it did the last time
this happened).
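
For anyone who wants to sift through the TIMEOUT lines from the serial
console, here is a rough sketch in Python that groups them by command
address. It is not part of mfi or megacli; the name summarize_timeouts
and the regular expression are my own assumptions based on the log format
quoted above, and the sample lines are copied from that log.

```python
import re
from collections import defaultdict

# Matches console lines such as:
#   mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
# search() is used so a leading "> " quote prefix does not matter.
TIMEOUT_RE = re.compile(
    r"mfi\d+: COMMAND (?P<addr>0x[0-9a-fA-F]+) TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def summarize_timeouts(lines):
    """Hypothetical helper: map each command address to the list of
    reported ages in seconds, in order of appearance."""
    seen = defaultdict(list)
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            seen[m.group("addr")].append(int(m.group("secs")))
    return dict(seen)


# A few lines taken from the console output quoted above.
sample = [
    "> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS",
    "> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS",
    "> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS",
    "> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS",
    "> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS",
    "> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS",
]

for addr, secs in summarize_timeouts(sample).items():
    # Differences between successive reports of the same command.
    deltas = [b - a for a, b in zip(secs, secs[1:])]
    print(addr, secs, "deltas:", deltas)
```

On the excerpt above the same addresses keep coming back with ages 31
seconds apart, which suggests the same stuck commands are being reported
again on each pass rather than new commands timing out.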

-- 
Bjoern A. Zeeb				bzeeb at Zabbadoz dot NeT
Received on Mon Feb 19 2007 - 13:00:16 UTC
