Re: [mfi] command timeouts

From: Scott Long <scottl_at_pooker.samsco.org>
Date: Mon, 19 Feb 2007 16:31:11 -0700
Bjoern A. Zeeb wrote:
> On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:
> 
>> Hi,
>>
>> I am testing mfi on a Dell 2950 with 6 PDs and 2 LDs (1st LD = RAID1,
>> 2nd LD = RAID5, 1HTSP).
>> (The somewhat sucky) megacli "works".
>>
>> Most commands that gather information work fine, as does pulling
>> disks out hard, but setting a disk offline or running some other
>> commands hangs 'something', possibly the controller.
>>
>> For example:
>>
>> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>>
>> EnclId-1 SlotId-3 state changed to OffLine.
>> foo# foo# ls -l
>> <hangs forever>
>>
>> It's not only this process that hangs, but every process doing disk I/O.
>>
>>
>> On the serial console I get:
>>
>> ...
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
>> ...
>>
>>
>> I can still break to ddb. Without disk I/O, the only thing I can
>> really do there is type reset.
>>
>> I'll build a debugging kernel so I can run show alllocks, etc., but
>> if someone with more experience with this driver/hardware could
>> contact me, I can run further tests.
> 
> 
> this time with the debugging kernel:
> 
> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
> 
> EnclId-1 SlotId-3 state changed to OffLine.
> foo# foo# foo# foo#
> 
> 
> I was able to hit <enter> multiple times after the "uh it still lives"
> but then ...
> 
> command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
> panic: command not in queue
> cpuid = 2
> Uptime: 1m17s
> Physical memory: 4084 MB
> Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
> Dump complete
> 
> telnet> send brk
> KDB: enter: Line break on console
> [thread pid 15 tid 100009 ]
> Stopped at      kdb_enter+0x2f: nop
> db> where
> Tracing pid 15 tid 100009 td 0xffffff012f5c4000
> kdb_enter() at kdb_enter+0x2f
> siointr1() at siointr1+0x400
> siointr() at siointr+0x2e
> intr_execute_handlers() at intr_execute_handlers+0x124
> Xapic_isr1() at Xapic_isr1+0x7f
> --- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp = 0xffffffffac06eb60 ---
> _mtx_lock_sleep() at _mtx_lock_sleep+0x137
> _mtx_lock_flags() at _mtx_lock_flags+0xe1
> mfi_timeout() at mfi_timeout+0x32
> softclock() at softclock+0x1c8
> ithread_loop() at ithread_loop+0xfe
> fork_exit() at fork_exit+0xaa
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
> db> show alllocks
> Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
> exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @ /u1/src/HEAD/sys/dev/mfi/mfi.c:775
> 
> 
> After the reboot, the command does not seem to have been executed, as
> the disk still appears to be online (at least it was the last time).
> 

megacli is known to be fragile.  Don't Do That (tm).  As for the panic,
it's probably a side effect of megacli putting the card and the driver
into a chaotic state.

Scott
Received on Mon Feb 19 2007 - 22:47:11 UTC
