Re: tws(4) kernel panic on boot

From: Andreas Turriff <maillist_at_turriff.net>
Date: Wed, 22 May 2013 23:37:33 -0700
On 5/22/2013 11:23 PM, Konstantin Belousov wrote:
> On Wed, May 22, 2013 at 11:08:50AM -0700, Andreas Turriff wrote:
>> On 5/21/2013 9:25 AM, Andreas Turriff wrote:
>>> On 5/21/2013 5:33 AM, Konstantin Belousov wrote:
>>>> On Mon, May 20, 2013 at 07:09:41PM -0700, Andreas Turriff wrote:
>>>>> On migrating one of my servers to -current, I discovered that the tws
>>>>> driver panics on boot; I will follow up with a full backtrace once I
>>>>> have a chance to extract it. In the meantime, there is a PR about a
>>>>> very similar error in twa - 177020. Is it possible those are related,
>>>>> and the same sort of change needs to be made to tws?
>>>> It is possible that the regression was in r246713, but the code is
>>>> structured differently, and there were more tws(4) changes since then.
>>>> You need to provide data for somebody to start looking into the problem.
>>> I know. That's why I said I'd follow up with more info once I can
>>> extract it.
>>>
>>> The system in question is a Dell PowerEdge 840 server with 8 GiB of RAM,
>>> an Intel NIC driven by em(4), and a 3ware 9750-4i RAID controller.
>>> There is no src.conf.
>>>
>>> /etc/make.conf:
>>> CPUTYPE?=core2
>>>
>>> Error message:
>>>
>>> LSI 3ware device driver for SAS/SATA storage controllers, version: 10.80.00.005
>>> tws0: <LSI 3ware SAS/SATA Storage Controller> port 0xec00-0xecff mem 0xfe9fc000-0xfe9fffff,0xfe980000-0xfe9bffff irq 16 at device 0.0 o1
>>> tws0: Using MSI
>>> panic: _bus_dmamap_load_ccb: Unsupported func code 0
>>> cpuid = 0
>>> KDB: enter: panic
>>> [ thread pid 0 tid 100000 ]
>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>>
>>> Backtrace
>>>
>>> Tracing pid 0 tid 100000 td 0xffffffff81376610
>>> kdb_enter() at kdb_enter+0x3e/frame 0xffffffff8191a340
>>> panic() at panic+0x175/frame 0xffffffff8191a3c0
>>> _bus_dmamap_load_ccb() at _bus_dmamap_load_ccb+0x1c3/frame 0xffffffff8191a420
>>> bus_dmamap_load_ccb() at bus_dmamap_load_ccb+0x91/frame 0xffffffff8191a480
>>> tws_map_request() at tws_map_request+0x71/frame 0xffffffff8191a4c0
>>> tws_get_param() at tws_get_param+0xdd/frame 0xffffffff8191a520
>>> tws_display_ctlr_info() at tws_display_ctlr_info+0x38/frame 0xffffffff8191a590
>>> tws_init_ctlr() at tws_init_ctlr+0x6b/frame 0xffffffff8191a5b0
>>> tws_attach() at tws_attach+0xd79/frame 0xffffffff8191a670
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191a6c0
>>> bus_generic_attach() at bus_generic_attach+0x2d/frame 0xffffffff8191a6e0
>>> acpi_pci_attach() at acpi_pci_attach+0x15f/frame 0xffffffff8191a730
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191a780
>>> bus_generic_attach() at bus_generic_attach+0x2d/frame 0xffffffff8191a7a0
>>> acpi_pcib_attach() at acpi_pcib_attach+0x24d/frame 0xffffffff8191a7f0
>>> acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x9f/frame 0xffffffff8191a830
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191a880
>>> bus_generic_attach() at bus_generic_attach+0x2d/frame 0xffffffff8191a8a0
>>> acpi_pci_attach() at acpi_pci_attach+0x15f/frame 0xffffffff8191a8f0
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191a940
>>> bus_generic_attach() at bus_generic_attach+0x2d/frame 0xffffffff8191a960
>>> acpi_pcib_attach() at acpi_pcib_attach+0x24d/frame 0xffffffff8191a9b0
>>> acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x299/frame 0xffffffff8191aa00
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191aa50
>>> bus_generic_attach() at bus_generic_attach+0x2d/frame 0xffffffff8191aa70
>>> acpi_attach() at acpi_attach+0xdd6/frame 0xffffffff8191ab30
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191ab80
>>> bus_generic_attach() at bus_generic_attach+0x2d/frame 0xffffffff8191aba0
>>> nexus_acpi_attach() at nexus_acpi_attach+0x76/frame 0xffffffff8191abd0
>>> device_attach() at device_attach+0x396/frame 0xffffffff8191ac20
>>> bus_generic_new_pass() at bus_generic_new_pass+0xe9/frame 0xffffffff8191ac50
>>> bus_set_pass() at bus_set_pass+0x8f/frame 0xffffffff8191ac80
>>> configure() at configure+0xa/frame 0xffffffff8191ac90
>>> mi_startup() at mi_startup+0x118/frame 0xffffffff8191acb0
>>> btext() at btext+0x2c
>>>
>> And after taking a very close look at the source code for tws, I spotted
>> the problem. Patch included.
>>
>> ~Andreas
>>
>> Index: sys/dev/tws/tws.h
>> ===================================================================
>> --- sys/dev/tws/tws.h   (revision 250856)
>> +++ sys/dev/tws/tws.h   (working copy)
>> @@ -137,7 +137,7 @@
>>        TWS_DIR_IN = 0x2,
>>        TWS_DIR_OUT = 0x4,
>>        TWS_DIR_NONE = 0x8,
>> -    TWS_DATA_CCB = 0x16,
>> +    TWS_DATA_CCB = 0x10,
>>    };
>>
>>    enum tws_intrs {
>>
> Do you mean that this change alone fixes your panic and the controller
> works after boot?
>
> I started looking at the code, and thought that there were some issues
> with the DATA_CCB flag being set too eagerly.
I've been running that kernel all day, rebuilding userland (ports) on a
4-drive ZFS RAID-Z on that controller, and have not seen a single crash,
slowdown, hiccup or untoward log message.
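
For anyone reading along, here is a minimal sketch of why the one-digit
change matters. It assumes the driver decides between the CCB and
non-CCB DMA load paths with a bitwise test on the request flags; I have
not quoted that code here, so treat that as an assumption. The enum
values are taken from the patch context; the OLD_/NEW_ names,
flag_is_set() and check_flag_layout() are hypothetical and exist only
for illustration. 0x16 is 0x10 | 0x04 | 0x02, so the old value shares
bits with TWS_DIR_IN and TWS_DIR_OUT.

```c
#include <assert.h>
#include <stdbool.h>

/* Values taken from the tws.h patch context in this thread. */
enum tws_flags_sketch {
	TWS_DIR_IN = 0x2,
	TWS_DIR_OUT = 0x4,
	TWS_DIR_NONE = 0x8,
	OLD_TWS_DATA_CCB = 0x16,	/* pre-patch: 0x10 | 0x04 | 0x02 */
	NEW_TWS_DATA_CCB = 0x10		/* patched: a bit of its own */
};

/* Hypothetical helper: mimics an "if (flags & mask)" style test. */
static bool
flag_is_set(unsigned int flags, unsigned int mask)
{
	return ((flags & mask) != 0);
}

/* Hypothetical self-check comparing the two flag layouts. */
static void
check_flag_layout(void)
{
	/* Old value: a plain IN or OUT request also matches DATA_CCB. */
	assert(flag_is_set(TWS_DIR_IN, OLD_TWS_DATA_CCB));
	assert(flag_is_set(TWS_DIR_OUT, OLD_TWS_DATA_CCB));
	/* TWS_DIR_NONE (0x8) does not overlap with 0x16. */
	assert(!flag_is_set(TWS_DIR_NONE, OLD_TWS_DATA_CCB));

	/* New value: none of the direction flags match DATA_CCB. */
	assert(!flag_is_set(TWS_DIR_IN, NEW_TWS_DATA_CCB));
	assert(!flag_is_set(TWS_DIR_OUT, NEW_TWS_DATA_CCB));
	assert(!flag_is_set(TWS_DIR_NONE, NEW_TWS_DATA_CCB));
}
```

Under that assumption, an internal request carrying only a direction
flag would be treated as a CCB request with the old value, which would
be consistent with tws_get_param() ending up in bus_dmamap_load_ccb()
in the backtrace.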

~Andreas
Received on Thu May 23 2013 - 04:37:40 UTC
