Re: Kernel builds, but crashes at boot (amd64, Revision: 234306)

From: Rainer Hurling <rhurlin_at_gwdg.de>
Date: Mon, 16 Apr 2012 19:58:57 +0200
On 16.04.2012 19:31 (UTC+1), Konstantin Belousov wrote:
> On Mon, Apr 16, 2012 at 06:15:32PM +0200, Rainer Hurling wrote:
>> On 16.04.2012 16:55 (UTC+1), Konstantin Belousov wrote:
>>> On Mon, Apr 16, 2012 at 07:35:23AM -0700, matt wrote:
>>>> On 04/16/12 01:57, O. Hartmann wrote:
>>>>> On 04/15/12 12:30, Conrad J. Sabatier wrote:
>>>>>> Today I'm suddenly unable to boot a newly built kernel without crashing
>>>>>> right near the end of the device probes, just before the system is
>>>>>> about to actually come up:
>>>>>>
>>>>>> Fatal trap 18: integer divide fault while in kernel mode
>>>>>>
>>>>>> Stopped at 0xffffffff803b2646 = g_label_ufs_taste_common+0x36
>>>>>> divl 0x50(%rcx),%eax
>>>>>>
>>>>>> Backtrace lists this chain of calls:
>>>>>> g_label_ufs_taste_common
>>>>>> g_label_taste
>>>>>> g_new_provider_event
>>>>>> g_run_events
>>>>>> g_event_procbody
>>>>>> fork_exit
>>>>>> fork_trampoline
>>>>>>
>>>>>> I get the same result on rebooting whether the kernel is built with
>>>>>> clang or gcc, from a CUSTOM config or GENERIC. No idea why this
>>>>>> suddenly started happening; I haven't changed anything in my setup.
>>>>> My recent kernel does the same on two "FreeBSD 10.0-CURRENT #1 r234309:
>>>>> Sun Apr 15 14:14:11 CEST 2012" boxes. What both boxes have in common is
>>>>> that they are attached to a Dell UltraSharp U2711 screen, which has a
>>>>> built-in USB/MMC hub. I realized that I could log into my lab's box
>>>>> remotely when I'm not in the lab, which usually coincides with the
>>>>> screen being switched off.
>>>>> This morning I logged in from home, logged out, went to the office,
>>>>> switched on the screen, and the box rebooted! I wasn't able to get the
>>>>> system running again; it always got stuck with a
>>>>>
>>>>> Fatal trap 18: integer divide fault while in kernel mode
>>>>>
>>>>> Unplugging the screen's USB hub lets the system boot again!
>>>>>
>>>>> Following are some of the last messages logged by the kernel; I do not
>>>>> know whether they are useful for tracking down the problem.
>>>>>
>>>>> Regards,
>>>>> Oliver
>>>>>
>>>>> Apr 12 15:32:33 telesto kernel: hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA> IAP/4/48/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC> IAF/3/48/0x61<INT,REA,WRI> UCP/8/48/0x3f8<EDG,THR,REA,WRI,INV,QUA,PRC> UCF/1/48/0x60<REA,WRI>
>>>>> Apr 12 15:32:33 telesto kernel: uhub1: 4 ports with 4 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: uhub2: 4 ports with 4 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: uhub3: 2 ports with 2 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: uhub0: 2 ports with 2 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: ugen3.2: <vendor 0x8087> at usbus3
>>>>> Apr 12 15:32:33 telesto kernel: uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: ugen0.2: <vendor 0x8087> at usbus0
>>>>> Apr 12 15:32:33 telesto kernel: uhub5: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
>>>>> Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3 usbus0
>>>>> Apr 12 15:32:33 telesto kernel: uhub5: 6 ports with 6 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: uhub4: 8 ports with 8 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: ugen3.3: <Cherry GmbH> at usbus3
>>>>> Apr 12 15:32:33 telesto kernel: ukbd0: <Cherry GmbH wired keyboard, class 0/0, rev 2.00/1.11, addr 3> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: kbd2 at ukbd0
>>>>> Apr 12 15:32:33 telesto kernel: uhid0: <Cherry GmbH wired keyboard, class 0/0, rev 2.00/1.11, addr 3> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
>>>>> Apr 12 15:32:33 telesto kernel: ugen3.4: <vendor 0x0424> at usbus3
>>>>> Apr 12 15:32:33 telesto kernel: uhub6: <vendor 0x0424 product 0x2514, class 9/0, rev 2.00/0.00, addr 4> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
>>>>> Apr 12 15:32:33 telesto kernel: uhub6: 3 ports with 2 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: ugen3.5: <vendor 0x0424> at usbus3
>>>>> Apr 12 15:32:33 telesto kernel: uhub7: <vendor 0x0424 product 0x2640, class 9/0, rev 2.00/0.00, addr 5> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
>>>>> Apr 12 15:32:33 telesto kernel: uhub7: 3 ports with 2 removable, self powered
>>>>> Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
>>>>> Apr 12 15:32:33 telesto kernel: ugen3.6: <Generic> at usbus3
>>>>> Apr 12 15:32:33 telesto kernel: umass0: <Generic Ultra Fast Media Reader, class 0/0, rev 2.00/1.91, addr 6> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: Root mount waiting for: usbus3
>>>>> Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
>>>>> Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error
>>>>> Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): SCSI status: Check Condition
>>>>> Apr 12 15:32:33 telesto kernel: (probe0:umass-sim0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present)
>>>>> Apr 12 15:32:33 telesto kernel: da0 at umass-sim0 bus 0 scbus14 target 0 lun 0
>>>>> Apr 12 15:32:33 telesto kernel: da0: <Generic Ultra HS-SD/MMC 1.91> Removable Direct Access SCSI-0 device
>>>>> Apr 12 15:32:33 telesto kernel: da0: 40.000MB/s transfers
>>>>> Apr 12 15:32:33 telesto kernel: da0: Attempt to query device size failed: NOT READY, Medium not present
>>>>> Apr 12 15:32:33 telesto kernel: ugen3.7: <Logitech> at usbus3
>>>>> Apr 12 15:32:33 telesto kernel: ums0: <Logitech USB Laser Mouse, class 0/0, rev 2.00/56.01, addr 7> on usbus3
>>>>> Apr 12 15:32:33 telesto kernel: ums0: 8 buttons and [XYZT] coordinates ID=0
>>>>> Apr 12 15:32:33 telesto kernel: Trying to mount root from ufs:/dev/gpt/root [rw]...
>>>>> Apr 12 15:32:33 telesto kernel: nvidia0: <GeForce GTX 570> on vgapci0
>>>>> Apr 12 15:32:33 telesto kernel: vgapci0: child nvidia0 requested pci_enable_io
>>>>> Apr 12 15:32:33 telesto kernel: vgapci0: child nvidia0 requested pci_enable_io
>>>>> Apr 12 15:32:33 telesto kernel: vboxdrv: fAsync=0 offMin=0x2d8 offMax=0x603c
>>>>> Apr 12 15:32:33 telesto kernel: module_register: module ng_ether already exists!
>>>>> Apr 12 15:32:33 telesto kernel: Module ng_ether failed to register: 17
>>>>>
>>>> Disconnect the "Generic Ultra HS-SD/MMC" device that is presenting
>>>> da0... same problem here. The system will boot if da0 is either not
>>>> present or has media in it (I think). In my case it was a different
>>>> card reader with no card inserted, which seems similar to your case.
>>>>
>>>> My guess is that this problem is related to recent changes in da, but a
>>>> quick look at the diff didn't show me what is going wrong.
>>>
>>> So did you try to revert r234177 and/or r233963?
>>
>> I just updated my system to r234342, downgraded only
>> /usr/src/sys/cam/scsi/scsi_da.c to r233746, and now the system boots
>> again. So there is obviously something wrong with the newest patch to
>> scsi_da.c.
> That is too broad; try reverting exactly one patch and see whether that works.

Sorry for my bad English. What I meant is that I reverted exactly one
patch (manually taking scsi_da.c from r234177 back to r233746). The rest
of the tree is at r234342.
Received on Mon Apr 16 2012 - 15:59:02 UTC