Re: After install - Fatal trap 18 ATA problem?

From: Yar Tikhiy <yar_at_comp.chem.msu.su>
Date: Tue, 20 Jun 2006 02:51:48 +0400
On Mon, Jun 19, 2006 at 05:05:09PM -0400, Anish Mistry wrote:
> On Monday 19 June 2006 16:36, Yar Tikhiy wrote:
> > On Mon, Jun 19, 2006 at 03:25:19PM -0400, Anish Mistry wrote:
> > > On Monday 19 June 2006 14:09, Yar Tikhiy wrote:
> > > > On Fri, Jun 16, 2006 at 01:32:55PM -0400, Anish Mistry wrote:
> > > > > I'm trying to get FreeBSD installed on one of my systems and
> > > > > I'm getting the error stated below.  I did have FreeBSD
> > > > > 6-STABLE installed a few months ago on this very system.  The
> > > > > only change is that FreeBSD is now installed on the second
> > > > > harddrive instead of the first.  This is using the -CURRENT
> > > > > snapshot for this month.  The install goes just fine.  I also
> > > > > get a very similar error when I install 6.1 too.
> > > > >
> > > > > This seems to be the same problem as:
> > > > > http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2006-03/msg00539.html
> > > > >
> > > > > But I don't have a built-in compact flash reader attached
> > > > > via ATA.
> > > > >
> > > > > Full verbose boot+backtrace:
> > > > > http://am-productions.biz/docs/boot-panic-script.txt.gz
> > > > >
> > > > > rr232x: no controller detected.
> > > > > ata0-slave: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire
> > > > > ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA66 cable=80 wire
> > > > > ad0: setting PIO4 on nForce2 Pro chip
> > > > > ad0: setting UDMA66 on nForce2 Pro chip
> > > > > ad0: 17206MB <IBM DJNA-371800 J78OA30K> at ata0-master UDMA66
> > > > >
> > > > >
> > > > > Fatal trap 18: integer divide fault while in kernel mode
> > > > > cpuid = 0; apic id = 00
> > > > > instruction pointer	= 0x20:0xc089b49f
> > > > > stack pointer	        = 0x28:0xc0c20b64
> > > > > frame pointer	        = 0x28:0xc0c20bec
> > > > > code segment		= base 0x0, limit 0xfffff, type 0x1b
> > > > > 			= DPL 0, pres 1, def32 1, gran 1
> > > > > processor eflags	= interrupt enabled, resume, IOPL = 0
> > > > > current process		= 0 (swapper)
> > > > > [thread pid 0 tid 0 ]
> > > > > Stopped at      __qdivrem+0x3b: divl    %ecx,%eax
> > > > > db> bt
> > > > > Tracing pid 0 tid 0 td 0xc0a02fb8
> > > > > __qdivrem(219b700,0,0,0,0) at __qdivrem+0x3b
> > > > > __udivdi3(219b700,0,0,0) at __udivdi3+0x16
> > > >
> > > >                       ^^^
> > > > Looks like an attempt to divide something (0x219b700) by zero
> > > > using quad_t arithmetic.
> > > >
> > > > > ad_describe(c26e8580,c26e8580,c262c280,c265e400,c25ec200) at
> > > > > ad_describe+0x1b3
> > > > > ad_attach(c26e8580) at ad_attach+0x1e7
> > > > > device_attach(c26e8580,c0957850,c26e8580,c265e000,c265e400)
> > > > > at device_attach+0x58
> > > > > device_probe_and_attach(c26e8580) at
> > > > > device_probe_and_attach+0xe0
> > > > > bus_generic_attach(c25d2a80,c25d2a80,1,0,c26e8580) at
> > > > > bus_generic_attach+0x16
> > > > > ata_identify(c25d2a80) at ata_identify+0x1c8
> > > > > ata_boot_attach(0) at ata_boot_attach+0x3e
> > > > > run_interrupt_driven_config_hooks(0,c1ec00,c1e000,0,c0450af5)
> > > > > at run_interrupt_driven_config_hooks+0x18
> > > > > mi_startup() at mi_startup+0x96
> > > > > begin() at begin+0x2c
> > > > > db> ps
> > > > > --
> > > > > Anish Mistry
> > > >
> > > > FWIW, I saw an integer divide fault apparently related to the
> > > > ata driver when I tried to test a low-end VIA-based mobo with
> > > > FreeBSD.  I gave the board away soon afterwards, though, and
> > > > had no time to debug it.
> > > >
> > > > Could you use gdb to see what C code is at ad_describe+0x1b3
> > > > in your kernel?
> > >
> > > How do I do this without creating a kernel dump?  Do I need to
> > > set up remote GDB over a serial console?
> >
> > No, you don't.  It's much easier than that.  You were installing
> > FreeBSD from a CURRENT snapshot when the panic happened, weren't
> > you?  If so, first get a working machine with a not-too-old GDB;
> > FreeBSD 5.x or 6.x will do.  Then locate kernel.debug or
> > kernel.symbols in the boot/kernel subdir on the installation CD.
> > That's the kernel that panicked.  Strictly speaking, kernel.symbols
> > isn't the kernel itself, only its symbols, but the symbols are all
> > we need.
> >
> > Unpack the snapshot's kernel source to somewhere.  This is as easy
> > as typing:
> >
> > 	cd /cdrom/7.0-CURRENT/src
> For the archives...
> You need to create the usr/src directory or tar will fail:
> mkdir -p /usr/home/me/somewhere/usr/src

Yes, you're quite right here!

> > 	env DESTDIR=/usr/home/me/somewhere sh install.sh sys
> >
> > And now load the kernel binary in GDB (not kgdb):
> >
> > 	gdb /cdrom/boot/kernel/kernel.symbols
> > 	(gdb) dir /usr/home/me/somewhere
> >
> > Perhaps GDB will find the source files more readily if you put them
> > directly into /usr/src (after renaming the original /usr/src to, e.g.,
> > /usr/src.orig).  That also keeps GDB from picking up the wrong
> > source tree.
> >
> > 	mv /usr/src /usr/src.orig
> > 	mkdir /usr/src
> > 	cd /cdrom/7.0-CURRENT/src
> > 	sh install.sh sys
> > 	gdb /cdrom/boot/kernel/kernel.symbols
> >
> > Now you should be able to examine the source code using binary code
> > offsets:
> >
> > 	(gdb) list *(ad_describe+0x1b3)
> >
> > The "list" command will show you which line in which source file
> > is responsible for the division by zero, plus 9 more lines around
> > it for context.  You can post the output here as is; it's quite
> > informative.
> (gdb) list *(ad_describe+0x1b3)
> 0xc04e224b is in ad_describe (/usr/src/sys/dev/ata/ata-disk.c:383).
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I suppose you put the CURRENT sources under /usr/src after all, didn't you?

> 378                       device_get_unit(ch->dev),
> 379                       (atadev->unit == ATA_MASTER) ? "master" : "slave",
> 380                       (adp->flags & AD_F_TAG_ENABLED) ? "tagged " : "",
> 381                       ata_mode2str(atadev->mode));
> 382         if (bootverbose) {
> 383             device_printf(dev, "%ju sectors [%juC/%dH/%dS] "
> 384                           "%d sectors/interrupt %d depth queue\n", adp->total_secs,
> 385                           adp->total_secs / (adp->heads * adp->sectors),
> 386                           adp->heads, adp->sectors, atadev->max_iosize / DEV_BSIZE,
> 387                           adp->num_tags + 1);

Consequently, adp->heads or adp->sectors (or both) was 0 for ad0, so
the divisor in the cylinder calculation on line 385 was zero.  This
means that the ata(4) driver had some kind of trouble reading the
disk's parameters from the ATA controller.  You may now want to
contact the author of ata(4), Soren Schmidt <sos_at_freebsd.org>, for
further instructions on how to debug this problem.  I hope he'll
find all this info useful.  Thanks!
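
For illustration only, here is a minimal sketch in C of the kind of
guard that would avoid the trap.  It is not the actual ata(4) code or
a proposed patch: struct disk_geom and geom_cylinders() are
hypothetical names, and the field types are assumptions; only the
field names total_secs, heads and sectors come from the listing
above.  The dividend in the backtrace, 0x219b700, is 35239680 in
decimal, which (assuming 512-byte sectors) matches the 17206MB
reported for ad0.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the geometry fields that ad_describe()
 * reads.  Field names follow the listing; the types are assumptions.
 */
struct disk_geom {
	uint64_t total_secs;	/* total number of sectors on the disk */
	uint8_t  heads;		/* heads per cylinder, 0 if unknown */
	uint8_t  sectors;	/* sectors per track, 0 if unknown */
};

/*
 * Return the cylinder count, or 0 when the geometry is unusable.
 * Checking the divisor first avoids the integer divide fault that
 * an unguarded total_secs / (heads * sectors) would raise.
 */
static uint64_t
geom_cylinders(const struct disk_geom *g)
{
	uint32_t secs_per_cyl = (uint32_t)g->heads * g->sectors;

	if (secs_per_cyl == 0)
		return (0);
	return (g->total_secs / secs_per_cyl);
}
```

With an assumed geometry of 16 heads and 63 sectors, geom_cylinders()
yields 34960 for this disk; with heads or sectors equal to 0 it
returns 0 instead of faulting.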

-- 
Yar
Received on Mon Jun 19 2006 - 20:51:54 UTC
