Re: kgdb(1) ... is it broken ?

From: Kostik Belousov <kostikbel_at_gmail.com> Date: Fri, 23 Feb 2007 13:34:39 +0200

On Fri, Feb 23, 2007 at 03:18:23PM +0900, Wilkinson, Alex wrote:
> Hi all,
> 
> I have a reasonably recent version of current that is panicking at least once
> every 2 days. When I run kgdb(1) to get a backtrace, it isn't working correctly.
> 
> [FreeBSD 7.0-CURRENT #0: Wed Jan 24 14:24:54 WST 2007]
> 
> e.g.
> 
> The panic:
> 
> 	NVRM: Xid (0001:00): 8, Channel 00000000
> 	panic: Bad link elm 0xc4dc8900 next->prev != elm
> 	cpuid = 0
> 	KDB: enter: panic
> 	[thread pid 909 tid 100080 ]
> 	Stopped at      kdb_enter+0x32: leave
> 	db>tr
> 	Tracing pid 909 tid 100080 td 0xc47231b0
> 	kdb_enter(c09ecabf,0,c09a4b15,e6a69a20,c47231b0,...) at kdb_enter+0x32
> 	panic(c09a4b15,c4dc8900,4c,c09e8778,64,...) at panic+0x191
> 	destroy_devl(c4714e80,e6a69a70,c0fe6cf0,c4dc8900,40,...) at destroy_devl+0x330
> 	destroy_dev(c4dc8900,40,c47231b0,0,c4dc8900,...) at destroy_dev+0x13
> 	nvidia_dev_close(c4dc8900,3,2000,c47231b0,c4e287d8,...) at nvidia_dev_close+0xa4
> 	giant_close(c4dc8900,3,2000,c47231b0,e6a69adc,...) at giant_close+0x4f
> 	devfs_close(e6a69b28,3,c4e28754) at devfs_close+0x2d1
> 	VOP_CLOSE_APV(c0a8de20,e6a69b28,c47231b0,c09f7b4c,11f,...) at VOP_CLOSE_APV+0x69
> 	vn_close(c4e28754,3,c4306a80,c47231b0,203246,...) at vn_close+0x99
> 	vn_closefile(c4bf0a20,c47231b0,c09e9165,889,c4e28754,...) at vn_closefile+0x88
> 	fdrop_locked(c4bf0a20,c47231b0,2,c09ee59f,de,c47231b0,0,203246,c0b3b920,e6a69c24,c07517fb,c0af5494,0,c4b3522c,401,c09e9165,e6a69c4c,c0716a82,c4b3522c,1,c09ebc01,ae,0) at fdrop_locked+0xb9
> 	closef(c4bf0a20,c47231b0,c09e9165,401,c0739bd6,...) at closef+0x1f4
> 	kern_close(c47231b0,e,4,c4b346c0,1,...) at kern_close+0x188
> 	syscall(e6a69d38) at syscall+0x155
> 	Xint0x80_syscall() at Xint0x80_syscall+0x20
> 	--- syscall (0, FreeBSD ELF32, nosys), eip = 0x2, esp = 0x203292, ebp = 0xc1d00001 ---
> 	MAXCPU(4000000,90ffff00,10c19ee7,58c28e8c,34c22fbb,...) at 0x2
> 	db>panic
> 	panic: from debugger
> 	cpuid = 0
> 	Uptime: 3d5h29m19s
> 	Physical memory: 1007 MB
> 	Dumping 219 MB: 204 188 172 156 140 124 108 92 76 60 44 28 12
> 	Dump complete
> 
> Upon a reboot I see this error:
> 
> 	savecore: reboot after panic: Bad link elm 0xc4dc8900 next->prev != elm
> 	Feb 23 15:02:22 obelix savecore: reboot after panic: Bad link elm 0xc4dc8900 next->prev != elm
> 
> And then the backtrace:
> 
> 	#0  doadump () at pcpu.h:166
> 	166     pcpu.h: No such file or directory.
> 	        in pcpu.h
> 	(kgdb) where
> 	#0  doadump () at pcpu.h:166
> 	#1  0xc0720c1b in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:411
> 	#2  0xc0720693 in panic (fmt=0xc09ab848 "from debugger") at /usr/src/sys/kern/kern_shutdown.c:567
> 	#3  0xc047e490 in db_panic (addr=-1066121253, have_addr=0, count=-1, modif=0xe6a69810 "") at /usr/src/sys/ddb/db_command.c:433
> 	#4  0xc047e870 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401
> 	#5  0xc04805fb in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:222
> 	#6  0xc0744c19 in kdb_trap (type=0, code=0, tf=0xe6a699a4) at /usr/src/sys/kern/subr_kdb.c:502
> 	#7  0xc0960ea5 in trap (frame=0xe6a699a4) at /usr/src/sys/i386/i386/trap.c:621
> 	#8  0xc0948dbb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> 	#9  0x00000000 in ?? ()
> 	(kgdb) 
> 
> Things just aren't working as normal.
> 
> Has anyone had problems running backtraces of kernel core dumps with kgdb(1)?

Try this patch; it should let you see a useful backtrace in kgdb (I would
really like to receive feedback on this one):

Index: gnu/usr.bin/gdb/kgdb/trgt_i386.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/gnu/usr.bin/gdb/kgdb/trgt_i386.c,v
retrieving revision 1.5
diff -u -r1.5 trgt_i386.c
--- gnu/usr.bin/gdb/kgdb/trgt_i386.c	11 Sep 2005 05:36:30 -0000	1.5
+++ gnu/usr.bin/gdb/kgdb/trgt_i386.c	23 Feb 2007 11:31:39 -0000
@@ -146,7 +146,7 @@
 	*realnump = -1;

 	ofs = (regnum >= I386_EAX_REGNUM && regnum <= I386_FS_REGNUM)
-	    ? kgdb_trgt_frame_offset[regnum] : -1;
+	    ? kgdb_trgt_frame_offset[regnum] + 4 : -1;
 	if (ofs == -1)
 		return;

BTW, your panic is caused by the nvidia driver. I believe there is a patch from
NVIDIA that eliminates the problem.