Re: Panics after AHCI timeouts

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Tue, 18 Oct 2011 18:40:17 +0300

Hi.

Alexey Shuvaev wrote:
> On Sat, Oct 08, 2011 at 10:14:56PM +0200, Alexey Shuvaev wrote:
> Errr... Replying to myself... Ping? Should I file a PR and put it
> in the back burner? :)

Sorry for not replying, I wasn't home to look at it closely.

>> In the view of upcoming RELEASE-9.0 I should have reported it earlier,
>> but it is better later than never... Every time I wanted to report
>> this, the system was ~one month old and I tried to upgrade it
>> to see, if the problem was still there, waiting for the next panic...
>> and when it finally paniced it was one month old again.
>>
> [snip]
>> From core.txt.5:
>> [snip]
>> Unread portion of the kernel message buffer:
>> Memory modified after free 0xfffffe000416e200(248) val=79e8800 @ 0xfffffe000416e200
>> panic: Most recently used by cred
>>
>> cpuid = 2
>> Uptime: 20h11m1s
>> Dumping 1308 out of 7914 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
>> [snip]
>> #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:252
>> 252             if (textdump && textdump_pending) {
>> (kgdb) #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:252
>> #1  0xffffffff808234aa in kern_reboot (howto=260)
>>     at /usr/src/sys/kern/kern_shutdown.c:430
>> #2  0xffffffff80822f41 in panic (fmt=Variable "fmt" is not available.
>> )
>>     at /usr/src/sys/kern/kern_shutdown.c:595
>> #3  0xffffffff80a6f7b4 in mtrash_ctor (mem=Variable "mem" is not available.
>> ) at /usr/src/sys/vm/uma_dbg.c:137
>> #4  0xffffffff80a6f01c in uma_zalloc_arg (zone=0xfffffe021ffe0700, udata=0x0, 
>>     flags=258) at /usr/src/sys/vm/uma_core.c:2018
>> #5  0xffffffff808108be in malloc (size=Variable "size" is not available.
>> ) at uma.h:305
>> #6  0xffffffff8081c21f in crget () at /usr/src/sys/kern/kern_prot.c:1809
>> #7  0xffffffff8081c269 in crdup (cr=0xfffffe0143103300)
>>     at /usr/src/sys/kern/kern_prot.c:1911
>> #8  0xffffffff808c5ca6 in kern_accessat (td=0xfffffe0007dd7000, fd=-100, 
>>     path=0x80065c000 <Address 0x80065c000 out of bounds>, 
>>     pathseg=UIO_USERSPACE, flags=Variable "flags" is not available.
>> ) at /usr/src/sys/kern/vfs_syscalls.c:2201
>> #9  0xffffffff8086719a in syscallenter (td=0xfffffe0007dd7000, 
>>     sa=0xffffff8223f67bb0) at /usr/src/sys/kern/subr_trap.c:344
>> #10 0xffffffff80b0b43c in syscall (frame=0xffffff8223f67c50)
>>     at /usr/src/sys/amd64/amd64/trap.c:910
>> #11 0xffffffff80af617d in Xfast_syscall ()
>>     at /usr/src/sys/amd64/amd64/exception.S:384
>> #12 0x000000080062dbdc in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> [snip]
>> [last message in dmesg]
>> ahcich0: Timeout on slot 29 port 0
>> ahcich0: is 00000000 cs 00000000 ss ffffffff rs ffffffff tfd 40 serr 00000000 cmd 0000fc17
>> [snip]

Looking at your two backtraces, I don't see anything in common between
them. While the first crash happened within a timer event handler, it
was not an AHCI-related event. The second crash happened inside an
unrelated syscall. I suppose some memory corruption could cause both,
but I have no idea what it is or how it could be related to AHCI. I
could just as well say that some other hardware problem causes both.
Try to collect more statistics.
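
One rough way to compare backtraces as more crashes accumulate is to
pull the function names out of each kgdb trace and count which ones
recur. The sketch below is only an illustration: the names
frame_functions and common_functions are hypothetical helpers, and the
regular expression assumes frame lines shaped like the ones quoted
above ("#N  0xADDR in func (" or "#N  func (").

```python
import re
from collections import Counter

# Assumed frame shape, as in the kgdb output quoted above:
#   "#6  0xffffffff8081c21f in crget () at ..."  or  "#0  doadump (textdump=1) at ..."
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def frame_functions(backtrace_text):
    """Return the function names found in a kgdb backtrace, in order of appearance."""
    names = []
    for line in backtrace_text.splitlines():
        # Drop mail quoting ("> ", ">> ") and a leading "(kgdb) " prompt, if present.
        line = re.sub(r"^(?:>\s?)+", "", line).strip()
        line = re.sub(r"^\(kgdb\)\s*", "", line)
        match = FRAME_RE.match(line)
        if match:  # unresolved frames such as "?? ()" are skipped
            names.append(match.group(1))
    return names


def common_functions(backtraces):
    """Map each function seen in more than one backtrace to the number of traces containing it."""
    counts = Counter()
    for text in backtraces:
        counts.update(set(frame_functions(text)))  # count each function once per trace
    return {name: n for name, n in counts.items() if n > 1}
```

Frames such as doadump, kern_reboot and panic will show up in every
dump, so only overlap below them says anything about a shared cause.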

-- 
Alexander Motin