Re: page fault in igb driver on 8.0-RC1

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Sat, 17 Oct 2009 15:23:14 -0700

On Sat, Oct 17, 2009 at 08:03:51PM +0300, Mykola Dzham wrote:
> Hi!
> Under high network load the system panics:
> 
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> GET BUF: dmamap load failure - 12
> 

I believe this type of message should not be printed in the fast
path, and it should be rate-limited.
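
To illustrate what rate-limiting such a message could look like, here
is a minimal userland sketch in C. It is not the igb(4) code: the names
log_limit and log_limit_ok are hypothetical, and the caller passes in
the current time in seconds so the logic stays easy to follow. In the
FreeBSD kernel itself, ppsratecheck(9) offers a similar
events-per-second check and would likely be the more natural choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userland sketch of a per-second log limiter.
 * All names are hypothetical; this is not taken from igb(4).
 */
struct log_limit {
	int64_t  window_start;	/* second in which the current window began */
	unsigned count;		/* messages allowed so far in this window */
	unsigned max_per_sec;	/* messages allowed per one-second window */
};

/*
 * Return true if the caller may print a message at time now_sec.
 * The first max_per_sec calls within one second are allowed; later
 * calls are suppressed until the second changes.
 */
static bool
log_limit_ok(struct log_limit *ll, int64_t now_sec)
{
	assert(ll != NULL);

	if (now_sec != ll->window_start) {
		/* A new second has begun: start a fresh window. */
		ll->window_start = now_sec;
		ll->count = 0;
	}
	if (ll->count < ll->max_per_sec) {
		ll->count++;
		return (true);
	}
	return (false);
}
```

A caller would wrap the diagnostic in "if (log_limit_ok(&ll, now))" so
that a burst of failures yields at most max_per_sec lines per second
instead of one line per failed buffer.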

> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address   = 0x0
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8025e4a5
> stack pointer           = 0x28:0xffffff87312f3a60
> frame pointer           = 0x28:0xffffff87312f3a80
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (igb0 taskq)
> trap number             = 12
> panic: page fault
> cpuid = 2
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff80185baa = db_trace_self_wrapper+0x2a
> panic() at 0xffffffff8020e992 = panic+0x182
> trap_fatal() at 0xffffffff8040eefd = trap_fatal+0x2ad
> trap_pfault() at 0xffffffff8040f27d = trap_pfault+0x22d
> trap() at 0xffffffff8040fbff = trap+0x3cf
> calltrap() at 0xffffffff803f6e13 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8025e4a5, rsp = 0xffffff87312f3a60, rbp = 0xffffff87312f3a80 ---
> mb_free_ext() at 0xffffffff8025e4a5 = mb_free_ext+0x15
> igb_get_buf() at 0xffffffff80a3a6e5 = igb_get_buf+0x2e5
> igb_rxeof() at 0xffffffff80a3abd5 = igb_rxeof+0x425
> igb_handle_rx() at 0xffffffff80a3b14b = igb_handle_rx+0x3b
> taskqueue_run() at 0xffffffff80243ec1 = taskqueue_run+0x91
> taskqueue_thread_loop() at 0xffffffff8024404f = taskqueue_thread_loop+0x3f
> fork_exit() at 0xffffffff801ea9b2 = fork_exit+0x112
> fork_trampoline() at 0xffffffff803f72ee = fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff87312f3d30, rbp = 0 ---
> Uptime: 1h46m18s
> Cannot dump. Device not defined or unavailable.
> 
> System is amd64 8.0-RC1 r197974 Tue Oct 13 23:00:17 EEST 2009
> 

Hmm, I remember another user already reported a similar issue with
igb(4).
At that time I briefly looked over the possible code paths and saw
some questionable handling of mbufs under resource shortage. I sent
my concerns to Jack, but it seems he lost the mail.
Unfortunately I don't have any igb(4) hardware, so it's somewhat
hard for me to fix the issue, but I'll try to read the code again
if time permits.
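
As a generic illustration of careful buffer handling under resource
shortage, the sketch below shows one common receive-refill pattern:
allocate and map the replacement buffer first, and only swap it into
the ring slot once that has succeeded. It is a userland model, not the
igb(4) code, and it does not claim to identify the actual bug. The
names rx_slot, fake_dma_load and rx_refill are hypothetical, and
malloc()/free() stand in for mbuf allocation and release.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* One receive ring slot; buf is never NULL once the ring is set up. */
struct rx_slot {
	void *buf;
};

/* Hypothetical stand-in for a DMA map load that can fail with ENOMEM. */
static int
fake_dma_load(void *buf, bool fail)
{
	(void)buf;
	return (fail ? ENOMEM : 0);
}

/*
 * Try to replace the slot's buffer with a fresh one.
 * On success the old buffer is handed back through *oldp so the caller
 * can pass it up the stack. On failure only the new buffer is freed
 * (exactly once) and the slot keeps its old buffer, so the ring never
 * ends up holding a NULL or already-freed pointer.
 */
static int
rx_refill(struct rx_slot *slot, size_t len, bool fail_load, void **oldp)
{
	void *nbuf;
	int error;

	assert(slot != NULL && oldp != NULL);
	*oldp = NULL;

	nbuf = malloc(len);
	if (nbuf == NULL)
		return (ENOBUFS);

	error = fake_dma_load(nbuf, fail_load);
	if (error != 0) {
		free(nbuf);	/* release the new buffer only */
		return (error);	/* caller drops the frame and reuses slot->buf */
	}

	*oldp = slot->buf;
	slot->buf = nbuf;
	return (0);
}
```

With this ordering, a load failure costs one dropped frame rather than
a slot whose buffer pointer is invalid on the next pass.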
Received on Sat Oct 17 2009 - 20:23:21 UTC
