I've been working on a set of patches to remove the sysctl variable creation from interrupt context in the cd(4) and da(4) drivers.

To fix the problem, I've created a new taskqueue that runs in a thread context, instead of inside a software interrupt like the current task queues. (The eventual fix will involve moving the CAM probe inside a thread; this is a more temporary solution that will hopefully also work on -stable, until we can change the CAM probe code.)

I think I have everything set up correctly, but I keep getting panics inside the GEOM code with these patches. (Memory modified after free.) I don't know whether I've just exposed some race condition, or whether I've done something wrong.

I've seen several different panics, all with the same root cause (memory modified after free), and with two different malloc types reported as the previous user of the memory -- GEOM and devbuf.

==========================================================================
SMP: AP CPU #1 Launched!
Memory modified after free 0xcbd4f800(124)
panic: Most recently used by GEOM
cpuid = 0; lapic.id = 00000000
Debugger("panic")
Stopped at Debugger+0x55: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c03e974d,0,c03fb4d8,e5e45934,100) at Debugger+0x55
panic(c03fb4d8,c03e55db,7c,c083ac14,c083ac00) at panic+0x15f
mtrash_ctor(cbd4f800,80,0,57a,cbd4f800) at mtrash_ctor+0x5d
uma_zalloc_arg(c083ac00,0,102,e5e4599c,e5e4599c) at uma_zalloc_arg+0x1e4
malloc(5b,c042fae0,102,1,c02756e4) at malloc+0xd3
g_new_providerf(cbda62c0,cbd7b130,e5e45a3c,1,1) at g_new_providerf+0xa3
g_slice_config(cbda62c0,2,1,0,0) at g_slice_config+0x259
g_bsd_modify(cbda62c0,cbd7712c,e5e45c8c,10,cbd77000) at g_bsd_modify+0x382
g_bsd_taste(c0470480,cbda5780,0,159,cbda5700) at g_bsd_taste+0x2c4
g_new_provider_event(cbda5780,0,c03e52b7,b3,66666667) at g_new_provider_event+0xad
one_event(e5e45d0c,c021cd85,c0485fb4,0,4c) at one_event+0x218
g_run_events(c0485fb4,0,4c,c03d7a28,a) at g_run_events+0x15
g_event_procbody(0,e5e45d48,c03e738b,314,34df1a6d) at g_event_procbody+0x45
fork_exit(c021cd40,0,e5e45d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe5e45d7c, ebp = 0 ---
db> panic
==========================================================================
SMP: AP CPU #1 Launched!
Memory modified after free 0xcbd4f600(124)
panic: Most recently used by devbuf
cpuid = 0; lapic.id = 00000000
Debugger("panic")
Stopped at Debugger+0x55: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c03e974d,0,c03fb4d8,e5e45af0,100) at Debugger+0x55
panic(c03fb4d8,c03e7f50,7c,c083ac14,c083ac00) at panic+0x15f
mtrash_ctor(cbd4f600,80,0,57a,cbd4f600) at mtrash_ctor+0x5d
uma_zalloc_arg(c083ac00,0,102,e5e45b58,e5e45b58) at uma_zalloc_arg+0x1e4
malloc(5a,c042fae0,102,1,c02756e4) at malloc+0xd3
g_new_providerf(cbda74c0,cb9b0d90,e5e45bf8,1,1) at g_new_providerf+0xa3
g_slice_config(cbda74c0,0,1,7e00,0) at g_slice_config+0x259
g_mbr_modify(cbda74c0,cb9d3800,cbd5b200,123,0) at g_mbr_modify+0x247
g_mbr_taste(c0470560,cbd4ee80,0,159,cbd4f580) at g_mbr_taste+0x1be
g_new_provider_event(cbd4ee80,0,c03e52b7,b3,66666667) at g_new_provider_event+0xad
one_event(e5e45d0c,c021cd85,c0485fb4,0,4c) at one_event+0x218
g_run_events(c0485fb4,0,4c,c03d7a28,a) at g_run_events+0x15
g_event_procbody(0,e5e45d48,c03e738b,314,34df1a6d) at g_event_procbody+0x45
fork_exit(c021cd40,0,e5e45d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe5e45d7c, ebp = 0 ---
db> panic
==========================================================================
Memory modified after free 0xcbd4f600(124)
panic: Most recently used by devbuf
cpuid = 0; lapic.id = 00000000
Debugger("panic")
Stopped at Debugger+0x55: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c03e974d,0,c03fb4d8,e5e45bbc,100) at Debugger+0x55
panic(c03fb4d8,c03e7f50,7c,c083ac14,c083ac00) at panic+0x15f
mtrash_ctor(cbd4f600,80,0,57a,cbd4f600) at mtrash_ctor+0x5d
uma_zalloc_arg(c083ac00,0,102,fe,cbd6b800) at uma_zalloc_arg+0x1e4
malloc(60,c042fae0,102,cbd59240,c0470560) at malloc+0xd3
g_slice_alloc(4,214,cbd4f4d4,1c2,e5e45c9c) at g_slice_alloc+0x7e
g_slice_new(c0470560,4,cbd4f480,e5e45c98,e5e45c9c) at g_slice_new+0x6f
g_mbr_taste(c0470560,cbd4f480,0,159,cbd4f580) at g_mbr_taste+0x90
g_new_provider_event(cbd4f480,0,c03e52b7,b3,66666667) at g_new_provider_event+0xad
one_event(e5e45d0c,c021cd85,c0485fb4,0,4c) at one_event+0x218
g_run_events(c0485fb4,0,4c,c03d7a28,a) at g_run_events+0x15
g_event_procbody(0,e5e45d48,c03e738b,314,34df1a6d) at g_event_procbody+0x45
fork_exit(c021cd40,0,e5e45d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe5e45d7c, ebp = 0 ---
db>
==========================================================================
SMP: AP CPU #1 Launched!
Memory modified after free 0xcbd4f600(124)
panic: Most recently used by devbuf
cpuid = 0; lapic.id = 00000000
Debugger("panic")
Stopped at Debugger+0x55: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c03e974d,0,c03fb4d8,e5e45aa8,100) at Debugger+0x55
panic(c03fb4d8,c03e7f50,7c,c083ac14,c083ac00) at panic+0x15f
mtrash_ctor(cbd4f600,80,0,57a,cbd4f600) at mtrash_ctor+0x5d
uma_zalloc_arg(c083ac00,0,102,5c,cbd4f900) at uma_zalloc_arg+0x1e4
malloc(64,c042fae0,102,cbd4f900,2) at malloc+0xd3
g_post_event_x(c021ea70,cbd4f900,2,0,e5e45b6c) at g_post_event_x+0x54
g_post_event(c021ea70,cbd4f900,2,cbd4f900,0) at g_post_event+0x45
g_new_providerf(cbda3540,cb9b0b20,e5e45bf8,1,1) at g_new_providerf+0x151
g_slice_config(cbda3540,0,1,7e00,0) at g_slice_config+0x259
g_mbr_modify(cbda3540,cbd6c400,cbd73000,123,0) at g_mbr_modify+0x247
g_mbr_taste(c0470560,cbd4f700,0,159,cbd4f780) at g_mbr_taste+0x1be
g_new_provider_event(cbd4f700,0,c03e52b7,b3,66666667) at g_new_provider_event+0xad
one_event(e5e45d0c,c021cd85,c0485fb4,0,4c) at one_event+0x218
g_run_events(c0485fb4,0,4c,c03d7a28,a) at g_run_events+0x15
g_event_procbody(0,e5e45d48,c03e738b,314,34df1a6d) at g_event_procbody+0x45
fork_exit(c021cd40,0,e5e45d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe5e45d7c, ebp = 0 ---
==========================================================================

Since the panics involved either M_DEVBUF or M_GEOM, I removed all M_DEVBUF mallocs from the cd(4) and da(4) drivers. (None of the other affected code used M_DEVBUF; I created new malloc types for the cd(4) and da(4) drivers.)

The problem didn't change, other than the exact place in GEOM that triggered the malloc that caught it.

Anyway, I've attached the patch in question. If someone could tell me what (if anything) I'm doing wrong, I'd appreciate it!

Thanks,

Ken
--
Kenneth Merry
ken_at_kdm.org
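
A note on reading the traces above: every one of them dies in mtrash_ctor(), called from uma_zalloc_arg() on an allocation path. That fits a "trash on free, verify on allocate" debugging scheme, in which a stale write is only noticed at the next allocation of the same item, so the GEOM functions in the backtrace may well be bystanders rather than the culprit. The following is a minimal user-space sketch of that idea only; it is not the kernel's code. The names toy_alloc() and toy_free(), the one-slot pool, and the junk value are all assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JUNK       0xdeadc0deU  /* fill pattern; value assumed for this sketch */
#define ITEM_WORDS 32           /* one 128-byte item */

/* Hypothetical one-slot "zone": a single item that is either in use or trashed. */
static uint32_t slot[ITEM_WORDS];
static int slot_trashed = 0;    /* 1 once the item has been freed and filled */

/* Free path: fill the whole item with the junk pattern. */
static void toy_free(void)
{
    assert(!slot_trashed);      /* toy double-free check */
    for (size_t i = 0; i < ITEM_WORDS; i++)
        slot[i] = JUNK;
    slot_trashed = 1;
}

/*
 * Allocation path: before handing the item out again, verify that the
 * junk pattern is intact. Returns NULL and stores the byte offset of the
 * first damaged word in *bad_offset if someone wrote to the item after
 * it was freed; otherwise returns the item and stores -1.
 */
static uint32_t *toy_alloc(ptrdiff_t *bad_offset)
{
    *bad_offset = -1;
    if (slot_trashed) {
        for (size_t i = 0; i < ITEM_WORDS; i++) {
            if (slot[i] != JUNK) {
                *bad_offset = (ptrdiff_t)(i * sizeof(uint32_t));
                return NULL;    /* a real allocator would panic here */
            }
        }
    }
    slot_trashed = 0;
    return slot;
}
```

In this sketch, a caller that keeps a pointer across toy_free() and later writes through it is not caught at the write; the damage is reported by whichever caller invokes toy_alloc() next. If the kernel check works along these lines, the interesting question for the panics above is who freed the item and kept using it, not which GEOM routine happened to allocate it afterwards.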