On Wed, Dec 22, 2004 at 10:05:53PM +0100, Peter Holm wrote:
> On Mon, Dec 20, 2004 at 06:41:04PM -0500, Bosko Milekic wrote:
> >
> > I realize it's been a while.
> >
> > Anyway, what I *think* is going on here is that slab_zalloc() is
> > actually returning NULL even when called with M_WAITOK. Further
> > inspection in slab_zalloc() reveals that this could come from several
> > places. One of them is kmem_malloc() itself, which I doubt will ever
> > return NULL if called with M_WAITOK. If this assumption is indeed
> > correct, then the NULL must be being returned by slab_zalloc() itself,
> > or due to a failed uma_zalloc_internal() call. It is also possible
> > that slab_zalloc() returns NULL if the init that gets called for the
> > zone fails. However, judging from the stack trace you provided, the
> > init in question is mb_init_pack() (kern_mbuf.c). This particular
> > init DOES perform an allocation and CAN in theory fail, but I believe
> > it should be called with M_WAITOK as well, and so it should also never
> > fail in theory.
> >
> > Have you gotten any further with the analysis of this particular
> > trace? If not, I would suggest adding some more printf()s and
> > analysis into slab_zalloc() itself, to see if that is indeed what is
> > causing the infinite looping in uma_zone_slab() and, if so, attempt to
> > figure out what part of slab_zalloc() is returning the NULL.
>
> OK, did that: http://www.holm.cc/stress/log/freeze03.html

OK, well, I think I know what's happening. See if you can confirm this with me. I'll start with your trace and describe the analysis; please bear with me, because it's long and painful.

Your trace indicates that the NULL allocation failure, despite a call with M_WAITOK, is coming from slab_zalloc(). What should also be noted about this trace, and your previous one, is that they both show a call path that goes through an init which itself performs an allocation, also with M_WAITOK.
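To make the suspected livelock concrete, here is a hypothetical user-space model of it. This is not FreeBSD code. A uma_zone_slab()-like retry loop keeps calling a slab_zalloc()-like function under a wait-OK flag. That function keeps returning NULL because the slab header it needs comes from an INTERNAL keg whose recursion counter is already non-zero. Every name in the sketch is invented for illustration: model_keg, model_zone_slab(), model_slab_zalloc(), MODEL_M_WAITOK and MODEL_M_NOWAIT. The max_spins cap exists only so the model terminates; the loop being modeled has no such cap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model flags; not the real malloc(9) values. */
#define MODEL_M_NOWAIT 0x1
#define MODEL_M_WAITOK 0x2

struct model_keg {
    const char *name;
    int internal;               /* models UMA_ZFLAG_INTERNAL */
    int recurse;                /* models the per-keg recurse counter */
    int forced_fails;           /* how often the recurse guard fired */
    struct model_keg *hdr_keg;  /* non-NULL: OFFPAGE, headers come from here */
};

static int model_slab;          /* stands in for a freshly allocated slab */

static void *model_zone_slab(struct model_keg *keg, int flags,
                             int max_spins, int *spins);

/* Models slab_zalloc(): an OFFPAGE keg must first get a slab header. */
static void *model_slab_zalloc(struct model_keg *keg, int flags, int max_spins)
{
    if (keg->hdr_keg != NULL) {
        int hdr_spins;
        if (model_zone_slab(keg->hdr_keg, flags, max_spins, &hdr_spins) == NULL)
            return NULL;        /* header failure propagates up as NULL */
    }
    return &model_slab;
}

/* Models uma_zone_slab(); max_spins bounds the otherwise endless retry. */
static void *model_zone_slab(struct model_keg *keg, int flags,
                             int max_spins, int *spins)
{
    *spins = 0;
    for (;;) {
        void *slab;

        /* Models "uk_flags & UMA_ZFLAG_INTERNAL && uk_recurse != 0". */
        if (keg->internal && keg->recurse != 0) {
            keg->forced_fails++;
            return NULL;
        }
        keg->recurse++;
        slab = model_slab_zalloc(keg, flags, max_spins);
        keg->recurse--;
        if (slab != NULL)
            return slab;
        if (flags & MODEL_M_NOWAIT)
            return NULL;
        /* Wait-OK: go around again (the 7 -> 8 -> 7 pattern). */
        if (++*spins >= max_spins)
            return NULL;        /* model-only cap */
    }
}
```

In this model, setting the header keg's recurse counter to 1 before allocating from a cluster-like keg makes every retry fail on the header keg's guard. forced_fails then grows by one per spin, and the wait-OK loop never succeeds.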
Currently, only the "packet zone" has an init that allocates. The call path looks something like this:

1. UMA allocation is performed for a "packet." A "packet" is an mbuf with a pre-attached cluster.

2. UMA dips into the packet zone and finds it empty. It also determines that it is unable to get a bucket to fill up the zone (presumably because of heavy memory-request load). So it calls uma_zalloc_internal() on the packet zone (frame 18).

3. Perhaps after some blocking, a slab is obtained from the packet zone's backing keg (which is the same keg that backs the mbuf zone; let's call it the MBUF KEG). Now that an mbuf item has been taken from the freshly allocated slab, uma_zalloc_internal() needs to init and ctor it, since it is about to return it to the top (calling) layer. It calls the packet zone's initializer on it, mb_init_pack(). This is frame 17.

4. The packet zone's initializer needs to call into UMA again to get an mbuf cluster and attach it to the mbuf being allocated, because mbufs residing within (or obtained from) the packet zone MUST have clusters attached. It performs this allocation with M_WAITOK, because that's what the original caller passed. This is frame 16.

5. Now the cluster zone is also completely empty and we can't get a bucket (surprise, surprise, the system is under high memory-request load). UMA calls uma_zalloc_internal() on the cluster zone as well. This is frame 15.

6. uma_zalloc_internal() calls uma_zone_slab(), whose job is to find a slab in the cluster zone's backing keg (a separate CLUSTER KEG) and return it. Unfortunately, memory-request load is high, so it's going to have a difficult time. The uma_zone_slab() call is frame 14.

7. uma_zone_slab() can't find a locally cached slab (hardly surprising under this load) and calls slab_zalloc() to actually go to the VM and get one.
   Before calling, it increments a special "recurse" flag so that we do not recurse when calling into the VM. This is because the VM itself might call back into UMA when it allocates vm_map_entries, which could make it recurse on allocating buckets. This recurse flag is PER keg, and really only exists to protect the bucket zone. Crazy, crazy shit indeed. Pardon the language. This is frame 13.

8. Now slab_zalloc(), called for the CLUSTER zone, determines that the cluster zone is (for space-efficiency reasons) an OFFPAGE zone. It therefore needs to grab a slab header structure from a separate UMA "slab header" zone. It calls uma_zalloc_internal() on the SLAB HEADER zone and passes M_WAITOK down, but for some reason IT returns NULL. The failure propagates back up, which causes uma_zone_slab() to keep looping. Go back to step 7.

This is the infinite loop 7 -> 8 -> 7 -> 8 -> ... that you seem to have caught.

The question now is why uma_zalloc_internal() fails on the SLAB HEADER zone, even though it is called with M_WAITOK. Unfortunately, your stack trace is not deep enough to continue an accurate deductive analysis from this point on (you would need to sprinkle in MORE KASSERTs). However, here are some hypotheses.

The uma_zalloc_internal() call on the SLAB HEADER zone also ends up calling uma_zone_slab(), and that uma_zone_slab() eventually fails. This is a fact: it is the only way uma_zalloc_internal() could fail for the SLAB HEADER zone, which has neither an init nor a ctor. So why does uma_zone_slab() fail with M_WAITOK on the slab header zone? Possibilities:

1. The recurse flag is at some point found to be non-zero FOR THE SLAB HEADER backing keg.
   Suppose the VM gets called from the subsequent slab_zalloc() and calls back into UMA for some allocations. If those allocations are also potentially offpage, so that a slab header is ALSO required, then we could be recursing on the slab header zone from the VM, and that would cause the failure. To check this, add a printf to the recurse check in uma_zone_slab():

       if ((keg->uk_flags & UMA_ZFLAG_INTERNAL) && keg->uk_recurse != 0) {
               /* ADD PRINTF HERE */
               printf("This zone: %s, forced fail due to recurse non-zero\n",
                   zone->uz_name);
               return NULL;
       }

   If the print triggers right before the panic (i.e., it is the last one printed before the panic), see whether it is on the SLAB HEADER zone. In theory, it should only happen for the BUCKET ZONE.

2. M_WAITOK really isn't set. Unlikely.

If (1) is really happening, we'll need to think about it a little more before deciding how to fix it. As you can see, due to the recursive nature of UMA/VM, things can get really tough when resources are scarce.

Regards,
--
Bosko Milekic
bmilekic_at_technokratis.com
bmilekic_at_FreeBSD.org

Received on Wed Dec 22 2004 - 21:15:44 UTC