On Wed, Dec 22, 2004 at 05:15:40PM -0500, Bosko Milekic wrote:
>
> On Wed, Dec 22, 2004 at 10:05:53PM +0100, Peter Holm wrote:
> > On Mon, Dec 20, 2004 at 06:41:04PM -0500, Bosko Milekic wrote:
> > >
> > >   I realize it's been a while.
> > >
> > >   Anyway, what I *think* is going on here is that slab_zalloc() is
> > >   actually returning NULL even when called with M_WAITOK.  Further
> > >   inspection of slab_zalloc() reveals that this could come from
> > >   several places.  One of them is kmem_malloc() itself, which I doubt
> > >   will ever return NULL if called with M_WAITOK.  If this assumption
> > >   is indeed correct, then the NULL must be coming from slab_zalloc()
> > >   itself, or from a failed uma_zalloc_internal() call.  It is also
> > >   possible that slab_zalloc() returns NULL if the init that gets
> > >   called for the zone fails.  However, judging from the stack trace
> > >   you provided, the init in question is mb_init_pack() (kern_mbuf.c).
> > >   This particular init DOES perform an allocation and CAN in theory
> > >   fail, but I believe it should be called with M_WAITOK as well, and
> > >   so it should also never fail in theory.
> > >
> > >   Have you gotten any further with the analysis of this particular
> > >   trace?  If not, I would suggest adding some more printf()s and
> > >   analysis to slab_zalloc() itself, to see if that is indeed what is
> > >   causing the infinite looping in uma_zone_slab() and, if so, attempt
> > >   to figure out which part of slab_zalloc() is returning the NULL.
> >
> > OK, did that: http://www.holm.cc/stress/log/freeze03.html
>
> OK, well, I think I know what's happening.  See if you can confirm
> this with me.
>
> I'll start with your trace and describe the analysis; please bear with
> me, because it's long and painful.
>
> Your trace indicates that the NULL allocation failure, despite a call
> with M_WAITOK, is coming from slab_zalloc().  The particular thing
> that should also be mentioned about this trace, and your previous one,
> is that they both show a call path that goes through an init which
> performs an allocation, also with M_WAITOK.  Currently, only the
> "packet zone" does this.  It looks something like this:
>
> 1. A UMA allocation is performed for a "packet."  A "packet" is an
>    mbuf with a pre-attached cluster.
>
> 2. UMA dips into the packet zone and finds it empty.  Additionally, it
>    determines that it is unable to get a bucket to fill up the zone
>    (presumably there is a lot of memory-request load).  So it calls
>    uma_zalloc_internal() on the packet zone (frame 18).
>
> 3. Perhaps after some blocking, a slab is obtained from the packet
>    zone's backing keg (which coincidentally is the same keg as the
>    mbuf zone's backing keg -- let's call it the MBUF KEG).  Now that
>    an mbuf item is taken from the freshly allocated slab obtained
>    from the MBUF KEG, uma_zalloc_internal() needs to init and ctor it,
>    since it is about to return it to the top (calling) layer.  It
>    calls the packet zone's initializer on it, mb_init_pack().  This
>    corresponds to frame 17.
>
> 4. The packet zone's initializer needs to call into UMA again to get
>    and attach an mbuf cluster to the mbuf being allocated, because
>    mbufs residing within the packet zone (or obtained from the packet
>    zone) MUST have clusters attached to them.  It attempts to perform
>    this allocation with M_WAITOK, because that's what the initial
>    caller was calling with.  This is frame 16.
>
> 5. Now the cluster zone is also completely empty and we can't get a
>    bucket (surprise, surprise: the system is under high memory-request
>    load).  UMA calls uma_zalloc_internal() on the cluster zone as
>    well.  This is frame 15.
>
> 6. uma_zalloc_internal() calls uma_zone_slab().  Its job is to find a
>    slab from the cluster zone's backing keg (a separate CLUSTER KEG)
>    and return it.  Unfortunately, memory-request load is high, so it's
>    going to have a difficult time.  The uma_zone_slab() call is frame
>    14.
>
> 7. uma_zone_slab() can't find a locally cached slab (hardly
>    surprising, due to load) and calls slab_zalloc() to actually go to
>    the VM and get one.  Before calling, it increments a special
>    "recurse" flag so that we do not recurse on calling into the VM.
>    This is because the VM itself might call back into UMA when it
>    attempts to allocate vm_map_entries, which could cause it to
>    recurse on allocating buckets.  This recurse flag is PER keg, and
>    really only exists to protect the bucket zone.  Crazy, crazy shit
>    indeed.  Pardon the language.  This is frame 13.
>
> 8. Now slab_zalloc(), called for the CLUSTER zone, determines that the
>    cluster zone (for space-efficiency reasons) is in fact an OFFPAGE
>    zone, so it needs to grab a slab header structure from a separate
>    UMA "slab header" zone.  It calls uma_zalloc_internal() from
>    slab_zalloc(), but it calls it on the SLAB HEADER zone.  It passes
>    M_WAITOK down to it, but for some reason IT returns NULL, and the
>    failure is propagated back up, which causes uma_zone_slab() to keep
>    looping.  Go back to step 7.
>
> This is the infinite loop 7 -> 8 -> 7 -> 8 -> ... which you seem to
> have caught.
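>
> To make the loop concrete, here is a condensed sketch of the relevant
> part of uma_zone_slab() -- simplified and from memory, so take the
> exact names and structure with a grain of salt and check
> sys/vm/uma_core.c for the real thing:
>
>     static uma_slab_t
>     uma_zone_slab(uma_zone_t zone, int flags)
>     {
>             uma_keg_t keg = zone->uz_keg;
>             uma_slab_t slab = NULL;
>
>             /* The forced failure that protects against VM recursion. */
>             if ((keg->uk_flags & UMA_ZFLAG_INTERNAL) &&
>                 keg->uk_recurse != 0)
>                     return (NULL);
>
>             while (slab == NULL) {
>                     /* ... first try the keg's cached slabs ... */
>
>                     /* An M_NOWAIT caller would give up here. */
>                     if (flags & M_NOWAIT)
>                             break;
>
>                     /* Step 7: go to the VM for a fresh slab. */
>                     keg->uk_recurse++;
>                     slab = slab_zalloc(zone, flags);
>                     keg->uk_recurse--;
>
>                     /*
>                      * If step 8 returned NULL: with M_WAITOK we just
>                      * loop and retry, so a persistent failure means
>                      * looping forever (7 -> 8 -> 7 -> 8 -> ...).
>                      */
>             }
>             return (slab);
>     }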
>
> The question now is why the uma_zalloc_internal() fails on the SLAB
> HEADER zone, even though it is called with M_WAITOK.  Unfortunately,
> your stack trace does not provide enough depth to be able to continue
> an accurate deductive analysis from this point on (you would need to
> sprinkle MORE KASSERTs).
>
> However, here are some hypotheses.
>
> The uma_zalloc_internal() which ends up getting called also ends up
> calling uma_zone_slab(), but uma_zone_slab() eventually fails (this is
> a fact: it is the only reason that the uma_zalloc_internal() could in
> turn fail for the SLAB HEADER zone, which doesn't have an init or a
> ctor).
>
> So why does uma_zone_slab() fail with M_WAITOK on the slab header
> zone?  Possibilities:
>
> 1. The recurse flag is at some point determined non-zero FOR THE SLAB
>    HEADER backing keg.  If the VM ends up getting called from the
>    subsequent slab_zalloc() and ends up calling back into UMA for
>    whatever allocations, and "whatever allocations" are also
>    potentially offpage, and a slab header is ALSO required, then we
>    could also be recursing on the slab header zone from the VM, so
>    this could cause the failure.
>
>    if (keg->uk_flags & UMA_ZFLAG_INTERNAL && keg->uk_recurse != 0) {
>            /* ADD PRINTF HERE */
>            printf("This zone: %s, forced fail due to recurse non-null\n",
>                zone->uz_name);
>            return NULL;
>    }
>
>    If you get the print to trigger right before the panic (the last
>    one before the panic), see if it is on the SLAB HEADER zone.  In
>    theory, it should only happen for the BUCKET ZONE.

Yes, I think that I have verified your excellent analysis of the
problem: http://www.holm.cc/stress/log/freeze04.html

So, do you have any fix suggestions? :-)

> 2. M_WAITOK really isn't set.  Unlikely.
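>
> To spell out how (1) would produce the failure, slab_zalloc()'s
> OFFPAGE branch would propagate the forced NULL roughly like this
> (again a simplified sketch from memory, not the verbatim code; the
> uk_slabzone field name in particular is an assumption):
>
>     static uma_slab_t
>     slab_zalloc(uma_zone_t zone, int wait)
>     {
>             uma_keg_t keg = zone->uz_keg;
>             uma_slab_t slab = NULL;
>
>             if (keg->uk_flags & UMA_ZONE_OFFPAGE) {
>                     /*
>                      * The slab header lives off-page, so it must itself
>                      * be allocated from the SLAB HEADER zone.  If the VM
>                      * has re-entered UMA and the slab header keg's
>                      * uk_recurse is non-zero, this returns NULL despite
>                      * M_WAITOK...
>                      */
>                     slab = uma_zalloc_internal(keg->uk_slabzone,
>                         NULL, wait);
>                     if (slab == NULL)
>                             /* ...and the failure goes back to step 7. */
>                             return (NULL);
>             }
>             /* ... then kmem_malloc() etc. for the slab's pages ... */
>             return (slab);
>     }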
>
> If (1) is really happening, we'll need to think about it a little more
> before deciding how to fix it.  As you can see, due to the recursive
> nature of UMA/VM, things can get really tough when resources are
> scarce.
>
> Regards,
> --
> Bosko Milekic
> bmilekic_at_technokratis.com
> bmilekic_at_FreeBSD.org

--
Peter Holm

Received on Sun Dec 26 2004 - 15:12:01 UTC