Bug in recent large_alloc changes to the ZFS zio code?

From: Richard Todd <rmtodd_at_ichotolot.servalan.com>
Date: Sun, 31 May 2009 01:20:45 -0500
Okay, I'm looking at the recent changes in the ZFS zio code that change how
data buffers are allocated (svn r192207).  The old code for
zio_data_buf_alloc just called kmem_alloc (the Solaris compatibility
one), which in turn called malloc() with M_WAITOK, so it was always
guaranteed to get a valid, non-NULL pointer.  Fair enough.
The new code has an alternate code path: in "arc_large_memory_enabled"
mode, it calls the new function zio_large_malloc instead.  zio_large_malloc
in turn tries a few times to allocate the required pages using
vm_phys_alloc_contig, but if that fails it goes ahead and returns NULL.

Here's the problem.  As near as I can tell, none of the code that calls
zio_data_buf_alloc checks for the possibility that the returned pointer
could be NULL, which I guess is reasonable, as the original code could
never return NULL.  However, the new large malloc code *can* return NULL,
and the callers will then go ahead and use the NULL buffer.  The other day
I mentioned here a panic I saw where, under sufficiently heavy load, the
GEOM code was complaining that it had been given a NULL data pointer.  It
seems to me that this was likely because zio had tried to allocate a data
buffer and gotten a NULL pointer instead.
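
To make the failure mode concrete, here is a minimal userland sketch of one
possible way to close the gap: fall back to the always-succeeding path when
the large allocation fails.  This is not the code from r192207.  Every
sketch_* name and the large_alloc_should_fail knob are hypothetical
stand-ins, and plain malloc() stands in for the kernel allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob that simulates the contiguous allocation failing. */
static int large_alloc_should_fail = 0;

/* Stand-in for zio_large_malloc: unlike the old path, it may return NULL. */
static void *
sketch_large_malloc(size_t size)
{
	if (large_alloc_should_fail)
		return (NULL);
	return (malloc(size));
}

/*
 * Stand-in for kmem_alloc/malloc(M_WAITOK), which callers expect never to
 * return NULL.  In userland we approximate that contract by aborting.
 */
static void *
sketch_waitok_alloc(size_t size)
{
	void *p = malloc(size);

	if (p == NULL)
		abort();
	return (p);
}

/*
 * Stand-in for zio_data_buf_alloc: try the large path first when enabled,
 * then fall back so the caller still never sees NULL.
 */
static void *
sketch_data_buf_alloc(size_t size, int large_memory_enabled)
{
	void *buf = NULL;

	if (large_memory_enabled)
		buf = sketch_large_malloc(size);
	if (buf == NULL)
		buf = sketch_waitok_alloc(size);
	return (buf);
}
```

With large_alloc_should_fail set to 1, sketch_data_buf_alloc(4096, 1) still
returns a usable buffer, which is the guarantee the existing callers rely
on.  A real fix along these lines would presumably also need the free path
to know which allocator produced each buffer; this sketch sidesteps that by
using malloc() for both.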
Received on Sun May 31 2009 - 04:45:18 UTC
