Hi Glebius, On Tuesday 29 November 2011 15:28:42 Gleb Smirnoff wrote: > Daan, > > On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote: > D> Thanks for the looking into this and for your quick commit. I like your > D> twist on the patch with the move from the unit bitmap to allocating unit > D> numbers with alloc_unr(9). > D> > D> I do have two comments on the new code though. > D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when > D> the user requested a unit number larger than ifc->ifc_maxunit. Now the > D> function returns EEXIST in that case because alloc_unr_specific() > D> returns -1 both when a number is already allocated and when the number > D> exceeds it's limits. This leads to the somewhat confusing: > D> > D> # ifconfig bridge123456 create > D> ifconfig: SIOCIFCREATE2: File exists > D> > D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit > D> leak */" part of your change. Once a unit number is leaked, there > D> currently seems to be no way to ever get it back. In a worst case > D> scenario, where the names of multiple interfaces in a row collides with > D> the numbers alloc_unr() returns, an entire row of unit numbers could be > D> leaked during the creation of a single cloned interface. > D> We have a product where cloned interfaces come and go very frequently. > D> I'd hate to run out of unit numbers after 'just a few' years of uptime > D> ;-) > D> I've created a new patch on top of your change that fixes both of these > D> problems. Could you have a look at the attached diff? > > Thanks, I will work on applying it. Great! > D> > Considering the second part, that adds locking. Unfortunately, right > D> > now we have numerous races in the network configuration ocde. Many > D> > SIOCSsomething ioctls can race with each other producing unpredictable > D> > results and kernel panics. So, running two ifconfig(8) in parallel is > D> > a bad idea today. :( > D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race > D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between > D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped > D> > after unit allocation, but prior to interface attachement to global > D> > interface list. > D> > D> Are you sure? With the patch in the PR, the relevant code in > D> ifc_simple_create() would become : > D> > D> ... > D> IFNET_NAMING_LOCK(); > D> err = ifc_alloc_unit(ifc, &unit); > D> if (err != 0) { > D> IFNET_NAMING_UNLOCK(); > D> return (err); > D> } > D> > D> err = ifcs->ifcs_create(ifc, unit, params); > D> IFNET_NAMING_UNLOCK(); > D> if (err != 0) { > D> ifc_free_unit(ifc, unit); > D> return (err); > D> } > D> ... > D> > D> The lock is acquired before the call to ifc_alloc_unit(). > D> In the non-error case (e.g. when creating an if_bridge interface) the > D> call ends up calling bridge_clone_create(), which calls > D> ether_ifattach(), which calls if_attach() -> if_attach_internal() where > D> the ifp is added to the ifnet list. > D> Only when that's done, the lock is dropped. > D> > D> Am I overlooking something obvious here? > > The code above isn't correct since it holds mutex when calling > ifcs->ifcs_create(). These methods may (they actually do) call malloc() > with M_WAITOK. I'd noticed the malloc() too (see my comment in the PR: "We can't use IFNET_WLOCK() here since the code paths may sleep.") The IFNET_WLOCK() macro is defined as: #define IFNET_WLOCK() do { \ sx_xlock(&ifnet_sxlock); \ rw_wlock(&ifnet_rwlock); \ } while (0) Since rwlocks cannot be held while sleeping, I've added the IFNET_NAMING_ macros in the patch, defining them as : #define IFNET_NAMING_LOCK() sx_xlock(&ifnet_sxlock); #define IFNET_NAMING_UNLOCK() sx_unlock(&ifnet_sxlock); This only does the first half of the work IFNET_WLOCK() normally does. My understanding from reading the sx(9) manual is that holding an sx lock while sleeping should be allowed. (WITNESS and INVARIANTS didn't seem to disagree with me. :) After acquiring the ifnet_sxlock, malloc() should be allowed to sleep while allocating memory for the new interface. When it's done sleeping, the interface can be added to the ifnet list in if_attach(). if_attach() will IFNET_WLOCK() which should recurse on the ifnet_sxlock and acquire a lock on the ifnet_rwlock. (I'm still confused as to why this wouldn't work.) > In general FreeBSD uses M_WAITOK on the configuration code paths, like > any SIOCSsomething ioctl. So to do correct protection, you first need to > allocate every kind of memory needed, then lock mutex, then run through > configuration sequence, then release mutex and in case of error free all > allocated memory. Sounds easy, but isn't in practice, especially when > several network modules are involved. > > So I'm still thinking about... > > D> > From my point of view, we need a generic approach to ioctl() vs > D> > ioctl() races, may be some global serializer of all re-configuration > D> > requests of interfaces and addresses. > > ... but several developers have noted that this approach may have some > hidden problems. When experimenting with new CARP, I have tried it on the > carp_ioctl() without any problems. The idea is simple: > > static int serializer = 0; > static struct mtx serializer_mtx; > MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF); > > int > foo_ioctl() > { > mtx_lock(&serializer_mtx); > if (serializer != 0) > msleep(&serializer, &serializer_mtx, 0, "ioctl", 0); > serializer = 1; > mtx_unlock(&serializer_mtx); > > ... any code here, uncluding calls to different lower layers... > > mtx_lock(&serializer_mtx); > serializer = 0; > wakeup(&serializer); > mtx_unlock(&serializer_mtx); > > return (error); > } > > I have tried this for carp_ioctl() and it worked fine. You can try it for > entire ifioctl() and all its descendants, but be aware of hidden problems > :) That reminds me of sem_wait() and sem_post(). I'm going to remember that construction. It could come in handy some time. :) Regards, -- Daan Vreeken Vitsch Electronics http://Vitsch.nl tel: +31-(0)40-7113051 / +31-(0)6-46210825 KvK nr: 17174380Received on Tue Nov 29 2011 - 23:20:40 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:21 UTC