Robert Watson wrote:
>
> On Tue, 17 Apr 2007, Robert Watson wrote:
>
>>> I originally put it in there to work around a LOR that I was
>>> experiencing (based on you mentioning it in an email to current_at_ Sun
>>> 18 Mar 2007 15:50). http://sources.zabbadoz.net/freebsd/lor/191.html
>>> doesn't show any changes to that particular LOR; do you happen to
>>> know if there's any ongoing work on this? I'm very willing to act as
>>> a test system.
>>
>> I chatted with Andre about the panic earlier this afternoon, and it
>> sounds like the fix is straightforward. I would anticipate seeing it
>> committed in the near future.
>>
>> I'll send out an e-mail explaining the above lock order reversal
>> tomorrow morning. I understand that several people have been looking
>> at this, so perhaps one of those people will reply talking about it
>> before then. :-)
>
> The essential problem of this lock order reversal has to do with the
> fact that higher network stack layers hold locks over lower network
> stack layers. For example, the lock for a TCP connection is held over
> the operation to enqueue the TCP packet for transmission at a lower
> layer. This is necessary in order to maintain TCP transmission order
> into the transmission queue between multiple threads operating on the
> same TCP connection, since if the "transmit and enqueue" operation were
> non-atomic with respect to the same TCP connection in another thread,
> quite damaging reordering could take place. We directly dispatch the
> entire outbound network stack from that enqueue point, meaning that the
> per-TCP connection lock is held over that processing path, including the
> firewall. As a result, PCB locks (TCP connection locks) precede the
> firewall in the lock order.
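The ordering argument above can be illustrated with a small user-space sketch. It is not FreeBSD code: `Connection`, `send`, `run_senders`, and the list used as a transmit queue are all hypothetical names chosen for illustration. The only point it makes is that holding one per-connection lock over both "pick the next sequence number" and "enqueue" keeps the queue in sequence order, no matter how many threads send on the same connection.

```python
import threading


class Connection:
    """Hypothetical stand-in for a TCP PCB; not a FreeBSD structure."""

    def __init__(self, tx_queue):
        self.lock = threading.Lock()  # plays the role of the per-connection lock
        self.next_seq = 0
        self.tx_queue = tx_queue      # stand-in for the lower-layer transmit queue

    def send(self):
        # The lock is held over both steps, so "assign sequence number" and
        # "enqueue" are atomic with respect to other threads on this connection.
        with self.lock:
            seq = self.next_seq
            self.next_seq += 1
            self.tx_queue.append(seq)


def run_senders(n_threads=4, sends_per_thread=500):
    """Send from several threads on one connection and return the queue."""
    queue = []
    conn = Connection(queue)

    def worker():
        for _ in range(sends_per_thread):
            conn.send()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue
```

If the lock were released between taking `seq` and appending it, another thread could enqueue a later sequence number first, which is the reordering the paragraph above warns about.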
>
> Firewall locks are about protecting the rule state of the firewall from
> corruption when firewall rules are updated, allowing readers to
> interpret the rules using a static snapshot, and writers to avoid
> mangling the rules via simultaneous non-atomic update. As such, when
> the firewall code is entered, the firewall lock is acquired, and held
> until the packet has been completely processed. Things get sticky deep
> in the firewall code because our firewalls include credential-aware
> rules, which essentially "peek up the stack" in order to decide what
> user is associated with a packet before delivery to the connection is
> done. The firewall rule lock is held over this lookup and inspection of
> TCP-layer state. In the out-bound path, we pass down the TCP state
> reference (PCB pointer) and guarantee the lock is already held. However,
> in the in-bound direction, the firewall has to do the full lookup and
> lock acquisition, which reverses the lock order and can lead to
> deadlocks.

I am doing work on fixing this for ipfw. It involves moving ipfw to a
lockless method of operation. (More info will be on the ipfw list in a
few days.)

>
> debug.mpsafenet=0 places the Giant lock in front of all network stack
> lock acquisition, which effectively serializes all of the above. It
> doesn't remove the lock order reversal, but it does eliminate
> simultaneous lock acquisition, removing one of the necessary causes of
> deadlock. This trick of a serializing "global" lock in order to prevent
> lock order reversals between "leaf" locks is not an uncommon technique,
> but in this case has a significant overhead (requiring non-parallelism
> in network processing), and needs to be fixed.
>
> The key is to guarantee that the acquisition of the firewall reference
> will never be blocked waiting on a PCB lock -- i.e., that the firewall
> "lock" isn't a lock so much as a reference count that will never have to
> wait, removing the waiting requirement from the deadlock equation.
> I know that Julian Elischer has been looking at doing this, and others
> may have also. The model is essentially that you either starve writers to
> the firewall data, or you create a read-only snapshot to be used by
> readers in the event a writer arrives, allowing readers to pick up the
> new rules if available, or the old rules if not, and never wait
> indefinitely either way.

Yep. I have detailed plans afoot, but not for pf. I wouldn't know pf if it
came up and kicked me in the shins, so I'll be leaving that to someone
else.

>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

Received on Wed Apr 18 2007 - 06:03:34 UTC
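The "read-only snapshot" model described in the message can be sketched in user space roughly as follows. This is only an illustration of the general idea, not the planned ipfw change: `RuleTable`, `snapshot`, `update`, and `classify` are hypothetical names, and the rule format (port, action) is made up for the example. Readers take a reference to an immutable rule set and never wait; a writer builds a new rule set off to the side and publishes it by replacing one reference. The sketch assumes CPython, where rebinding an attribute is a single atomic reference store, and it leaves reclaiming old snapshots to the garbage collector; a kernel version would need explicit atomic pointer operations and reference counting.

```python
import threading


class RuleTable:
    """Hypothetical firewall rule table using copy-and-publish updates."""

    def __init__(self, rules=()):
        self._current = tuple(rules)           # immutable snapshot seen by readers
        self._writer_lock = threading.Lock()   # serializes writers only

    def snapshot(self):
        # Readers never take a lock: they just pick up whichever snapshot
        # is current (the new rules if published, the old ones if not).
        return self._current

    def update(self, mutate):
        # Writers copy the rules, modify the private copy, then publish it
        # by swapping the reference. Readers holding the old snapshot are
        # unaffected and simply finish with the old rules.
        with self._writer_lock:
            new_rules = list(self._current)
            mutate(new_rules)
            self._current = tuple(new_rules)


def classify(table, packet_port):
    """Return the action of the first matching rule, or "deny" if none match."""
    rules = table.snapshot()
    for port, action in rules:
        if port == packet_port:
            return action
    return "deny"
```

A short usage example: a reader that grabbed its snapshot before an update keeps evaluating the old rules, while later lookups see the new ones.

```python
table = RuleTable([(22, "allow")])
old = table.snapshot()
table.update(lambda rules: rules.append((80, "allow")))
print(old)                  # the reader's earlier snapshot is unchanged
print(classify(table, 80))  # later lookups use the newly published rules
```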