On Fri, Jun 3, 2011 at 11:26 AM, Maxim Sobolev <sobomax_at_freebsd.org> wrote:

> I would also like to get your input on my two other patches - randomization
> of the synchronization pattern and ad-hoc asynchronous mode. Hastd appears
> extremely useful to synchronize large virtual disks over slow links without
> taking live virtual machine offline.

The idea of sending updates to the secondary only via the synchronization
thread, started periodically, looks interesting to me. Of course it should not
replace the "real" async mode, but having something like this in HAST alongside
the other synchronization modes might be useful.

Compared with the "real" async mode described in the manual, it has the
following advantages:

1) It is much easier to implement.

2) If the same blocks are updated frequently, "real" async will send every
   update, while with the sync thread approach we skip many intermediate
   updates.

Even if we don't run the sync thread very frequently and HAST switches to
failover, it may still sync the dirty buffers from the previous master.

It might be useful for backing up volumes over a WAN, instead of rsync or
zfs send.

There is a disadvantage -- instead of sending only one dirty block we
synchronize the whole extent (see below how it may be improved though).

But let me describe the problems with your patch:

http://sobomax.sippysoft.com/primary.c.diff

In your approach you still put the requests on the send thread's queue but mark
them there as failed, so they are not actually sent and the extent is marked as
needing sync. You don't start the sync thread; in your case it starts only
after reconnecting to the secondary.

You have frequent reconnects for the following reason. Because there are
requests in the send thread, it does not send keepalive requests (it sends them
only when it is idle). But the requests are not actually sent either, so the
secondary receives no data from the primary and exits by timeout. Frequent
reconnects are certainly bad.
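
The keepalive problem described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not hastd code: time is counted in abstract ticks, the function name first_timeout and the parameters keepalive and timeout are made up for the illustration, and the send thread is assumed to emit a keepalive only after it has seen no request for `keepalive` consecutive ticks.

```python
def first_timeout(write_ticks, total_ticks, keepalive, timeout, patched):
    """Return the tick at which the secondary times out, or None.

    write_ticks: set of ticks at which a write request reaches the send thread.
    patched:     True models requests that are marked failed and never sent.
    """
    last_rx = 0   # last tick at which the secondary received anything
    idle = 0      # consecutive ticks the send thread saw no request
    for now in range(1, total_ticks + 1):
        if now in write_ticks:
            idle = 0                  # the send thread is busy, not idle
            if not patched:
                last_rx = now         # the request really goes over the wire
            # patched: the request is dropped, so nothing reaches the secondary
        else:
            idle += 1
            if idle >= keepalive:     # idle long enough: send a keepalive
                last_rx = now
                idle = 0
        if now - last_rx >= timeout:
            return now                # the secondary gives up and exits
    return None


# A steady stream of writes, one every second tick.
writes = set(range(2, 101, 2))
print(first_timeout(writes, 100, keepalive=3, timeout=10, patched=True))
print(first_timeout(writes, 100, keepalive=3, timeout=10, patched=False))
```

In this model a steady write load never lets the send thread become idle, so with the requests dropped the secondary hears nothing and times out, while the same load with requests actually sent keeps the connection alive.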

Also, the problem you described in the "randomization" thread looks like it is
only possible with your patch. As the request "fails" in the send thread, the
extent is marked as needing sync, and if the sync thread is running at that
time you may observe the same frequently updated extent being resent again and
again. Without your patch an extent may be marked as needing sync only when the
connection to the secondary is lost, so synchronization is not running at that
moment.

I think the right approach could be:

1) Don't put the request on the send thread's queue at all.

2) When the request is returned to the kernel, it still remains dirty in
   memmap.

3) Periodically, the extents that are dirty in memmap are marked as needing
   sync and the sync thread is woken up.

Here is the patch that implements it:

http://people.freebsd.org/~trociny/hast.async.patch

The patch cannot be considered complete because:

1) I think this mode should not be called async, because people would expect
   from it the behavior known from the man page (and how it works in DRBD, I
   suppose). Also "real" async might be implemented in the future too. Some
   other name should be thought up.

2) The synchronization thread is woken up in the guard thread every
   HAST_KEEPALIVE seconds. I think it should be less frequent and
   configurable.

It can be improved, but I would like to know Pawel's opinion first. He might
know why this is completely wrong :-)

Now about sending the whole extent when only a small part of it is updated. It
might be improved with checksum-based synchronization. I have a patch that
implements it -- when synchronizing an extent, before a chunk of MAXPHYS size
is sent, its checksum is sent, and if it matches, the chunk is not sent. It is
supposed to be useful when one needs to resync disks, e.g. after split brain,
when most of the blocks on the nodes match. But apparently it should improve
things in this case too.

-- 
Mikolaj Golub

Received on Sat Jun 25 2011 - 12:54:14 UTC
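
The three-step approach and the checksum idea from the message can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code from either patch: the class Primary, the function periodic_sync, the tiny EXTENT_SIZE and CHUNK_SIZE values and the choice of SHA-256 are all hypothetical, and the secondary is modeled as a local bytearray, whereas in a real protocol the digest would travel over the network and be compared on the secondary.

```python
import hashlib

EXTENT_SIZE = 8   # bytes per extent (tiny, for illustration only)
CHUNK_SIZE = 4    # stand-in for a MAXPHYS-sized chunk


class Primary:
    def __init__(self, size):
        self.disk = bytearray(size)
        self.dirty = set()   # extents written since the last periodic sync

    def write(self, offset, data):
        # Steps 1 and 2: complete the write locally, queue nothing for the
        # send thread, and leave the touched extents marked dirty.
        self.disk[offset:offset + len(data)] = data
        first = offset // EXTENT_SIZE
        last = (offset + len(data) - 1) // EXTENT_SIZE
        self.dirty.update(range(first, last + 1))


def periodic_sync(primary, secondary_disk):
    """Step 3: push the dirty extents; return the number of data bytes sent."""
    sent = 0
    for extent in sorted(primary.dirty):
        start = extent * EXTENT_SIZE
        for off in range(start, start + EXTENT_SIZE, CHUNK_SIZE):
            chunk = bytes(primary.disk[off:off + CHUNK_SIZE])
            remote = bytes(secondary_disk[off:off + CHUNK_SIZE])
            # Compare checksums first; a matching chunk is not transferred.
            if hashlib.sha256(chunk).digest() == hashlib.sha256(remote).digest():
                continue
            secondary_disk[off:off + CHUNK_SIZE] = chunk
            sent += len(chunk)
    primary.dirty.clear()
    return sent


primary = Primary(32)
secondary = bytearray(32)
for value in (b"a", b"b", b"c"):   # three updates of the same block
    primary.write(0, value)
print(periodic_sync(primary, secondary))   # data bytes sent for the one dirty extent
print(secondary == primary.disk)
```

Three writes to the same block leave a single dirty extent, so the intermediate values are never transferred, and within that extent only the chunk whose checksum differs is sent.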