Re: GEOM architecture and the (lack of) need for foot-shooting

From: Marcel Moolenaar <marcel_at_xcllnt.net>
Date: Thu, 7 Apr 2005 20:06:20 -0700

On Apr 7, 2005, at 3:57 PM, Poul-Henning Kamp wrote:

>> Questionable. What about the following reasoning:
>>
>> The partition table on a disk is there to help the firmware and OS
>> to identify the kinds of file systems on that disk and their bounds.
>> Once the OS has been loaded and has obtained all the information it
>> cares about, the partition table is not needed anymore. Its existence
>> has become irrelevant. Removal of the partition table does not in
>> any way invalidate the file systems that are on that disk, nor make
>> them inaccessible to the CURRENTLY RUNNING OS. It is only when the
>> partitions are to be found again across a reboot that the partition
>> table needs to be there and needs to be valid.
>
> I think that is a recipe for disaster and the fact that all operating
> systems which implement it have resorted to all but forcing a reboot
> right after any change seems to validate my point.
>
> Which view do you offer the user if he enters the partitioning tool
> a second time before he reboots ?

That's a good question and pivotal in finding a good solution. Since
we're talking about a partitioning tool, my first reaction is that it
should offer what it works on: the on-disk partition table. However,
in order for it (the tool) to be helpful to the user in most (if not
all) cases, it should probably have knowledge of the in-core view of
the disk so that it can assist the sysadmin. I deliberately don't
want to define what "assistance" means in this context right now. It's
a complex matter that takes discussion and thought.

> What about crash-safety ?

Crash-safety may be a policy issue more than a technical issue.
One typically doesn't want to change the on-disk partitioning in such a
way that a sudden crash renders the machine useless, and then leave the
box in that state during production. The tool can provide assistance
here as well. For example, when the in-core view of the disk is empty
(nil or lambda), the on-disk partitioning can never be incompatible and
the tool can allow anything. If the disk contains the root file system,
the tool can do various things, from disallowing certain changes to
forcing a reboot.

> How do you deliver a credibly convincing argument that the user's
> system is going to boot again ?  At the very least, the disk
> partitioning tool needs to say "This will hose your system.
> Abort/Retry/Ignore"

Reboots can be avoided by allowing the in-core view to be synchronized
with the on-disk partitioning. The crux is that it should probably be
done at the sysadmin's discretion and not unconditionally like GEOM is
designed to do right now. Depending on where the root file system lives,
restarts can be implemented as an alternative to reboots if we would
like the root file system to be changed as well. The latter may be handy
during system installation. Not sure -- just speculating...

> You basically need to implement a high level of overview and inference
> in the tools to be able to protect the novice users.

I don't think it's that bad, but it's exactly the "assistance" I didn't
define above. We may come to the conclusion that it is in fact a hard
problem.

> That is why I didn't go that way.  Instead I opted for a system where
> we are at all times in a single consistent state, and where we do
> not allow operations which take us into an inconsistent state.

Understood. I think it's even fair to say that such is the normal state
in which a machine runs in production and it's therefore reasonable to
optimize for that state. However, this thread is an indication that we
cannot optimize the uncommon state away and that we should better handle
the case where disks (or the system at large) are in a state of flux.

>> Even if a replacement partition table encodes a completely different
>> layout, it does not have to be a problem. The OS just needs to ignore
>> the partition table.
>
> It does not have to be a problem, but how do you implement code to
> find out _if_ it is a problem so you can warn the user ?

Typical problem cases include partial overlaps between on-disk and
in-core partitions. This is easy to catch and can always result in a
red flag. More difficult cases are missing on-disk partitions of
native file systems or size mismatches. Those may or may not cause any
problems. A yellow flag is in order. Green flags for any partition
changes that affect partitions we don't know or care about, as well
as the addition of partitions (native or otherwise).

Roughly speaking...
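
To make the flagging concrete, here is a minimal sketch of how a tool
might compare the in-core view with a proposed on-disk table. Everything
in it is an assumption for illustration: the Part record, the classify()
helper, and in particular the rule that "same start, different size" is
a size mismatch (yellow) while any other overlap is partial (red). None
of it is GEOM code.

```python
from dataclasses import dataclass
from enum import IntEnum


class Flag(IntEnum):
    GREEN = 0   # harmless: additions, or partitions we don't care about
    YELLOW = 1  # may or may not be a problem: missing or resized
    RED = 2     # partial overlap with a partition the OS is using


@dataclass(frozen=True)
class Part:
    """Hypothetical partition record: start and size in sectors."""
    start: int
    size: int
    native: bool = True  # False: a type we don't know or care about

    @property
    def end(self):
        return self.start + self.size


def classify(in_core, on_disk):
    """Return the worst flag for a proposed on-disk table."""
    worst = Flag.GREEN
    for ic in in_core:
        if not ic.native:
            continue  # not ours, so any change is green
        if any(od.start == ic.start and od.size == ic.size for od in on_disk):
            continue  # unchanged on disk
        overlapping = [od for od in on_disk
                       if od.start < ic.end and ic.start < od.end]
        if all(od.start == ic.start for od in overlapping):
            # Nothing overlaps (missing) or only the size differs.
            worst = max(worst, Flag.YELLOW)
        else:
            return Flag.RED  # partial overlap: always a red flag
    return worst
```

For example, classify([Part(0, 100)], [Part(50, 100)]) yields Flag.RED,
whereas appending a new partition after an unchanged one stays green.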

> In one of my GEOM prototypes I had a protocol where a provider could
> ask the consumer which bits it really cared about.  That way you
> could tell a mountpoint to shrink what bit of the disk it used
> and afterwards reclaim that bit of the partition.
>
> In the end I decided that this was waaaay too much code to
> justify the functionality.

I tend to agree even though I have no experience with it myself. It's
all gut feeling for me.

> If you want to go the way you describe, you will need to do something
> like that because otherwise you have no way of making sure that
> you are really having two *functionally* identical views.

I think that having a single view is probably what's biting us. If you
let go of that, you can change the on-disk view in any way you see fit
without having to worry about the in-core view. It's up to the sysadmin
to force the in-core view to be synchronized to the on-disk view and
the tool he/she uses for that should be able to help out there. A reboot
is just one of the alternatives.
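
A rough sketch of the two-view idea, purely as an illustration: the Disk
class, its method names and the "busy" argument are all made up here and
do not correspond to any existing GEOM interface. Partitions are plain
(start, size) tuples.

```python
class Disk:
    """Hypothetical model: an on-disk table plus a separate in-core view."""

    def __init__(self, table):
        self.on_disk = list(table)
        self.in_core = list(table)  # snapshot taken when the OS came up

    def write_table(self, table):
        """Change only the on-disk table; the running OS is not affected."""
        self.on_disk = list(table)

    def in_sync(self):
        return self.in_core == self.on_disk

    def sync(self, busy=()):
        """Adopt the on-disk table, at the sysadmin's explicit request.

        Refuse if a partition that is in use (busy) would change or
        disappear; a reboot or restart would be the alternative then.
        """
        lost = [p for p in busy if p not in self.on_disk]
        if lost:
            raise RuntimeError(f"busy partitions would change: {lost}")
        self.in_core = list(self.on_disk)
```

After write_table() the two views simply differ until sync() is called,
and sync() is where a tool could apply whatever policy it wants.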

>> Thus:
>> Is it actually the right behaviour to invalidate the OS's notion of
>> disk partitions whenever the on-disk tables are changed or removed
>> and if so does that hold in all cases?
>
> We don't invalidate "whenever the on-disk tables are changed", we
> veto the change if it would jeopardize the currently opened providers
> under the presumption that all our current users (filesystems)
> explode if you look at them wrong from that angle.

The veto is just the protection mechanism. Without it, any change to
the on-disk partition table would immediately affect the in-core view
and as such invalidate or override the existing view.

>>> The correct way to do that is to use the g_ctl() api because what
>>> is needed is an out-of-band mechanism to tell that we want to lose
>>> one of the partitions.
>>
>> Such mechanism would be needed only to inform the OS that it should
>> forget about partitions it currently knows about (whether mounted or
>> not).
>
> I think you presuppose a much higher level of ability than the majority
> of our superusers lay claim to.  Your scheme would require a much
> stronger set of userland utilities to avoid unintentional footshooting.

Possibly. A solution that exactly addresses the needs may not exist. But
if a solution exists that handles a superset of the needs (by virtue of
its inherent complexity), then it's not a good idea to reject it based
on the fact that it solves more than it needs to.

> Considering the sorry state of our current tools, not to mention
> libdisk/sysinstall, I think switching to such a scheme is a
> non-starter; the amount of code to write is simply prohibitive.

Maybe it should be done one step at a time. Once the right or perfect
solution is known it can still be decided that a full implementation of
it is not feasible and that an inferior implementation is better given
the circumstances. Better yet, with a solution all drafted and providing
a clear and complete picture, it's much more likely that someone
implements the solution in a short amount of time. There's certainty
that the implementation will be accepted and one doesn't have to be a
domain expert to do the implementation.

>> answered and so far the problems have remained unsolved. What I fail
>> to see is the proverbial "let's take a step back and look at it again
>> from a distance" attitude from you. Instead everybody else's got it
>> wrong or is missing bits and pieces from the puzzle. Fine, that's
>> certainly possible, but you're not making a good case for it and I
>> remain unconvinced (FWIW).
>
> I don't think anybody else has spent as long a time as I have on
> this subject and if you want to spend the next ten years of your
> life studying it and at some point come with an implementation which
> works better you are more than welcome, heck, I wish you had done
> that 10 years ago!

If it should take me 10 years, I wouldn't be the right person for the
job. It's safe to say that I won't be spending 10 years on it -- one way
or the other.

>> So, maybe it's time to step back and take a look at it again. Define
>> the problems that have been raised, describe the cause (real or
>> artificial) and identify possible solutions, not just yours, and
>> build consensus for the best solution. Chances are that you actually
>> get other people to help out implementing the solution.
>
> That would be great.
>
> The first requirement for that to be a success is that people stop
> trying to find a quick fix so they don't have to think about the
> problems.

Agreed.

> I would suggest that you go off and do a prototype of your scheme,
> all the GEOM work has given you some very nice interfaces to the
> relevant pieces of the system, all you have to do is offer geom_dev,
> geom_disk and geom_vfs outwards APIs and you're sailing.

Wrong suggestion. Chasing people away to work on something that may or
may not please you in the end will only chase people away. You should
know that by now. The people most likely to help out are the ones you
get pissed off at for expressing an opinion, I'd think...

-- 
  Marcel Moolenaar         USPA: A-39004          marcel_at_xcllnt.net
Received on Fri Apr 08 2005 - 01:06:22 UTC