Re: Git/Mtn for FreeBSD, PGP WoT Sigs, Merkle Hash Tree Based

From: Igor Mozolevsky <igor_at_hybrid-lab.co.uk>
Date: Mon, 7 Oct 2019 11:58:03 +0100
On Mon, 7 Oct 2019 at 08:43, grarpamp  wrote:
>
> On 10/4/19, Igor Mozolevsky wrote:
> > On Fri, 20 Sep 2019 at 22:01, grarpamp  wrote:
> >>
> >> For consideration...
> >> https://lists.freebsd.org/pipermail/freebsd-security/2019-September/010099.html
> >>
> >> SVN really may not offer much in the way of a natively
> >> self-authenticating repo with cryptographic levels
> >> of security against bitrot, transit corruption, repo ops,
> >> and external physical editing, nor many signing options, etc.
> >> Similar to blockchain and ZFS hash merkle-ization,
> >> signing the repo init and later points (tags, commits),
> >> along with a full verification toolset, is a useful function.
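
For concreteness, the "merkle-ization" being asked for there amounts
to roughly the following Python sketch over a hypothetical directory
of files; it illustrates the general technique, not how SVN, Git, or
Monotone actually store anything:

    import hashlib
    from pathlib import Path

    def sha256(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(leaves: list) -> bytes:
        """Fold leaf hashes pairwise into a single root hash."""
        if not leaves:
            return sha256(b"")
        level = list(leaves)
        while len(level) > 1:
            if len(level) % 2:          # duplicate last node on odd levels
                level.append(level[-1])
            level = [sha256(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]

    def repo_root(path: str) -> bytes:
        """Hash every file under the tree, then combine into one root.

        Signing this single root (e.g. with PGP) indirectly signs every
        file: any later bitrot or edit changes the root and breaks the
        signature check.
        """
        leaves = [sha256(p.read_bytes())
                  for p in sorted(Path(path).rglob("*")) if p.is_file()]
        return merkle_root(leaves)

Signing each tag or commit point then means signing such a root for
that revision.
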
> >
> >
> > <snip>
> >
> > Isn't UNIX(TM) philosophy that a program should do one thing and do it
> > well? Just because people can't be bothered to learn to use multiple
> > tools to do *multiple* tasks on the same dataset, is not a reason, let
> > alone "the reason," to increase any program complexity to orders of
> > N^M^K^L so that one "foo checkout" does all the things one wants!
>
> Was r353001 cryptosigned so people can verify it with
> a second standalone multiple tool called "PGP", after the
> first standalone multiple tool called "repo checkout"?
> Was it crypto chained back into a crypto history so they could
> treat it as a secure diff (the function of a third standalone multiple
> tool "diff a b") instead of as entirely separate (and space wasting
> set of) unlinked independent assertions / issuances as to a state?
> How much time does that take over time, each time, vs
> perhaps loading a signed set of keys into the repo client config?
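
What the quoted "crypto chained history" boils down to is small
enough to sketch in a few lines of Python; the record layout and
helper below are hypothetical illustrations, not any real VCS format:

    import hashlib, json

    def commit_id(parent_id: str, tree_hash: str, message: str) -> str:
        """A commit id covers its parent id, so each id pins all history."""
        blob = json.dumps({"parent": parent_id,
                           "tree": tree_hash,
                           "msg": message}, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def verify_chain(commits: list) -> bool:
        """Recompute every id from genesis to tip; tampering breaks the chain."""
        parent = ""
        for c in commits:
            expected = commit_id(parent, c["tree"], c["msg"])
            if c["id"] != expected:
                return False
            parent = expected
        return True

    # Build a toy two-commit history and check it.
    history, parent = [], ""
    for tree, msg in [("tree0", "init"), ("tree1", "fix build")]:
        cid = commit_id(parent, tree, msg)
        history.append({"id": cid, "tree": tree, "msg": msg})
        parent = cid
    assert verify_chain(history)

    # A single PGP signature over history[-1]["id"] (the tip) then covers
    # every earlier commit, in contrast to signing each release as an
    # unlinked, independent assertion.

Whether that check belongs inside one monolithic "foo checkout" or in
a separate chained tool is the actual disagreement here.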

I'm guessing those are rhetorical questions; but you ought to look up
how to do tool chaining in any flavour of UNIX(TM).


> Are LOGO and tape better because they are less complex tools than C and disk?

For some people, perhaps.


<snip>

> > crypto IS NOT a substitute for good data keeping
> > practices.
>
> Who said that it was. However it can be a wrapper of
> proof / certification / detection / assurance / integrity / test
> over them... a good thing to have there, as opposed to nothing.

What is the specific risk model you're mitigating? Everything you say
is hugely speculative!


> > Also, what empirical data do you have for repo bitrot/transit
> > corruption that is NOT caught by underlying media?
>
> Why are people even bothering to sha-2 or sign iso's, or
> reproducible builds? There is some integrity function there.
> Else just quit doing those too then.

Funny you should say that: Microsoft, for example, don't checksum
their ISOs for their OSes. You missed the point about reproducible
builds entirely: given code A from Alice and package B from Bob,
Charlie can compile package C from A and verify that C is identical
to B; a simple `diff' of the binaries is sufficient for that! The
problem is that a lot of the time code A itself is buggy to such a
degree that it's vulnerable to attack (recall Heartbleed, for
example). Crappy code is not mitigated by any layer of additional
integrity checking of the same crappy code!
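
To be clear about how little machinery that comparison needs, here is
a minimal Python sketch (the file names are made up):

    import hashlib

    def digest(path: str) -> str:
        """SHA-256 of a build artifact, read in 1 MiB chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Charlie builds package_C from Alice's source, then compares it
    # byte-for-byte against Bob's published package_B.
    if digest("package_C.tar") == digest("package_B.tar"):
        print("reproducible: Bob's binary matches Alice's source")
    else:
        print("mismatch: B was not built from the published source A")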


> Many sources people can find, just search...
> https://www.zdnet.com/article/dram-error-rates-nightmare-on-dimm-street/
> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
> http://www.cs.toronto.edu/~bianca/papers/ASPLOS2012.pdf
> https://www.jedec.org/sites/default/files/Barbara_A_summary.pdf
> https://en.wikipedia.org/wiki/Data_degradation
> https://en.wikipedia.org/wiki/ECC_memory
> https://en.wikipedia.org/wiki/Soft_error

I don't put any stock in second-hand rumors on Wikipedia, so I'm not
even going to look there, but as for the rest: seriously, you're
quoting a study of DDR1 and DDR2??? I have it on good authority that
when at least one manufacturer moved to a smaller die process for
DDR3, they saw the error rates plummet, to their own surprise (they
were expecting the opposite), and now we're on DDR4; what's the die
size there?.. Perhaps you need to look into the error rates of EDO
RAM et al. too?

In any event, ECC, integrity checking, etc. are done on the
underlying media to detect, and in some cases correct, errors so that
you have to worry less about them at higher levels, so getting
obsessed by it is just silly, especially when advocating for a tool
that does it all in one go! Here's a question to ponder: if code set
X, certificate Y, and signed digest Z are stored on one medium (a
remote server in your case), and your computed digest doesn't match
digest Z, which part was corrupt: X, Y, Z, or your checksumming?
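
Put as a sketch, with hypothetical byte strings fetched from the same
server, a failed check detects that something is wrong but cannot say
what:

    import hashlib

    def check(code_x: bytes, digest_z: bytes) -> bool:
        """True if the fetched code matches the fetched signed digest."""
        return hashlib.sha256(code_x).digest() == digest_z

    # If check() returns False, all you know is that *something* is off:
    # the code X, the certificate Y you used to trust Z, the digest Z
    # itself, or the machine doing the checking could each be the
    # corrupted part. Localizing the fault needs an independent copy
    # from a second medium or host, e.g. comparing against
    # check(code_x, digest_z_from_mirror).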


> Already have RowHammer too, who is researching DiskHammer?

And has RowHammer been successfully demonstrated in a production
environment? How exactly are you planning on timing the attack vector
to get RAM cell data when you don't know (a) when that cell will be
occupied by what you want, nor (b) where that cell is going to be in
the first place? Go ask any scientist who works in pharma to explain
the difference between "works in a lab" and "works in the real
world"...


> Yes, there do need to be current baseline studies made
> in 2020 across all of, say, Google, Amazon, Facebook global
> datacenters... fiber, storage, ram, etc. It is surely not zero
> errors otherwise passed.

Perhaps you need to "tell" Google, Amazon, Facebook, et al about that,
and then come back to us with the results of those studies?


To sum up, you're advocating for extra effort with no empirical data
and no decent risk model to justify it. Good luck!



--
Igor M.
Received on Mon Oct 07 2019 - 08:58:42 UTC
