Re: FYI: SVN to GIT converter currently broken, github is falling behind

From: Ulrich Spörlein <uqs_at_FreeBSD.org>
Date: Sun, 8 Nov 2015 12:06:36 +0100

2015-11-08 11:32 GMT+01:00 Ulrich Spörlein <uqs_at_freebsd.org>:
> 2015-11-08 2:51 GMT+01:00 Alfred Perlstein <alfred_at_freebsd.org>:
>>>
>> Uli,
>>
>> One of the biggest concerns I've heard from folks using FreeBSD's git mirror
>> is that the hashes can change.
>>
>> I have a question about this. Is it possible to keep track of what the
>> "official" git mirror (on github) is doing and keep that as a log? Then
>> that log can be used to replay commits when there is a divergence problem.
>>
>> What I'm basically saying is that let's take this small example:
>>
>> importer is working fine @rev 10000
>> imports 10000
>> imports 10001
>> imports 10002
>> something happens to importer to give indeterminate shas.
>> imports 10003 - sha is "unstable" sha3
>> imports 10004 - sha is "unstable" sha4
>> imports 10005 - sha is "unstable" sha5
>> imports 10006 - sha is "unstable" sha6
>> importer is fixed
>>
>>
>> At this point normally we'd rewind the importer to 10002 and then force
>> update the affected branches.
>>
>> My question is... can the imports of 10003, 10004, 10005 and 10006 be put
>> into the importer such that any "mirror site" that re-does the import using
>> the most up to date importer will get the same shas?
>>
>> That would allow us to proceed with 10007, etc. without force pushing.
>>
>> This should be possible based on querying "git" for the meta data associated
>> with sha3..sha6 and then forcing those commits to have the same meta data.
>>
>> This would eliminate the concern about shas in the mirror changing that I've
>> heard.
>
> The goal of the conversion is that everyone can re-do the conversion
> in their basement and come up with the same history and checksums.
> This was not the case when I first started, as there was some
> non-deterministic hash structure being used in svn2git. This was fixed
> in the code and then all converter runs produced the very same
> results.
>
> The scenario that we have right now is that one of the merge commits
> done about two weeks ago is being handled differently by svn2git w/ svn
> v1.8 vs. svn v1.9 and I haven't investigated yet how the API's
> behavior changed to cause this. I'm afraid I also swapped out all my
> knowledge about svn2git internals and will have to redo this all from
> scratch :/
>
> Your suggestion could only work if we hard-code this svn revision
> special handling into svn2git, either in the code or by providing more
> mappings and rules to the process. svn2git should run hermetically and
> not poke at github's commits to see how things were handled in the past.
> It has to be self-sufficient and must not depend on github.
>
> This would also only work if the "breakage" window was very small,
> but it is already about two weeks long and will surely increase till I
> find the proper fix.
>
> So, to take a stand here: this sort of kludge is unlikely to ever
> happen. Git commit hashes *might* change in the future. I really don't
> see how this is a big deal anyway.  It happened once and I'm trying to
> have it never happen again. But why are people afraid of this
> happening? Every "official" git commit is tagged with a SVN revision
> and the contents of those revisions are obviously correct (just not
> the ancestry and the commit objects, possibly). So it would be easy to
> write a script that replays VendorA's git history and swaps out the
> new official commits for the old official commits. There would be no
> merge conflicts.
>
> I can see how this would be annoying if you have 100 developers and
> dozens of branches that are far from mainline FreeBSD. But I'm sure
> these companies that depend on git will come forward and donate some
> of their developer manpower to help me with keeping the converter
> stable/deterministic. Right? Right? :) :)
>
> Cheers,
> Uli
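
The replay idea quoted above (swap old official commits for new ones,
keyed on the SVN revision each commit is tagged with) might look roughly
like the sketch below. It is only an illustration: build_swap_map and the
(commit_id, svn_revision) pairs are hypothetical, and how the revision is
actually read from each git commit is left out entirely.

```python
def build_swap_map(old_history, new_history):
    """Map each old commit ID to the new commit ID for the same SVN revision.

    Both arguments are iterables of (commit_id, svn_revision) pairs.
    How those pairs are extracted from git is assumed, not shown.
    """
    new_by_rev = {rev: sha for sha, rev in new_history}
    # Old commits whose revision is missing from the new history are skipped.
    return {sha: new_by_rev[rev] for sha, rev in old_history if rev in new_by_rev}


# Placeholder IDs, loosely following the r10003/r10004 example in the thread.
old = [("sha3", 10003), ("sha4", 10004)]
new = [("sha3-new", 10003), ("sha4-new", 10004)]
print(build_swap_map(old, new))
```

A vendor branch could then be rewritten by replacing every old parent ID
found in the map with its new counterpart.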

Quick update: doc is so far unaffected by svn 1.9, but for ports, the
drift happened as of Jul 18, so you'd need to special case a lot of
commits.

Here's the same commit, and the difference between 1.8 and 1.9:

% git cat-file commit 803795d
tree 7fc83aba022834da5c218114b09ad4640735bcc0
parent c96fb0418e545a569b5975b4d878a30a948c29d5
author olgeni <olgeni_at_FreeBSD.org> 1437203525 +0000
committer olgeni <olgeni_at_FreeBSD.org> 1437203525 +0000

Upgrade to version 0.4.1.
% git cat-file commit 61ca43b
tree 7fc83aba022834da5c218114b09ad4640735bcc0
parent c96fb0418e545a569b5975b4d878a30a948c29d5
author olgeni <olgeni_at_FreeBSD.org> 1437203529 +0000
committer olgeni <olgeni_at_FreeBSD.org> 1437203529 +0000

Upgrade to version 0.4.1.
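
For anyone unfamiliar with git internals: a commit ID is the SHA-1 of the
commit object's bytes behind a "commit <size>\0" header, so two objects
with the same tree and parent still get different IDs if any other byte
differs. A minimal Python sketch, with the fields copied from the output
above. The address is kept in the archive's obfuscated "_at_" form and the
trailing newline of the message is an assumption, so the printed IDs are
not expected to match 803795d or 61ca43b.

```python
import hashlib


def git_commit_id(body: bytes) -> str:
    """Return the SHA-1 that git assigns to a commit object with this body."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(timestamp: int) -> bytes:
    # Address kept as shown in the archive (obfuscated), so the resulting
    # IDs will differ from the real ones; only the comparison matters here.
    ident = f"olgeni <olgeni_at_FreeBSD.org> {timestamp} +0000"
    return (
        "tree 7fc83aba022834da5c218114b09ad4640735bcc0\n"
        "parent c96fb0418e545a569b5975b4d878a30a948c29d5\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        "\n"
        "Upgrade to version 0.4.1.\n"
    ).encode("utf-8")


id_a = git_commit_id(commit_body(1437203525))
id_b = git_commit_id(commit_body(1437203529))
print(id_a)
print(id_b)
print("same tree and parent, different commit ID:", id_a != id_b)
```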


In case you don't see it, there's a 4s difference in the timestamps
for authoring and committing. Here's the original:

% svn log -vc392405 svn://svn.freebsd.org/ports
------------------------------------------------------------------------
r392405 | olgeni | 2015-07-18 09:12:05 +0200 (Sat, 18 Jul 2015) | 2 lines
Changed paths:
   M /head/www/elixir-maru/Makefile
   M /head/www/elixir-maru/distinfo

Upgrade to version 0.4.1.

------------------------------------------------------------------------

So yeah, svn 1.9 returned a timestamp that was off by 4s. WTF?
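
To double-check which of the two commits agrees with the svn log, the
epoch timestamps from the cat-file output can be compared against the
r392405 date. A small Python sketch; the variable and function names are
mine, the values are copied from the output above.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # the +0200 offset shown by svn log

# Commit time of r392405 as printed by svn log.
svn_time = datetime(2015, 7, 18, 9, 12, 5, tzinfo=CEST)

# Author/committer timestamps from the two git cat-file outputs.
git_times = {"803795d": 1437203525, "61ca43b": 1437203529}


def drift_seconds(epoch: int) -> int:
    """Seconds by which a git timestamp is ahead of the svn log time."""
    git_time = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return int((git_time - svn_time).total_seconds())


for short_id, epoch in git_times.items():
    utc = datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
    print(short_id, utc, f"drift={drift_seconds(epoch):+d}s")
```

With these values, 803795d lands exactly on the svn log time and 61ca43b
is 4 seconds later.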

For base it's actually even more complicated than I had thought so
far. But let's take this one step at a time ...
Received on Sun Nov 08 2015 - 10:06:40 UTC
