FishEye / FE-3071

HG: Fix slurping issues for hgconvert/hgsubversion

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Low
    • None
    • 2.3.0
    • Indexing

      hgconvert and hgsubversion produce repos that hg itself does not produce when used natively: file revisions are included in a changeset even when they were only merged.

      Document this further and find a way to index ex-SVN repos more efficiently.

            mwatson added a comment -

            See the last comment:

            The proposed fix here was really a hack to reduce indexing times, and it can sometimes misrepresent the data we have. As previously mentioned, a better way to speed up indexing is to pull only certain branches into a "working" repository and index that.

            mwatson added a comment -

            Hi Tim,

            I see the support engineers are looking into your indexing issues. We had similar issues ourselves, so hopefully they can come up with an answer for you.

            Great to know your developers loved FishEye and Crucible!

            The main issue this JIRA addresses is that hgconvert/hgsubversion create file revisions in the converted repository where the file revision in a commit representing a merge is actually identical to one created on a branch. We could try to detect this and, instead of processing a load of diffs as though it were a new version of the file on the merged-to branch, simply use the file revision created on the merged-from branch as the parent revision for subsequent changes. Note that this "merge" happens in SVN and there is no merge commit in the converted hg repo, but the underlying file revisions in hg contain enough information to do this.

            It would speed up indexing, but at the cost of not accurately representing what actually happened in the underlying hg repo. It also may not apply to you at all (we haven't tested against repos created using cvs2hg).
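            The detection described above can be sketched as follows. This is a minimal, hypothetical illustration, not FishEye's implementation: FileRev, content_key and find_reusable_parents are invented names, and it assumes the indexer sees file revisions in commit order together with their branch and full text. It compares file text rather than hg node IDs, because Mercurial node IDs also hash the parent revisions, so the same text reached through different histories gets different IDs. Reusing the earlier revision is exactly the trade-off mentioned above: it hides that hg recorded a separate file revision.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileRev:
    """Hypothetical model of one file revision as seen by an indexer."""
    rev_id: str
    path: str
    branch: str
    content: bytes


def content_key(path: str, content: bytes) -> str:
    # Identify a file version by its path plus a SHA-1 of its full text.
    return path + ":" + hashlib.sha1(content).hexdigest()


def find_reusable_parents(revs: List[FileRev]) -> Dict[str, Optional[str]]:
    """Process revisions in commit order. Map each rev_id to the id of an
    earlier revision on a *different* branch with identical content (reuse it
    instead of diffing), or to None if it must be diffed as usual."""
    first_seen: Dict[str, FileRev] = {}
    result: Dict[str, Optional[str]] = {}
    for rev in revs:
        key = content_key(rev.path, rev.content)
        earlier = first_seen.get(key)
        if earlier is not None and earlier.branch != rev.branch:
            # Same text already exists on another branch: treat this as a
            # merged copy and point at the original revision.
            result[rev.rev_id] = earlier.rev_id
        else:
            result[rev.rev_id] = None
        # Remember only the first revision that introduced this text.
        first_seen.setdefault(key, rev)
    return result


revs = [
    FileRev("r1", "a.c", "feature", b"v2"),
    FileRev("r2", "a.c", "default", b"v2"),  # SVN merge copied feature's text
    FileRev("r3", "a.c", "default", b"v3"),
]
print(find_reusable_parents(revs))  # {'r1': None, 'r2': 'r1', 'r3': None}
```

            A duplicate on the same branch (for example a revert) is deliberately not reused here, since only cross-branch copies correspond to the SVN merges this issue is about.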

            Our testing has indicated that the speedup is not significant enough for us to implement this feature. We achieved much better indexing speed (and this may work better for repos converted from CVS using cvs2hg) by keeping a "full" repository, the result of a complete conversion, and then cloning only certain active branches from it into a "light" or "working" clone that people develop on, which is what FishEye indexes. This excluded a lot of closed heads that hgsubversion had created to represent complex tags in Subversion (which in turn were mostly produced by cvs2svn years ago); these had HUGE diffs and took a long time to index. The beauty of this approach is that the tags are still available if we want to migrate them to the light repo (by hg pull full-repo -r TAG; hg push light-repo), and FishEye can index them as you need them, rather than spending ages on them all at once.
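            The full/light workflow can be scripted roughly like this. It is a sketch under assumptions: light_clone_cmd, migrate_tag_cmds and run_all are hypothetical helpers, "full-repo", "light-repo" and the branch names are placeholders, and choosing which branches count as "active" is left to the caller (for example from hg branches, which hides closed branches unless -c is given). The clone uses hg clone --branch, which may be repeated; the tag migration is the two-command sequence from the comment, run inside a local clone. Depending on the repository, hg push may refuse to create new remote heads or branches; see hg help push.

```python
import subprocess
from typing import List, Optional, Sequence


def light_clone_cmd(full_repo: str, light_repo: str,
                    branches: Sequence[str]) -> List[str]:
    """Build an `hg clone` command that copies only the named branches
    from the full conversion into a new light/working clone."""
    if not branches:
        raise ValueError("at least one branch is required")
    cmd = ["hg", "clone"]
    for name in branches:
        # hg clone accepts --branch more than once.
        cmd += ["--branch", name]
    return cmd + [full_repo, light_repo]


def migrate_tag_cmds(full_repo: str, light_repo: str,
                     tag: str) -> List[List[str]]:
    """Pull one tag's history from the full repo into the current clone,
    then push it on to the light repo."""
    return [
        ["hg", "pull", full_repo, "-r", tag],
        ["hg", "push", light_repo],
    ]


def run_all(cmds: List[List[str]], cwd: Optional[str] = None) -> None:
    # Run each hg command in order, stopping at the first failure.
    for cmd in cmds:
        subprocess.run(cmd, check=True, cwd=cwd)
```

            For example, run_all([light_clone_cmd("full-repo", "light-repo", ["default"])]) creates the light clone, and FishEye would then be pointed at light-repo instead of the full conversion.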

            This is not optimal, but we are somewhat limited by how slowly Mercurial produces these huge diffs (just running hg diff between the tag and some other point, such as its parent commit, takes a long time) and by the fact that we need to get these diffs per file rather than for a whole commit at once. We are working on other performance improvements (http://jira.atlassian.com/browse/CRUC-3883) that should speed up indexing in other ways.

            Hope this helps,
            Matt

            Tim Murphy added a comment -

            Hi,

            This is very important to us. We are testing a migration from CVS to Mercurial for all of our repos. The primary repo is 10 years old; we have pruned it down to 4.7GB. We have not been able to get FishEye to index the converted Mercurial repo (it has been running for weeks on a test migration, and we estimate it will take a year at this rate). We have already raised this support ticket: https://support.atlassian.com/browse/FSH-4395

            BTW, we demoed FishEye to a bunch of developers on one of our smaller repos. They want it. We want it. It's a lovefest (along with Crucible). When you deliver, we'll buy.

            mwatson added a comment -

            The driver for this was the extremely long time that certain commits representing SVN merges took to slurp. This was first seen in hgsubversion-produced repos and may not occur in hgconvert repos. Investigate whether this is a problem and, if so, look at ways of dealing with it.

              Assignee: Unassigned
              Reporter: Anonymous
              Affected customers: 1
              Watchers: 2

                  Original Estimate: 4h
                  Remaining Estimate: 4h
                  Time Spent: Not Specified