  FishEye / FE-314

Fisheye copes poorly with branch or tag deletions

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • Affects versions: 1.4.2, 1.4.3

      From previous correspondence with Conor MacNeill:

      >
      > Does not cope well with branch/tag deletions
      >
      > One of the operations which regularly causes blocking of repository
      > scanning is the removal of branches or tags within a Subversion
      > repository.
      > I don't know why this is the case - I opened a query with Cenqua about
      > this - they weren't actually able to tell me the answer as to why it takes
      > so long.

      From an external view, a Subversion repository can appear huge: if
      you performed an svn listing (ls) on the root of a Subversion
      repository containing 10 branches and 100 tags, the repository would
      appear to be about 111 times its actual size. This is not a problem for
      Subversion itself, internally, as its access is predominantly sequential
      (with respect to the directory hierarchy). As it navigates a path, it
      can follow copy links - so-called cheap copies. As an indexer, FishEye's
      access to the directory hierarchy is predominantly "random access";
      searching with FishEye would be more difficult if it navigated the
      directory hierarchy sequentially.
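The arithmetic above can be sketched with a toy model in which branch and tag copies are stored as cheap references to the source tree rather than duplicated data. The file count and path names here are purely illustrative:

```python
# Toy model of Subversion "cheap copies": a branch or tag copy stores only a
# named reference to the source tree, not a duplicate of its files.
TRUNK_FILES = 1000   # hypothetical number of files under /trunk


class Repo:
    def __init__(self):
        self.stored_trees = 1          # /trunk is stored once
        self.copies = []               # cheap copies: just named references

    def cheap_copy(self, name):
        self.copies.append(name)       # O(1): no file data is duplicated

    def apparent_file_count(self):
        # An external listing (e.g. a recursive svn ls from the root)
        # sees every copy as a full tree.
        return (1 + len(self.copies)) * TRUNK_FILES

    def actual_file_count(self):
        return self.stored_trees * TRUNK_FILES


repo = Repo()
for i in range(10):
    repo.cheap_copy(f"branches/b{i}")
for i in range(100):
    repo.cheap_copy(f"tags/t{i}")

# Trunk + 10 branches + 100 tags = 111 apparent copies of one stored tree.
print(repo.apparent_file_count() // repo.actual_file_count())  # -> 111
```

This is why the deletion of a single branch or tag can look tiny to Subversion (one reference removed) yet enormous to a random-access indexer that has materialized the apparent tree.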

      We are actively researching whether we can take advantage of "cheap
      copy" style linking within FishEye's index to better handle the way
      Subversion works. In other words, we do recognize the problem and we are
      working on solutions.

      One of the approaches taken within FishEye to mitigate the problem is to
      recognize tags as such and to tag the original file's entry in the
      FishEye index. Effectively the tagged directory hierarchy is virtualized
      by tagging the entries for the source of the tag. This keeps the size of
      the FishEye index down significantly but does mean that FishEye needs to
      know when a copy represents a tagging operation.
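A minimal sketch of that virtualization idea, assuming a flat index keyed by path in which a tag is recorded as a label on the source entries instead of as a copied hierarchy (all class and path names here are hypothetical, not FishEye's actual index format):

```python
# Sketch: instead of materializing tags/REL-1.0/** as new index entries,
# record the tag name against the original entries under the copy source.
class Index:
    def __init__(self):
        self.entries = {}   # path -> {"tags": set of tag names}

    def add_file(self, path):
        self.entries[path] = {"tags": set()}

    def apply_tag(self, tag_name, source_prefix):
        # Virtualize the tagged tree: label matching entries, copy nothing.
        for path, meta in self.entries.items():
            if path.startswith(source_prefix):
                meta["tags"].add(tag_name)

    def list_tag(self, tag_name):
        # Reconstruct the virtual tag directory on demand.
        return sorted(p for p, m in self.entries.items()
                      if tag_name in m["tags"])


idx = Index()
idx.add_file("trunk/src/a.c")
idx.add_file("trunk/src/b.c")
idx.apply_tag("REL-1.0", "trunk/")

print(idx.list_tag("REL-1.0"))  # -> ['trunk/src/a.c', 'trunk/src/b.c']
print(len(idx.entries))         # -> 2 (the tag added no new entries)
```

The trade-off described above follows directly: the index stays small, but a copy must be recognized as a tag at indexing time for this path to be taken at all.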

      Workarounds:

      1. Add exclude rules in order to make Fisheye ignore the paths from the branches and tags that have been deleted.
      2. Set a start revision to start indexing from a particular revision.
      3. Split a single large repository into logical components (e.g., by project or by product).
      4. Change the repository definition to index only a subset of your repository, by setting a path within it that you wish Fisheye to index.
      5. After applying one or more of the suggestions above, saving changes, and exiting the repository administration panel, a modal window will appear saying that the repository needs to be re-indexed for the changes to take effect, offering two options:
        • Perform now will immediately erase the repository cache indexed thus far and start re-indexing the repository from scratch.
        • Ignore will keep everything indexed thus far, but you need to stop and start the affected repository to make Fisheye ignore the excluded paths from then onwards.
      6. As general advice, you can get much faster indexing by using the file protocol instead of HTTP / HTTPS to connect to the repository. According to the Best Practices document, HTTP and HTTPS are the slowest protocols. You can switch to the file protocol by either:
        • Migrating Fisheye to the server where Subversion is installed, or
        • Using the svnsync utility to mirror the remote repository onto the server where Fisheye is installed.
      7. The most important aspect of a large-repository deployment is disk I/O speed: you want a fast local disk for Fisheye's cache (note that NFS and SAN are not supported). Perform the disk access speed test and compare your benchmarks with the table in the section Grading the Results of that document, to make sure that the disk speed on the Fisheye server is at least OK for all file operations.
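As an illustration of workaround 1, exclusion can be thought of as a path filter applied before changed paths reach the indexer. The glob patterns below are hypothetical examples, not Fisheye's actual configuration syntax:

```python
import fnmatch

# Hypothetical exclude rules for branches/tags that were deleted.
# fnmatch's "*" matches across "/" as well, so one pattern covers a subtree.
EXCLUDES = [
    "branches/NewBranch/*",
    "tags/obsolete-*/*",
]


def is_excluded(path):
    """Return True if the path matches any exclude rule."""
    return any(fnmatch.fnmatch(path, pat) for pat in EXCLUDES)


changed_paths = [
    "trunk/src/main.c",
    "branches/NewBranch/src/main.c",
    "tags/obsolete-1.0/README",
]

# Only paths that survive the filter would be handed to the indexer.
to_index = [p for p in changed_paths if not is_excluded(p)]
print(to_index)  # -> ['trunk/src/main.c']
```

The point of the workaround is exactly this: paths under a deleted branch or tag never enter the indexing pipeline, so the expensive deletion handling is skipped.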


            Robert Leland added a comment -

            A user accidentally did an SVN copy of "/" to "/branches/NewBranch". A minute later, realizing the mistake, they did an SVN delete of "NewBranch".

            This doesn't seem to slow down Subversion any, but now Fisheye indexing hangs on the delete commit for hours ...

            Jérémie Faucher-Goulet added a comment - edited

            This is currently a blocker for us.

            We have an SVN repository which we are unable to index because a few commits were made in it where over 7000 tags were deleted in a single revision.
            It seems to eat more and more memory when fetching this repository, up to the point where we get OutOfMemory errors. Our latest attempts have been to increase the JVM's memory to 9 GB and the SVN timeouts to 11 days, but still no success.

            If we were able to tell the indexer to skip those revisions, that would be an acceptable workaround too.

            Dmitry Tsitelov added a comment -

            See also FE-3949

            Sten Pittet (Inactive) added a comment -

            Hi,

            One of the main goals on our roadmap is to improve overall performance in FishEye. We have done so in past releases, and one of the major investments was the Pipelined Indexing introduced in FishEye 3.0.

            We will continue this effort, and we want to have better support for the deletion of tags and branches. We will not be able to look into this particular issue before the end of the year due to existing priorities but, as mentioned above, performance is one of our top themes at the moment, and we should be able to provide better feedback on this at the end of the year.

            Regards,

            Sten Pittet
            FishEye / Crucible Product Manager

            Philippe Busque added a comment -

            Any update on this?
            We're evaluating Fisheye and the initial scan is taking an eternity.

            5 days and only 15% has been indexed in one of our repositories.
            This practically prohibits us from doing any full index outside the initial indexing without rendering Fisheye unusable for a month!

            This is really a show stopper.

            Kyle Lebel added a comment -

            Any updates or known workarounds? The slow re-indexing is crippling our ability to keep up with the upgrades...


            Chris Walquist added a comment - edited

            Wow, 1 month; I thought I was suffering with 48-hour reindexes!

            This is a show-stopper. Our dev team requires a code search tool that is fast and up-to-date. They've largely stopped using FishEye, mostly due to this one bug. Any random tag or branch delete takes FishEye out of commission for an indeterminate amount of time, depending on the size of the tag or branch.

            Why should anyone rely on a tool that suffers frequent random outages of uncontrollable duration, with no valid recovery scenario? I hope Atlassian product development is asking themselves that question.

            Matt Bruce added a comment -

            This issue really kills us. It takes almost a month to trace the entire repository as a result. If we have a cache corruption (JVM crash, machine restart, etc - anything without a clean exit), and have to retrace, we can't use the product for a month. Can this be fixed?


            Ian Cusden added a comment -

            I'd like to see FishEye improved in this area too; the slow processing of tag deletions results in delays in scanning other repositories.

            David Grierson added a comment -

            Raising this as a specific Jira ID so that the defect can be tracked - please mark as a duplicate if already being worked upon.

              Assignee: Unassigned
              Reporter: David Grierson
              Affected customers: 48
              Watchers: 30