  Crucible / CRUC-5821

ReviewInfoDao.doIncremental() is inefficient and causes Repo Review Indexing to hang.


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: High
    • Fix Version/s: 2.6.2, 2.7.0
    • Affects Version/s: 2.5.3
    • Component/s: None

    Description

      Hi,

      I'm not sure if this is fixed already, but one of my colleagues, Chris, has analyzed this problem.

      Can you please look into this and perhaps comment on these findings and the actions we took? Is it OK to delete rows the way we did? What options do we have, or is this a code fix on your side? Have you already fixed this in a version > 2.5.3?

      thx

      Chris:

      Crucible has a Hibernate audit interceptor which inserts rows into the cru_feindex_msg table when a review gets touched, to ask Fisheye to reindex files relating to the modified code review asynchronously. The indexer thread deletes rows from this table immediately after indexing. The table is effectively a messaging queue containing temporary data.

      The code which consumes from this table is in ReviewInfoDao.doIncremental(). This loads all the rows from the cru_feindex_msg table then iterates over them in a transaction. Every time a Hibernate flush occurs during processing, it does a dirty check of all the objects in the session. If you double the number of items in the cru_feindex_msg table, that means you're processing twice as many items and for each item the Hibernate session is doing twice as many dirty checks: the result is O(N^2) performance.

      Unfortunately treasury-svn currently has 268,000 items pending indexing in the cru_feindex_msg table, and the indexer has made no progress on this at all today as far as I can tell. The table includes over 200K rows relating to 3 astoundingly large commits. I think these happened when someone accidentally deleted the SVN trunk and then restored it.

      select r.cru_revision_display_name, count(*)
      from cru_feindex_msg m, cru_revision r, cru_stored_path p
      where m.cru_fr_id = r.cru_revision_id
      and r.cru_path = p.cru_path_id
      and r.cru_source_name = 'treasury-svn'
      group by r.cru_revision_display_name
      order by 2 desc

      cru_revision_display_name count
      ---------------------------- --------
      55022 71147
      55316 69664
      55315 69664
      63040 7330
      63354 2108
      63346 2108
      ...

      Do you or Atlassian have any suggestions on how we can fix this? Would re-indexing the repository from scratch help? In the short term, I suspect deleting these 3 problematic commit revisions (200K rows) from the cru_feindex_msg working table is likely to help the indexer recover, but I'd want to check first that this is unlikely to have side effects.

      Thanks,
      Chris

      PS. It's possible that this has happened recently because a developer did something to touch an old code review containing these change sets - maybe they closed it? We did start asking developers about a week ago to be good and close their old reviews. Circumstantial evidence at best.

      • svn commit 55022 was someone copying a historical revision of the trunk back to the head of the trunk after accidentally deleting it.
      • 55315 was someone copying the contents of /trunk into /trunk/trunk
      • 55316 was that developer deleting the accidental copy
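
      As suggested above, deleting the pending-index rows for those three commits would amount to something along these lines (a sketch, not taken from the original report; it assumes cru_revision_display_name holds the Subversion revision numbers as text, as in the breakdown query earlier):

      delete from cru_feindex_msg
      where cru_fr_id in (
        select r.cru_revision_id
        from cru_revision r
        where r.cru_source_name = 'treasury-svn'
        and r.cru_revision_display_name in ('55022', '55315', '55316')
      )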

      For the first time that I've seen in weeks or months, treasury-svn no longer says it is busy indexing our reviews. To let the indexer catch up, I pruned some rows from cru_feindex_msg: this table contains the list of reviews and file revisions which have been requested to be reindexed asynchronously, and items get inserted into it every time a review changes.

      Delete duplicated requests to reindex reviews from cru_feindex_msg; the rows to remove can be identified with:

      select m1.cru_id
      from cru_feindex_msg m1
      left join cru_review r on m1.cru_review_id = r.cru_review_id
      join cru_feindex_msg m2 on m1.cru_review_id = m2.cru_review_id and m1.cru_fr_id = m2.cru_fr_id
      where m1.cru_id < m2.cru_id
      and r.cru_project = 13 -- 13 is treasury
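
      The actual delete can then wrap that selection, roughly as follows (a sketch, not part of the original report; some databases, MySQL for instance, reject a subquery on the table being deleted from, in which case the selected ids need to be staged in a temporary table first):

      delete from cru_feindex_msg
      where cru_id in (
        select m1.cru_id
        from cru_feindex_msg m1
        left join cru_review r on m1.cru_review_id = r.cru_review_id
        join cru_feindex_msg m2 on m1.cru_review_id = m2.cru_review_id and m1.cru_fr_id = m2.cru_fr_id
        where m1.cru_id < m2.cru_id
        and r.cru_project = 13 -- 13 is treasury
      )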

      Delete requests to reindex reviews which are already closed or dead; the candidate rows:

      select m.cru_id
      from cru_feindex_msg m
      left join cru_review r on r.cru_review_id = m.cru_review_id
      where r.cru_state in ( 'Closed', 'Dead')
      and r.cru_project = 13
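
      Again as a sketch rather than something taken from the original report, the matching delete (with the same temporary-table caveat as above):

      delete from cru_feindex_msg
      where cru_id in (
        select m.cru_id
        from cru_feindex_msg m
        left join cru_review r on r.cru_review_id = m.cru_review_id
        where r.cru_state in ('Closed', 'Dead')
        and r.cru_project = 13
      )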

      Then you can check which repositories have a big backlog of files remaining to be reindexed:

      select r.cru_source_name, count(*)
      from cru_feindex_msg m, cru_revision r
      where m.cru_fr_id = r.cru_revision_id
      group by r.cru_source_name
      order by 2 desc

      cru_source_name count
      ------------------ --------
      middle-office-svn 20269
      HydraSVN 6398

      Treasury previously had 260,000. The above deletions reduced this to about 20,000, and that backlog got cleared in a few hours.

      The middle-office-svn count doesn't seem to have decreased much since I've been looking at it. Probably some big items to reindex in there?
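
      One way to check would be the same per-revision breakdown used for treasury-svn, pointed at this repository instead (a sketch reusing the columns from the earlier query, with the unused cru_stored_path join dropped):

      select r.cru_revision_display_name, count(*)
      from cru_feindex_msg m, cru_revision r
      where m.cru_fr_id = r.cru_revision_id
      and r.cru_source_name = 'middle-office-svn'
      group by r.cru_revision_display_name
      order by 2 desc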

      It may have been possible to achieve the same just by asking Fisheye to reindex the project from scratch - this deletes the contents of cru_feindex_msg which is only used for incremental indexing - but I wasn't brave enough to trust that this wouldn't make Crucible unavailable to us for several days.

          People

            Assignee: Unassigned
            Reporter: Trevor Samaroo