  1. Confluence Data Center
  2. CONFSERVER-35040

Page move action causes database deadlock and data corruption

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      Related
      https://jira.atlassian.com/browse/CONF-24883 (similar issue)
      https://jira.atlassian.com/browse/CONF-34870 (slow page move increases window of deadlock)

      Suggested fix(es)

      • Have a single (cluster-wide) queue for page move operations.
      • Hold a (cluster-wide) composite lock (parentId:childId), or take two locks in natural ordering (parentId -< childId).

      Side effects of this bug

      • The page tree cycles endlessly and causes an OOM in under a few minutes.
      • Breadcrumbs hit a stack overflow.
      • Other plugins (indexing?) are not guaranteed to handle cyclic graphs in Confluence.
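
      To illustrate why a cycle is fatal for anything that walks ancestors (such as breadcrumbs), here is a small sketch of a guarded ancestor walk. It assumes the page tree is available as a plain mapping from page ID to parent ID; `find_cycle` and `parent_of` are hypothetical names, not Confluence APIs.

```python
def find_cycle(parent_of, start_id):
    """Walk parent links from start_id.

    parent_of maps page ID -> parent page ID (None or missing for a root).
    Returns the list of page IDs that form a cycle, or None if the walk
    reaches a root. An unguarded walk over a cyclic tree would never end.
    """
    path = []   # IDs visited so far, in order
    seen = {}   # page ID -> index in path
    current = start_id
    while current is not None:
        if current in seen:
            # We came back to a page already on the path: that tail is the cycle.
            return path[seen[current]:]
        seen[current] = len(path)
        path.append(current)
        current = parent_of.get(current)
    return None
```

      For example, `find_cycle({1: 2, 2: 1}, 1)` returns `[1, 2]`, while a normal chain such as `{3: 2, 2: 1}` returns `None`.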

      Symptoms:
      nginx:

      172.24.12.144 - - [23/Sep/2014:01:40:43 +0000] "GET /pages/movepage.action?pageId=2302876189&position=append&targetId=2298943959&atl_token=1a8fab263d4644b9373a3453c9bceea99fdd9166&_=1411436432399 HTTP/1.0" 200 271 "https://extranet.atlassian.com/pages/reorderpages.action?key=ST" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"
      172.24.12.144 - - [23/Sep/2014:01:40:44 +0000] "GET /pages/movepage.action?pageId=2302876189&position=above&targetId=2298943959&atl_token=1a8fab263d4644b9373a3453c9bceea99fdd9166&_=1411436436617 HTTP/1.0" 200 271 "https://extranet.atlassian.com/pages/reorderpages.action?key=ST" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"
      172.24.12.144 - - [23/Sep/2014:01:40:45 +0000] "GET /pages/movepage.action?pageId=2298943959&position=append&targetId=2302876189&atl_token=1a8fab263d4644b9373a3453c9bceea99fdd9166&_=1411436439759 HTTP/1.0" 200 265 "https://extranet.atlassian.com/pages/reorderpages.action?key=ST" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"
      172.24.12.144 - - [23/Sep/2014:01:41:04 +0000] "GET /pages/movepage.action?pageId=2301533426&position=below&targetId=2302876189&atl_token=1a8fab263d4644b9373a3453c9bceea99fdd9166&_=1411436461303 HTTP/1.0" 200 297 "https://extranet.atlassian.com/pages/reorderpages.action?key=ST" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"
      172.24.12.144 - - [23/Sep/2014:01:41:06 +0000] "GET /pages/movepage.action?pageId=2301533426&position=above&targetId=2298943959&atl_token=1a8fab263d4644b9373a3453c9bceea99fdd9166&_=1411436465795 HTTP/1.0" 200 297 "https://extranet.atlassian.com/pages/reorderpages.action?key=ST" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0"
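
      In the requests above, the first and third entries append two pages underneath each other (pageId and targetId swapped, both with position=append). A rough sketch for spotting such reciprocal moves in an access log follows; it assumes the combined log format shown here, and the helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the request part of a combined-format access log line.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def parse_moves(log_lines):
    """Extract (pageId, position, targetId) from movepage.action requests."""
    moves = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        if not url.path.endswith("/pages/movepage.action"):
            continue
        query = parse_qs(url.query)
        try:
            moves.append(
                (query["pageId"][0], query["position"][0], query["targetId"][0])
            )
        except KeyError:
            # Skip requests that lack one of the expected parameters.
            continue
    return moves


def reciprocal_appends(moves):
    """Return sorted ID pairs (a, b) where a was appended to b and b to a."""
    appended = {(page, target) for page, pos, target in moves if pos == "append"}
    return sorted(
        {tuple(sorted(pair)) for pair in appended if (pair[1], pair[0]) in appended}
    )
```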
      

      App log (full version attached):

      2014-09-22 20:40:44,397 ERROR [catalina-exec-10] [sf.hibernate.util.JDBCExceptionReporter] logExceptions ERROR: deadlock detected
        Detail: Process 29730 waits for ShareLock on transaction 2482659; blocked by process 14749.
      Process 14749 waits for ShareLock on transaction 2482650; blocked by process 29730.
        Hint: See server log for query details.
      

        1. atlassian-confluence.log.1.gz
          1.47 MB
        2. pagemove.log
          211 kB
        3. Page Not Found.png
          13 kB
        4. Screen Shot 2015-09-29 at 12.22.49.png
          31 kB


            Many apologies, Jan-Peter. I've told my robotic Minion to stop posting comments visible to customers, so you should have a quieter inbox again.

            The comments let our developers know when their work on issues has been included in releases that get built, tested and deployed automatically to Cloud environments. Visibility is now restricted to staff only; please let me know if you're still getting emails.

            Regards,
            Mike Howells
            Cloud Release Manager

            Mike Howells added a comment

            @Atlassian

            By now I know that the Minion Bot of HipChat (I guess) is working. But I don't understand why the bot is posting comments related to internal devops. I get several emails a day sent by the Minion bot. Can I turn this off?

            Jan-Peter

            Jan-Peter Rusch added a comment

            psemeniuk this issue is mentioned in release Confluence 6.0.0-OD-2015.48.0-0523 just promoted to jirastudio-dev

            Deleted Account (Inactive) added a comment

            psemeniuk this issue is mentioned in commits included in the Confluence 5.9.1-OD-2015.47.1-0002 release being approved for production: CPU-64

            Deleted Account (Inactive) added a comment

            Alain Forrester added a comment - edited

            Hi Petro,

            Thanks, I understand there's a lot of work involved.

            That sounds good about v5.8.x. I look forward to getting it running here and being pleasantly surprised.

            The size of our instance, our use patterns (academic calendar) and internal processes mean that an upgrade to our environment isn't quite as quick and easy as all that. We'll be kicking off the next upgrade in Feb 2016.

            Thanks again for your work on this.
            Alain.


            Hi Alain,

            Unfortunately, it's not possible. The volume of dev effort and testing that went into this issue is really huge.

            It's not about doing or not doing that for 5.7; it's about doing it for 5.7 vs. fixing other bugs.

            I highly recommend that you upgrade to Confluence 5.8.x. It's the first major version for which we had a whole team dedicated to performance, and I guarantee that you'll be pleasantly surprised post upgrade.

            Cheers
            Petro

            Petro Semeniuk (Inactive) added a comment

            Hi Petro,

            We have an upgrade planned to start next year. Given we're pretty badly affected by this bug, are there any plans to release a patch for earlier versions that would allow us to enable page moves until we're able to upgrade? We're on 5.7.5 at the moment.

            Thanks,
            Alain.

            Alain Forrester added a comment

            stefan.ernst I'm afraid there is no way to limit that amount at the moment. However, since 5.6.18 a reindex kicks in only if:

            • ancestors have different permission sets (so if you move within a space without any permissions set on parent pages, the reindex won't happen)
            • the page move is done across spaces (in this case you can move small subtrees to limit the re-index and get better throughput)

            As a consequence, if you need to move large pages within a space, you can first set permissions, wait for the permissions re-index to happen, and then do the page move as a second step.
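
            The two conditions above can be restated as a small predicate. This is only a sketch of the rule as described in this comment, not Confluence code; the function name and parameters are hypothetical, and "different permission sets" is assumed to mean that the restrictions inherited from the old and new ancestors differ.

```python
def move_triggers_reindex(source_space, target_space,
                          old_ancestor_permissions, new_ancestor_permissions):
    """Return True if, per the rule described above, a page move causes a reindex.

    Assumption: permissions are given as iterables of hashable entries, and
    only a difference between the old and new ancestor sets matters.
    """
    cross_space = source_space != target_space
    permissions_differ = set(old_ancestor_permissions) != set(new_ancestor_permissions)
    return cross_space or permissions_differ
```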

            You can open a feature request for re-index performance, but honestly, right now we are trying to close off all criticals, so it's not very likely to be in anyone's backlog.

            EDIT: there is a similar bug for a performance improvement of the page restrictions dialog: https://jira.atlassian.com/browse/CONF-32583

            Depending on how it is solved, it might improve the content-permission portion of page move.

            Petro Semeniuk (Inactive) added a comment - edited

            Thank you Petro,

            Is it possible in any way to limit the number of reindexing tasks put in the queue after a page move? This would be the last place where page moves have a negative impact on large instances, because the moved pages block the index queue for quite some time.

            Stefan Ernst added a comment

            All,

            This issue is fixed in 5.8.17. Technical details:

            • We allow only one move per space at a time (combined with the perf fixes in https://jira.atlassian.com/browse/CONF-35396, it doesn't really make things slower).
            • A move from one space to another will lock both of them.
            • We did extensive testing, and there were no deadlocks at the database level and no cycles created as part of page move.

            If you have any questions or concerns, please let me know.

            Cheers
            Petro

            Petro Semeniuk (Inactive) added a comment

              psemeniuk Petro Semeniuk (Inactive)
              psemeniuk Petro Semeniuk (Inactive)
              Affected customers: 51
              Watchers: 67
