Page move action is too slow when moving a large number of pages

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      A page move requires a significant number of insert/update/delete operations on the database, because moving one page requires updating the page itself and all of its descendant pages. This heavy procedure can cause a bad user experience, as users have to wait a considerable amount of time before their actions complete. It is also the root cause of DB deadlocks, because it locks some tables for quite a long time.

      Activities that slow down moving pages:

      • Select/Update/Insert to CONFANCESTORS table
      • Update JOURNALENTRY table
      • Update links in moved pages.

      Problems found with current implementation:

      • CONTENTPROPERTIES is set to load "eager", causing an N+1 problem.
      • The procedure that updates a page's links is called even when pages are moved within the same space (this heavy procedure should be skipped in that case).
      • Each getAncestors() call issues one SELECT query to the DB.
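
To illustrate the getAncestors() point, here is a minimal Python sketch (not Confluence code). It compares one SELECT per page with a single bulk SELECT grouped in memory. The `ancestors` table, its columns, and all function names are hypothetical stand-ins loosely modeled on CONFANCESTORS; SQLite is used only so the example runs on its own.

```python
import sqlite3
from collections import defaultdict


class CountingDb:
    """Thin wrapper that counts how many SELECT statements are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.selects = 0

    def select(self, sql, params=()):
        self.selects += 1
        return self.conn.execute(sql, params).fetchall()


def build_chain(depth):
    """Hypothetical ancestors table for one chain of pages 1 -> 2 -> ... -> depth."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ancestors "
        "(descendant_id INTEGER, ancestor_id INTEGER, position INTEGER)"
    )
    rows = [
        (page, ancestor, ancestor - 1)  # position 0 is the root
        for page in range(1, depth + 1)
        for ancestor in range(1, page)
    ]
    conn.executemany("INSERT INTO ancestors VALUES (?, ?, ?)", rows)
    return CountingDb(conn)


def get_ancestors(db, page_id):
    """One SELECT per call: N pages cost N queries."""
    rows = db.select(
        "SELECT ancestor_id FROM ancestors WHERE descendant_id = ? ORDER BY position",
        (page_id,),
    )
    return [row[0] for row in rows]


def get_ancestors_bulk(db, page_ids):
    """One SELECT for all rows, grouped per page in memory."""
    grouped = defaultdict(list)
    for descendant, ancestor in db.select(
        "SELECT descendant_id, ancestor_id FROM ancestors "
        "ORDER BY descendant_id, position"
    ):
        grouped[descendant].append(ancestor)
    return {page: grouped.get(page, []) for page in page_ids}


pages = list(range(1, 51))

db = build_chain(50)
per_call = {page: get_ancestors(db, page) for page in pages}
per_call_selects = db.selects

db = build_chain(50)
bulk = get_ancestors_bulk(db, pages)
bulk_selects = db.selects

print("per-call SELECTs:", per_call_selects, "bulk SELECTs:", bulk_selects)
```

Both approaches return the same ancestor lists; only the number of round trips differs. A real fix would presumably restrict the bulk query to the moved subtree rather than read the whole table.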

      Update 17/11/2015
      There is definitely some performance improvement in 5.8.6 (https://jira.atlassian.com/browse/CONF-35396?focusedCommentId=827923&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-827923) but not the order of magnitude I expected. Reopening issue because of that.

            [CONFSERVER-35396] Page move action is too slow when moving a large number of pages

            All,

            A bit of history: I opened https://jira.atlassian.com/browse/CONF-35040 for tracking deadlocks during page moves (because of concurrent transactions) and https://jira.atlassian.com/browse/CONF-34870 to track performance.

            https://jira.atlassian.com/browse/CONF-34870 was then reworded and resolved and superseded by https://jira.atlassian.com/browse/CONF-35396 (current ticket).

            Originally, moving a couple of dozen pages within a space took over a minute (functionality easily accessible to all users from the space hierarchy view). Now it shouldn't take more than a few seconds.

            Many people reported that moving a few hundred pages would take overnight and cause heavy server load; that shouldn't happen anymore.

            A cross-space page move with many descendants (1K+) will still be heavy on the server because of the permission re-index and attachment re-index. Nevertheless, you should expect a cross-space move to be at least twice as fast as it was before.

            For admins of instances: keep in mind that 3rd-party plugins are usually registered as event listeners and can also contribute to page move slowness.

            To sum up:

            • Page move is now faster. We confirmed that both with synthetic data and on our production instances.
            • This issue targeted cases where moving a few hundred pages would take overnight. Such things should no longer happen.
            • If you need to move a large page tree (5K+), please move smaller subtrees first. You can also open a separate CONF ticket for that volume of data; however, I doubt that it will be addressed soon.

            Cheers
            Petro

            Petro Semeniuk (Inactive) added a comment
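
The advice above to move smaller subtrees first can be planned mechanically. The following Python sketch is an illustration only. It assumes a hypothetical `children` mapping (page id to child page ids) and a hypothetical per-move `limit`, neither of which comes from Confluence. It walks the tree bottom-up and detaches the largest child subtrees until every planned move fits under the limit. Re-attaching detached subtrees after the move is left out.

```python
def plan_moves(children, root, limit):
    """Return [(subtree_root, page_count), ...] with no move above `limit` pages.

    `children` maps a page id to the list of its child page ids.
    Deeper subtrees appear in the plan before the root they were cut from.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    plan = []
    remaining = {}  # pages still attached under each node after cuts

    # Iterative post-order walk so deep trees do not hit the recursion limit.
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        kids = children.get(node, [])
        if not visited:
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
            continue
        size = 1 + sum(remaining[kid] for kid in kids)
        # Detach the largest child subtrees first until this node fits.
        for kid in sorted(kids, key=lambda k: remaining[k], reverse=True):
            if size <= limit:
                break
            plan.append((kid, remaining[kid]))
            size -= remaining[kid]
        remaining[node] = size
    plan.append((root, remaining[root]))
    return plan


# Hypothetical tree: root 0 has children 1, 2, 3, each with four leaf pages.
tree = {0: [1, 2, 3], 1: [10, 11, 12, 13], 2: [20, 21, 22, 23], 3: [30, 31, 32, 33]}
moves = plan_moves(tree, root=0, limit=6)
for subtree_root, count in moves:
    print(f"move subtree {subtree_root}: {count} pages")
```

The greedy choice (largest child first) keeps the number of separate moves small, but it is only one possible heuristic.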

            Also, it's best to upgrade to 5.8.17 (released just a few hours ago). It has proper locking for concurrent moves in place, as well as logging of all move events (https://jira.atlassian.com/browse/CONF-35040).

            Petro Semeniuk (Inactive) added a comment

            Hi Jens, please open a support ticket and summon me directly (psemeniuk).

            Alternatively, you can send gzipped logs to psemeniuk@atlassian.com. However, a support ticket is better, since we can maintain the history of the whole investigation there.

            Petro Semeniuk (Inactive) added a comment

            Dear Petro,

            I can provide you with the requested thread dumps and log information, but I do not want to attach them to this open issue. Can I somehow send them to you privately or via a support issue?

            Jens Kasperek added a comment

            Hi Jens, first of all many thanks for upgrading to 5.8.16 and for the feedback!

            I have to say that having to wait 6-10 minutes to move 900 pages is indeed disappointing. I'm reopening this ticket now.

            Would you be able to provide me with:

            • server logs produced for the period when the page moves are done
            • thread dumps taken during the page move, 30 seconds apart (so there will be 10-20 thread dumps in total)

            That will help me better understand where CPU time is spent. Whilst taking thread dumps is suboptimal, I'm afraid we don't have fine-grained logging around page move, so sampling threads is the only sure way to see where the page move got stuck for so long.

            As for concurrent page moves: they are still vulnerable to deadlocks in 5.8.16 (and concurrent moves within a space will slow down page move dramatically). This was fixed as part of https://jira.atlassian.com/browse/CONF-35040.

            Cheers
            Petro

            Petro Semeniuk (Inactive) added a comment - edited
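
The thread-dump sampling requested above (one dump every 30 seconds, 10-20 in total) could be scripted roughly as follows. This is a sketch under assumptions: it relies on the JDK's `jstack` tool being on the PATH and on you knowing the Confluence JVM's process id. The function names, file naming scheme, and the pid in the dry run are made up for illustration. The dry run at the end swaps in a fake dump function, so nothing is actually sampled.

```python
import subprocess
import tempfile
import time
from pathlib import Path


def run_jstack(pid):
    """Capture one thread dump using the JDK's jstack tool (assumed on PATH)."""
    result = subprocess.run(
        ["jstack", "-l", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def sample_thread_dumps(pid, out_dir, count=10, interval=30.0,
                        dump=run_jstack, sleep=time.sleep):
    """Write `count` thread dumps to `out_dir`, pausing `interval` seconds between them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        path = out_dir / f"threaddump-{i + 1:02d}.txt"
        path.write_text(dump(pid))
        written.append(path)
        if i < count - 1:  # no need to wait after the last dump
            sleep(interval)
    return written


# Dry run with a fake dump function and a recorded (not real) sleep.
naps = []
with tempfile.TemporaryDirectory() as tmp:
    files = sample_thread_dumps(
        pid=12345,  # hypothetical pid
        out_dir=tmp,
        count=3,
        interval=30.0,
        dump=lambda pid: f"fake dump for {pid}\n",
        sleep=naps.append,
    )
    names = [f.name for f in files]
    first = files[0].read_text()
```

For a real run, start the page move and then call `sample_thread_dumps(<confluence pid>, "dumps", count=15)`; the resulting files can be gzipped and attached to a support ticket.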

            Hi Petro,
            Unfortunately, I was too enthusiastic. The issue was not resolved sufficiently!

            Here are my test results for moving 900 pages (same space) on Conf 5.5.6 and Conf 5.8.16:

            • On Conf 5.5.6, moving 900 pages to a different space took ~ 15 minutes
            • On Conf 5.5.6, moving 900 pages within the same space using reorder.action shows the move as done the moment you release the mouse, but the actual move takes 12 minutes. (During this time, users will start multiple page moves at once, since the UI is telling them 'Everything is moved properly. Please go ahead with the page moves (and crash the system)'.)
            • On Conf 5.8.16, moving 900 pages within the same space using reorder.action shows the move as done the moment you release the mouse, but the actual move takes 6 minutes. (During this time, users will start multiple page moves at once, since the UI is telling them 'Everything is moved properly. Please go ahead with the page moves (and crash the system)'.)
            • On Conf 5.8.16, moving 900 pages to a different space took ~ 10 minutes

            I observed these figures by monitoring the JDBC connections on a system without any load!

            Now, I ask myself how you measured this pagemove action and got results of just 3 seconds for moving 1,200 pages?

            Jens Kasperek (Bosch GmbH) (Inactive) added a comment - edited

            Hi Jens, do you have a number for how long a move of 900 pages within the same space was taking before?

            Petro Semeniuk (Inactive) added a comment

            jens.kasperek glad to hear that you see an improvement! Please keep in mind that until 5.8.17 is out, page move is racy and might cause deadlocks. Integrity aspects of page move were addressed in https://jira.atlassian.com/browse/CONF-35040?focusedCommentId=825123&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-825123

            As for cross-space moves: it's by design that they trigger a content permission reindex for the whole page tree. The only advice I can give is to try moving small subtrees. From the stats you posted, it seems you have about 150 pages on average per space (is that right?), so concurrent page moves shouldn't be a stability issue once you are on 5.8.17.

            UI errors: erm... sorry, as part of these tickets I intentionally haven't touched the UI. I know it's rusty.

            P.S. There might be some performance wins for the index queue once https://jira.atlassian.com/browse/CONF-32583 is addressed.

            Petro Semeniuk (Inactive) added a comment

            Here comes an update from our large Enterprise instance (155,000 users - 2,300 spaces - 300,000 pages of latest version):

            • Moving Pages within the same space is now much faster. We moved 900 pages within 10 seconds! This is great!
            • Moving Pages to a different space still causes issues. These are:
              • While moving 900 pages to a different space, we got an error in the UI: 'Problem contacting the server...' (as a normal user, I would guess that I should restart the action)
              • After clicking 'Cancel', the pages are still in the old space
              • However, in the background the request was processed after ~ 7 minutes
              • The user sees the pages in the new space after 7 minutes, but is not notified
              • Anyway, the move is now faster than it was on previous versions of Confluence
            • Our index queue was very busy after the move and took 3 minutes to process all move actions; during peak times this could cause issues

            All in all, we are glad about this first fix, but we still see some room for improvement! Thanks for taking care of this issue!

            Jens Kasperek (Bosch GmbH) (Inactive) added a comment

            modwarko sorry, no backport to 5.7.

            Petro Semeniuk (Inactive) added a comment

              psemeniuk Petro Semeniuk (Inactive)
              honguyen Hoang Nguyen (Inactive)
              Affected customers: 30
              Watchers: 38