Confluence Data Center / CONFSERVER-75497

Long-running permission-change operations can time out and show misleading failure messages in the UI, but may finish successfully in the background


Details

    Description

      We don't plan to backport the fix for this bug to earlier Long Term Support versions

      The fix for this bug isn't suitable for backporting to a bug fix release for any previous LTS versions. This is often because the fix is considered too high risk to implement in an older version.

      The fix for this issue will be included in future Long Term Support versions.

      Note

      The misleading message is still present in the product. However, to close this ticket we've dramatically improved the performance of permission-change operations. In theory, they should now be fast enough that you never see the message.

      In our testing, the change is now roughly 30x faster, taking a 7-8 minute change down to 3-4 seconds on the front end. Achieving this required significant changes under the hood, so this fix is unlikely to be a candidate for backporting. I'll update the ticket should that change, and have added the backport notice to this ticket.

      Issue Summary

      When a permission change is applied to the root page of a space with tens or hundreds of thousands of nested child pages, the operation can take a significant amount of time to finish.

      It can occasionally run long enough to be timed out by a proxy or by the Confluence UI itself, resulting in this error:
      "There was an error submitting the page restrictions. Please try again later."

      While that error is shown in the UI, the HTTP thread handling the user's request may still be running in the background. It is visible:

      • as a long-running task in atlassian-confluence.log:
        YYYY-MM-DD HH:MM:SS,mmm WARN [http-nio-8090-exec-NNNN] [confluence.util.profiling.DefaultActivityMonitor] close Exceeded the threshold of 60000 ms: ActivitySnapshot{startTime=1628395400000, threadId=XXX, threadName='http-nio-8090-exec-NNNN', userId='userid', type='web-request', summary='/pages/setcontentpermissions.action'}
         -- referer: https://<baseURL>/display/<spaceKey> | url: /pages/setcontentpermissions.action | traceId: <traceID> | userName: userid
        
      • as a potentially stuck thread in catalina.out:
        DD-Mmm-YYYY HH:MM:SS.mmm WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread "http-nio-8090-exec-NNNN" (id=XXX) has been active for 60,008 milliseconds (since M/D/YY H:MM AM) to serve the same request for https://<baseURL>/pages/setcontentpermissions.action and may be stuck (configured threshold for this StuckThreadDetectionValve is 60 seconds). There is/are 1 thread(s) in total that are monitored by this Valve and may be stuck.
        
      • (in a Confluence Data Center clustered setup) as an entry in the node's Live activity tab (under Administration > Clustering > (specific node) > '...' button)
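      To pick the long-running request out of a busy atlassian-confluence.log, something like the following Python sketch could help. It assumes the DefaultActivityMonitor line format shown in the sample above (other versions or logging configurations may differ), and `find_slow_permission_requests` and `SlowRequest` are hypothetical names, not part of Confluence.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed pattern, derived only from the sample DefaultActivityMonitor line above.
ACTIVITY_RE = re.compile(
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"\[confluence\.util\.profiling\.DefaultActivityMonitor\]\s+close\s+"
    r"Exceeded the threshold of (?P<threshold_ms>\d+) ms:.*?"
    r"summary='(?P<summary>[^']*)'"
)


class SlowRequest(NamedTuple):
    thread: str        # HTTP worker thread name, e.g. http-nio-8090-exec-NNNN
    threshold_ms: int  # threshold the activity monitor reported as exceeded


def find_slow_permission_requests(
    lines: Iterable[str], action: str = "/pages/setcontentpermissions.action"
) -> Iterator[SlowRequest]:
    """Yield one SlowRequest per activity-monitor warning whose summary is `action`."""
    for line in lines:
        m = ACTIVITY_RE.search(line)
        if m and m.group("summary") == action:
            yield SlowRequest(m.group("thread"), int(m.group("threshold_ms")))
```

      Passing an open file object for atlassian-confluence.log yields one entry per warning; the thread name it reports can then be looked up in catalina.out or in the node's Live activity tab.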


      Trusting the error shown in the UI, the user may retry the permission change multiple times, starting a new HTTP thread with every attempt.

      Each subsequent thread tries to apply the same change to the same data set, and eventually fails with errors like these:

      Failed to add entry to queue; UNIQUE KEY constraint 'cp_unique_user_groups'. Cannot insert duplicate key in object 'dbo.CONTENT_PERM'.
      
      YYYY-MM-DD HH:MM:SS,mmm ERROR [http-nio-8090-exec-NNNN] [atlassian.xwork.interceptors.TransactionalInvocation] commitOrRollbackTransaction Action /pages/setcontentpermissions.action (SetPagePermissionsAction.setContentPermissions()) is already completed and can not be committed again.
       -- referer: https://<baseURL>/display/<spaceKey>/<rootPageTitle> | url: /pages/setcontentpermissions.action | traceId: <traceID> | userName: <userid>
      
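      To gauge how many retries have already failed, a sketch like this collects the thread names from the TransactionalInvocation error above. The regex is an assumption built from that single sample line, and `failed_retry_threads` is a hypothetical helper.

```python
import re
from typing import Iterable, Set

# Assumed pattern, derived only from the sample TransactionalInvocation error above.
FAILED_RETRY_RE = re.compile(
    r"ERROR \[(?P<thread>[^\]]+)\] "
    r"\[atlassian\.xwork\.interceptors\.TransactionalInvocation\] "
    r"commitOrRollbackTransaction Action /pages/setcontentpermissions\.action\b.*"
    r"is already completed and can not be committed again"
)


def failed_retry_threads(lines: Iterable[str]) -> Set[str]:
    """Return the distinct HTTP thread names whose permission-change commit was rejected."""
    threads: Set[str] = set()
    for line in lines:
        m = FAILED_RETRY_RE.search(line)
        if m:
            threads.add(m.group("thread"))
    return threads
```

      Only the TransactionalInvocation line is matched because, in the sample, it is the one that carries both the thread name and the action; the UNIQUE KEY message does not.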


      The first attempt eventually finishes, and catalina.out records the thread's completion after a long execution time with an entry like this:

      DD-Mmm-YYYY HH:MM:SS.mmm WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadCompleted Thread "http-nio-8090-exec-NNNN" (id=XXX) was previously reported to be stuck but has completed. It was active for approximately 7,200,123 milliseconds.
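      Because the completion entry names the thread but not the URL, one way to tell whether the first attempt has finished is to pair each notifyStuckThreadDetected entry for setcontentpermissions.action with a later notifyStuckThreadCompleted entry for the same thread. The Python sketch below does that, assuming the catalina.out format shown above and that StuckThreadDetectionValve is enabled; `stuck_permission_threads` is a hypothetical name.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Assumed patterns, derived only from the sample StuckThreadDetectionValve lines above.
DETECTED_RE = re.compile(
    r'notifyStuckThreadDetected Thread "(?P<thread>[^"]+)".*'
    r"setcontentpermissions\.action"
)
COMPLETED_RE = re.compile(
    r'notifyStuckThreadCompleted Thread "(?P<thread>[^"]+)".*'
    r"active for approximately (?P<ms>[\d,]+) milliseconds"
)


def stuck_permission_threads(lines: Iterable[str]) -> Tuple[Set[str], Dict[str, int]]:
    """Return (still_running, completed) for stuck setcontentpermissions threads.

    still_running: thread names reported stuck with no later completion entry.
    completed: thread name -> reported active time in milliseconds.
    """
    still_running: Set[str] = set()
    completed: Dict[str, int] = {}
    for line in lines:
        if m := DETECTED_RE.search(line):
            still_running.add(m.group("thread"))
        # A completion is only attributed to a thread already reported stuck on this action.
        elif (m := COMPLETED_RE.search(line)) and m.group("thread") in still_running:
            still_running.discard(m.group("thread"))
            completed[m.group("thread")] = int(m.group("ms").replace(",", ""))
    return still_running, completed
```

      Tomcat reuses worker-thread names, which is why a completion entry only counts once the same thread has been reported stuck on this action. An empty `still_running` set suggests the monitored permission-change threads have finished.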
      

      Steps to Reproduce

      1. Set up a Confluence space with a root page and tens or hundreds of thousands of nested child pages (with attachments, etc.).
      2. Modify the restrictions on the root page (add or remove users) and click Apply.

      Expected Results

      The UI reports success if the permission change succeeded, and failure only if there was a problem.

      Actual Results

      • Errors indicating failure may be shown because of a timeout, even though the operation is still running and may eventually finish without errors.
      • Misled by the displayed error, the user may trigger multiple futile retries, all of which will fail if the first attempt succeeds.


      Confluence:

      • does not keep track of these duplicate operations and allows their futile execution, leading to significant CPU, database, I/O, and memory usage
      • does not convey the actual state of the operation, which causes the ambiguity in the first place

      Workaround

      1. When permissions are changed on a page with tens or hundreds of thousands of nested child pages, track the operation's progress in the following places, even if the UI times out or shows an error as described above:
        • atlassian-confluence.log
        • catalina.out
        • (on Confluence Data Center) node's Live activity tab

      2. Once the places listed above show that the thread has finished, go to Administration > Cache Management, scroll to the bottom, and click Flush all. On a multi-node cluster, the cache may need to be flushed on all nodes.

      3. After flushing the Confluence cache, log out of Confluence, clear the browser cache, log back in, and confirm whether the desired permission change has been applied.


            People

              glipatov George Lipatov
              5c3a8aca27ce Mohit Sharma
              Votes: 31
              Watchers: 53
