Jira Data Center / JRASERVER-68548

Cluster cache replication can cause high CPU across all nodes in the cluster and require a restart

Details

    Description

      This issue is similar to JRASERVER-63137 in symptom (but likely a slightly different cause).

      We have 3 nodes in a Jira cluster. We saw CPU increase and plateau on one node, and then CPU on each of the other nodes increased as well.

      Thread dumps show many RMI TCP Connection threads 'stuck waiting for response from external systems' (see the attached thread dumps; run them through fastthread.io).
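
      As a rough local alternative to uploading the dumps, the RMI TCP Connection threads can be counted per thread state with a short script. This is a minimal sketch that assumes the dumps are in the standard HotSpot text format (as produced by jstack), where each thread starts with its quoted name and is followed by a `java.lang.Thread.State:` line. The function name `count_rmi_threads` is illustrative and not part of any Jira tooling.

```python
import re
from collections import Counter

# A thread entry in a HotSpot thread dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# The line after the header carries the thread state, e.g. RUNNABLE or TIMED_WAITING.
THREAD_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>\w+)')


def count_rmi_threads(dump_text, prefix="RMI TCP Connection"):
    """Count threads whose name starts with `prefix`, grouped by thread state."""
    counts = Counter()
    in_match = False  # True while inside an entry whose name matched the prefix
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            in_match = header.group("name").startswith(prefix)
            continue
        state = THREAD_STATE.match(line)
        if state and in_match:
            counts[state.group("state")] += 1
            in_match = False  # count each thread once
    return counts
```

      Running this against dumps taken a few seconds apart on each node would show whether the number of RMI threads keeps growing, which is what the CPU plateau suggests.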

      This appears to indicate a problem with cache replication. However, there were no user reports of incorrect cached data.

      There were reports of timeouts and slow request processing, which is expected with CPU this high. Had we not restarted when we did, this issue would have caused a full system outage.

      Workaround

      Perform a rolling restart of the cluster (bring each node down and back up before bringing the next node down). CPU will drop back to normal.
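
      The ordering constraint in the workaround (one node fully back up before the next goes down) can be sketched as a small loop. This is only an illustration: `stop`, `start`, and `is_healthy` are hypothetical callables that the operator would supply for their own environment, and the polling interval and timeout are arbitrary defaults.

```python
import time


def rolling_restart(nodes, stop, start, is_healthy,
                    poll_seconds=10, timeout_seconds=600):
    """Restart nodes one at a time, waiting for each to recover before continuing.

    `stop`, `start`, and `is_healthy` are caller-supplied (hypothetical) hooks
    that each take a node identifier.
    """
    for node in nodes:
        stop(node)
        start(node)
        deadline = time.monotonic() + timeout_seconds
        # Do not touch the next node until this one reports healthy again.
        while not is_healthy(node):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{node} did not become healthy; stopping the rollout")
            time.sleep(poll_seconds)
```

      Aborting on timeout keeps the remaining nodes serving traffic if one node fails to come back.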

            People

              Unassigned
              Denise Unterwurzacher [Atlassian] (Inactive)
              Votes: 0
              Watchers: 4