This issue does not affect Data Center users. Confluence Data Center is currently able to run the clean-up job without problems.

      Issue Description

      There is a scheduled job, executed by the Confluence job handler, called Service Provider Session Remover, which is responsible for removing expired OAuth tokens from the BANDANA table. This job is not working properly, and expired tokens are not being cleared as expected. With enough expired tokens in the database, this can eventually lead to an application crash, as the logic that handles tokens is negatively impacted by the number of expired and useless entries stored in the database.

      Steps to Reproduce

      1. Set up JIRA and Confluence linked together through an Application Link with an OAuth relationship
      2. Add Confluence gadget feed to JIRA
      3. Create a new JIRA Dashboard and add the Confluence Page Gadget; do not click "Login and Approve"
      4. Every time the Dashboard/Gadget is loaded, observe that a new com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token.<somestring> row gets added to the BANDANA table in Confluence, e.g.:
        bandanakey
        com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token.5IWZaCCOpzibZTUboRNaJU5WmddxePSe
        
        bandanavalue
         <properties>
           <property name="token" value="dqgAd6Oc9JU2G4ZuM6ec0PLmkIMf670w"/> 
           <property name="creationTime" value="1521048328968"/> 
           <property name="tokenSecret" value="ZAN3c7Yu6zEP8pUQVWB25e5odUrsRNbi"/>   
           <property name="authorization" value="NONE"/> 
           <property name="version" value="V_1_0_A"/>
           <property name="properties" value="   
         xoauth_app_url=http\://confluence.example.com/rest/gadgets/1.0/g/com.atlassian.confluence.plugins.gadgets\:confluence-page-gadget/gadgets/confluence-page-gadget.xml
         "/> 
           <property name="type" value="REQUEST"/>   
           <property name="callback" value="http://jira.example.com/plugins/servlet/gadgets/oauth-callback"/> 
           <property name="consumerKey" value="jira:2f5ae229-2135-43a8-bfd4-77f1cd14b3cc"/>  
           <property name="timeToLive" value="600000"/>  
         </properties>
        
      5. Load the JIRA Dashboard and Gadget several times and observe that there are now several such entries in the BANDANA table.
      6. Leave Confluence running for 8 hours or more so that the Service Provider Session Remover becomes due to run. It will not run successfully, and you will see evidence in the logs that it has failed to execute. You should have an entry like the one below:
        runJob Scheduled job CompatibilityPluginScheduler.JobRunnerKey.Service Provider Session Remover#CompatibilityPluginScheduler.JobId.Service Provider Session Remover completed unsuccessfully with response JobRunnerResponse[runOutcome=ABORTED,message='Job has unexpected run mode RUN_LOCALLY'].
        

      Expected Behavior

      The Service Provider Session Remover runs every 8 hours and clears the expired OAuth tokens out of the BANDANA table.

      Actual Behavior

      The Service Provider Session Remover attempts to run every 8 hours but is aborted. The scheduler_run_details table shows:

         id    |                               job_id                                |       start_time        | duration | outcome |                 message                 
      ---------+---------------------------------------------------------------------+-------------------------+----------+---------+-----------------------------------------
        428282 | CompatibilityPluginScheduler.JobId.Service Provider Session Remover | 2018-03-27 02:56:11.413 |        6 | A       | Job has unexpected run mode RUN_LOCALLY
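      To check whether the job has been failing on your own instance, the run history can be inspected with a query along these lines. The table and column names are the ones shown above; exact syntax may vary by database:

```sql
-- List recent runs of the Service Provider Session Remover, newest first.
-- Outcome 'A' (ABORTED) together with the RUN_LOCALLY message indicates this bug.
select start_time, outcome, message
from scheduler_run_details
where job_id like '%Service Provider Session Remover%'
order by start_time desc;
```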
      

      Technical details

      The above means that if the dashboard is loaded 1000 times, there will be 1000 such entries in the database. These tokens are supposed to be cleared from BANDANA via the scheduled job Service Provider Session Remover, which runs every 8 hours; however, the job is currently broken.

      As expired/unused tokens are not removed, instances will eventually hold a huge number of tokens in their database. Due to the logic used in the OAuth Plugin and documented in OAUTH-321, as the number of tokens in the database grows, so does the cost of the work it needs to perform. This is because, when inserting a new token into the database, the plugin iterates through the list of existing tokens to validate each one of them before the new token is inserted.

      The intent of this logic is to detect duplicated entries and keep the list of usable tokens up to date. However, as the data set (n) that the algorithm needs to process gets bigger, so does the amount of work required: each insert scans all existing tokens, which makes the overall behaviour O(n²) in complexity and leads to eventual outages, depending on instance usage and available resources.
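      The quadratic pattern described above can be sketched as follows. This is an illustrative model only, not the actual Atlassian OAuth plugin code: it shows how a store that validates every existing token before each insert performs 0 + 1 + ... + (n-1) = n(n-1)/2 validations for n inserts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only -- NOT the actual Atlassian OAuth plugin code.
// It models the pattern described above: before storing a new token, the
// store walks every existing token to validate it, so inserting n tokens
// performs n(n - 1)/2 validations in total, i.e. O(n^2) overall.
public class TokenStoreSketch {
    // key -> creationTime, standing in for the BANDANA rows shown earlier
    private final Map<String, Long> tokens = new LinkedHashMap<>();
    private long validationCount = 0; // total validation checks, for demonstration

    public void put(String key, long creationTime) {
        // Validate every existing token before inserting the new one.
        for (String existing : tokens.keySet()) {
            validationCount++; // a real store would check each token's time-to-live here
        }
        tokens.put(key, creationTime);
    }

    public long getValidationCount() {
        return validationCount;
    }

    public static void main(String[] args) {
        TokenStoreSketch store = new TokenStoreSketch();
        int n = 1000; // e.g. the dashboard loaded 1000 times
        for (int i = 0; i < n; i++) {
            store.put("token-" + i, System.currentTimeMillis());
        }
        // 1000 inserts already cost n(n-1)/2 = 499500 validation checks
        System.out.println(store.getValidationCount());
    }
}
```

      At 1000 dashboard loads this is half a million checks; at 100,000 loads it is roughly five billion, which is why symptoms only appear once enough stale tokens accumulate.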

      Symptoms

      Confluence instances that run into performance/stability problems due to a high number of stored OAuth tokens will exhibit one or more of the following symptoms:

      • High CPU Usage.
      • High Memory Usage.
      • A huge number of entries in the BANDANA table related to Service Provider tokens. This can be confirmed by running the query below:
         select count (bandanaid) from bandana where BANDANAKEY like '%com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token%';
      • Reports of long-running/stuck threads in Tomcat and Confluence logs for /plugins/servlet/oauth/request-token, e.g.:
        14-Mar-2018 00:40:18.482 WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread "http-nio-8101-exec-12" (id=232) has been active for 69,763 milliseconds (since 3/14/18 12:39 AM) to serve the same request for https://confluence.example.com/plugins/servlet/oauth/request-token and may be stuck (configured threshold for this StuckThreadDetectionValve is 60 seconds). There is/are 9 thread(s) in total that are monitored by this Valve and may be stuck.
        
      • Long-running/stuck threads in thread dumps that mention com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore.put in the stack trace. Examples:
        "http-nio-8443-exec-52" #320 daemon prio=5 os_prio=0 tid=0x00007f027c12a800 nid=0x14f9 runnable [0x00007f0235fb5000]
           java.lang.Thread.State: RUNNABLE
        	at org.hibernate.bytecode.internal.javassist.BulkAccessor.getPropertyValues(BulkAccessor.java:58)
        	at org.hibernate.bytecode.internal.javassist.AccessOptimizerAdapter.getPropertyValues(AccessOptimizerAdapter.java:55)
        	at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValuesWithOptimizer(PojoEntityTuplizer.java:234)
        	at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValues(PojoEntityTuplizer.java:212)
        ...
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.retrieve(ConfluenceCachingBandanaPersister.java:78)
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.retrieve(ConfluenceCachingBandanaPersister.java:71)
        	at com.atlassian.bandana.DefaultBandanaManager.getValue(DefaultBandanaManager.java:32)
        	at com.atlassian.bandana.DefaultBandanaManager.getValue(DefaultBandanaManager.java:24)
        ...
        	at com.atlassian.sal.confluence.pluginsettings.ConfluencePluginSettings.get(ConfluencePluginSettings.java:67)
        	at com.atlassian.oauth.shared.sal.HashingLongPropertyKeysPluginSettings.get(HashingLongPropertyKeysPluginSettings.java:25)
        	at com.atlassian.oauth.shared.sal.PrefixingPluginSettings.get(PrefixingPluginSettings.java:17)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings$IsValidTokenReference.apply(PluginSettingsServiceProviderTokenStore.java:417)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings$IsValidTokenReference.apply(PluginSettingsServiceProviderTokenStore.java:409)
        ...
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getTokenKeySet(PluginSettingsServiceProviderTokenStore.java:383)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getTokenKeys(PluginSettingsServiceProviderTokenStore.java:320)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.addTokenKey(PluginSettingsServiceProviderTokenStore.java:328)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.put(PluginSettingsServiceProviderTokenStore.java:292)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore.put(PluginSettingsServiceProviderTokenStore.java:139)
        ...
        	at com.atlassian.oauth.serviceprovider.internal.DelegatingTokenStoreImpl.put(DelegatingTokenStoreImpl.java:31)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.RequestTokenServlet.doPostInTransaction(RequestTokenServlet.java:109)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.TransactionalServlet$2.serve(TransactionalServlet.java:48)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.TransactionalServlet$3.doInTransaction(TransactionalServlet.java:57)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.TransactionalServlet$3.doInTransaction(TransactionalServlet.java:54)
        ...
        
        "http-nio-8101-exec-297" #9009 daemon prio=5 os_prio=0 tid=0x00007fb9e851a000 nid=0x8f1c runnable [0x00007fb9dc3bb000]
           java.lang.Thread.State: RUNNABLE
        	at java.net.SocketInputStream.socketRead0(Native Method)
        	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        	at java.net.SocketInputStream.read(SocketInputStream.java:170)
        	at java.net.SocketInputStream.read(SocketInputStream.java:141)
        	at oracle.net.ns.Packet.receive(Packet.java:311)
        	at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
        	at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
        	at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
        	at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
        	at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
        	at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        	at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        	at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
        ...
        	- locked <0x000000067f800000> (a oracle.jdbc.driver.T4CConnection)
        	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1361)
        ...
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.lambda$retrieve$0(ConfluenceCachingBandanaPersister.java:80)
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister$$Lambda$354/101785611.get(Unknown Source)
        ...
        	at com.atlassian.sal.confluence.pluginsettings.ConfluencePluginSettings.get(ConfluencePluginSettings.java:67)
        	at com.atlassian.oauth.shared.sal.HashingLongPropertyKeysPluginSettings.get(HashingLongPropertyKeysPluginSettings.java:25)
        	at com.atlassian.oauth.shared.sal.PrefixingPluginSettings.get(PrefixingPluginSettings.java:17)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getTokenKeySet(PluginSettingsServiceProviderTokenStore.java:368)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getConsumerTokens(PluginSettingsServiceProviderTokenStore.java:352)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.addConsumerToken(PluginSettingsServiceProviderTokenStore.java:360)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.put(PluginSettingsServiceProviderTokenStore.java:293)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore.put(PluginSettingsServiceProviderTokenStore.java:139)
        ...
        

      Workaround

      As a workaround, the tokens stored in the BANDANA table need to be cleared manually, since the scheduled job is not doing it. To do that:

      1. Shut down Confluence.
      2. Connect to the Confluence database using your database client.
      3. Back up your Confluence database, in case you need to revert it to its state before the delete statement.
      4. Execute the query below:
        delete from BANDANA where BANDANAKEY like 'com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token%' and bandanavalue like '%value="REQUEST"%'
        
      5. Restart the application.

      After this is done, you should no longer observe the symptoms caused by this issue. However, once the clean-up is completed, users may be prompted to re-authenticate against applications that were relying on the OAuth tokens previously stored in the database.

      It is important to highlight that this workaround may need to be repeated from time to time, if symptoms reoccur, until this bug is fixed. Instance administrators who intend to automate this task may want to set up a scheduled routine in their database to execute the clean-up automatically under certain circumstances, for example when the BANDANA table fills up with such records.
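      If automation is chosen, a scheduled job is generally safer than a database trigger. The sketch below is only illustrative, not an Atlassian-provided procedure: it assumes PostgreSQL, a database named confluence, and the delete statement from step 4 of the workaround saved to /opt/scripts/oauth_token_cleanup.sql (all of these names are assumptions). Note that the workaround above deletes tokens with Confluence shut down; test any automated variant against a backup first.

```shell
# Hypothetical crontab entry (all paths and names are examples, not defaults):
# run the OAuth token clean-up every Sunday at 03:00 against the "confluence" database.
0 3 * * 0 psql -d confluence -f /opt/scripts/oauth_token_cleanup.sql
```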

            [CONFSERVER-55274] OAuth - Service Provider - Session Remover Job is broken

            Minh Tran added a comment -

            If you're running the Confluence 6.6 Enterprise release, a fix for this issue is now available in Confluence 6.6.8, which you can find in the Download Archives.


            Azfar Masut added a comment - - edited

            We've spent almost a week trying to figure out what happened to our instance, with tons of restarts and downtime for our users, only to find out at the end that it was due to this bug. We would appreciate it if you could link the error pattern with Hercules so it can be captured by the Troubleshooting & Support Tools Log Analyzer. Thanks!

            Conf 6.7.1

            Minh Tran added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 6.10.2
            Upgrade now or check out the Release Notes to see what other issues are resolved.


            Zhenhuan Zhou (Inactive) added a comment -

            Hi, we have merged the code and the fix will be out in the next enterprise release, 6.6.8.

            banos solomou added a comment -

            We wait with bated breath for the backport to the Enterprise version...

            Adam Barnes (Inactive) added a comment -

            andrey.oleynik1934811064, this has been triaged for backport to 6.6.x. It is not possible to say at this point exactly when that would be completed and released.

            Minh Tran added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 6.8.3
            Upgrade now or check out the Release Notes to see what other issues are resolved.


            Minh Tran added a comment -

            A fix for this issue is available to Server and Data Center customers in Confluence 6.9.1
            Upgrade now or check out the Release Notes to see what other issues are resolved.


            Zhenhuan Zhou (Inactive) added a comment - - edited

            Hi aditya.verma8, sorry, it is not really my call to release a patch for a specific version of Confluence. Normally we don't patch the product, as this may introduce much more trouble in future maintenance and upgrades. If you are experiencing any difficulties in upgrading, you may reach out to the support team for assistance. Thanks for your understanding.

            Aditya Verma added a comment -

            Hi Zhenhuan,

            We have a large instance of Confluence; we are using Confluence version 6.2.4. Can you provide a patch for this fix, as an upgrade is not a solution for us at the moment?

            Regards,
            Aditya

              zzhou Zhenhuan Zhou (Inactive)
              mhorlle Marcelo Horlle
              Affected customers: 13
              Watchers: 46