Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-55274

OAuth - Service Provider - Session Remover Job is broken

    XMLWordPrintable

Details

    Description

      This issue does not affect Data Center users. Confluence Data Center is currently able to run the clean-up job without problems.

      Issue Description

      There's a scheduled job executed by Confluence job handler called Service Provider Session Remover accountable for removing expired OAuth tokens from BANDANA table. This job is not working properly, and expired tokens are not being cleared as expected. With enough expired tokens in the database, this can eventually lead to an application crash as the logic to handle tokens is negatively impacted by the number of expired and useless entries stored in the database.

      Steps to Reproduce

      1. Set up JIRA + Confluence applinked together through an OAuth relationship
      2. Add Confluence gadget feed to JIRA
      3. Create new JIRA Dashboard and add Confluence Page Gadget, do not "Login and Approve"
      4. Every time the Dashboard/Gadget is loaded, observe that a new com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token.<somestring> row gets added in BANDANA table in Confluence, e.g.:
        bandanakey
        com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token.5IWZaCCOpzibZTUboRNaJU5WmddxePSe
        
        bandanavalue
         <properties>
           <property name="token" value="dqgAd6Oc9JU2G4ZuM6ec0PLmkIMf670w"/> 
           <property name="creationTime" value="1521048328968"/> 
           <property name="tokenSecret" value="ZAN3c7Yu6zEP8pUQVWB25e5odUrsRNbi"/>   
           <property name="authorization" value="NONE"/> 
           <property name="version" value="V_1_0_A"/>
           <property name="properties" value="   
         xoauth_app_url=http\://confluence.example.com/rest/gadgets/1.0/g/com.atlassian.confluence.plugins.gadgets\:confluence-page-gadget/gadgets/confluence-page-gadget.xml
         "/> 
           <property name="type" value="REQUEST"/>   
           <property name="callback" value="http://jira.example.com/plugins/servlet/gadgets/oauth-callback"/> 
           <property name="consumerKey" value="jira:2f5ae229-2135-43a8-bfd4-77f1cd14b3cc"/>  
           <property name="timeToLive" value="600000"/>  
         </properties>
        
      5. Load the JIRA Dashboard and Gadget several times and observe there are now several such entries in the BANDANA table.
      6. Leave Confluence running for 8 hours or more to ensure Service Provider Session Remover runs. It will not and you will be able to see evidence on logs that it has failed to execute. You should have an entry like the one below:
        runJob Scheduled job CompatibilityPluginScheduler.JobRunnerKey.Service Provider Session Remover#CompatibilityPluginScheduler.JobId.Service Provider Session Remover completed unsuccessfully with response JobRunnerResponse[runOutcome=ABORTED,message='Job has unexpected run mode RUN_LOCALLY'].
        

      Expected Behavior

      The Service Provider Session Remover runs every 8 hours and clears out the OAuth tokens from BANDANA

      Actual Behavior

      The Service Provider Session Remover attempts to run every 8 hours but is aborted. The scheduler_run_details table shows:

         id    |                               job_id                                |       start_time        | duration | outcome |                 message                 
      ---------+---------------------------------------------------------------------+-------------------------+----------+---------+-----------------------------------------
        428282 | CompatibilityPluginScheduler.JobId.Service Provider Session Remover | 2018-03-27 02:56:11.413 |        6 | A       | Job has unexpected run mode RUN_LOCALLY
      

      Technical details

      The above means that if the dashboard is loaded 1000 times, there will be 1000 such entries in the database. These tokens are supposed to be cleared from BANDANA via the scheduled job Service Provider Session Remover which runs every 8 hours, however the job is currently broken.

      As expired/unused tokens are not removed, instances will eventually hold a huge number of tokens in their database. Due to the logic used in the OAuth Plugin and documented in OAUTH-321, as the number of tokens in the database grows, so does the complexity of the problem it needs to resolve. This is because when inserting a new token to the database, it will iterate through a list of existing tokens to validate each one of them before inserting a new one.

      The intent of this logic is to validate duplicated entries as well as keep the list of usable tokens up to date, however, as the set of data ( n ) that the algorithm needs to resolve gets bigger, so does the complexity of the problem, which demands more computational resources to get resolved, making the algorithm an O (n²) complexity class one, leading to eventual outages depending on the instance usage and available resources.

      Symptoms

      The Confluence instances that run into the performance/stability problems due to high number of stored OAuth tokens will observe one or more of the following symptoms:

      • High CPU Usage.
      • High Memory Usage.
      • A huge number of entries in BANDANA table related to Service Provider tokens. This can be confirmed by running below query:
         select count (bandanaid) from bandana where BANDANAKEY like '%com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token%';
      • Reports of long-running/stuck threads in Tomcat and Confluence logs for /plugins/servlet/oauth/request-token, e.g.:
        14-Mar-2018 00:40:18.482 WARNING [ContainerBackgroundProcessor[StandardEngine[Standalone]]] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread "http-nio-8101-exec-12" (id=232) has been active for 69,763 milliseconds (since 3/14/18 12:39 AM) to serve the same request for https://confluence.example.com/plugins/servlet/oauth/request-token and may be stuck (configured threshold for this StuckThreadDetectionValve is 60 seconds). There is/are 9 thread(s) in total that are monitored by this Valve and may be stuck.
        
      • Long-running/stuck threads in thread dumps that mention com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore.put in the stack trace. Examples:
        "http-nio-8443-exec-52" #320 daemon prio=5 os_prio=0 tid=0x00007f027c12a800 nid=0x14f9 runnable [0x00007f0235fb5000]
           java.lang.Thread.State: RUNNABLE
        	at org.hibernate.bytecode.internal.javassist.BulkAccessor.getPropertyValues(BulkAccessor.java:58)
        	at org.hibernate.bytecode.internal.javassist.AccessOptimizerAdapter.getPropertyValues(AccessOptimizerAdapter.java:55)
        	at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValuesWithOptimizer(PojoEntityTuplizer.java:234)
        	at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValues(PojoEntityTuplizer.java:212)
        ...
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.retrieve(ConfluenceCachingBandanaPersister.java:78)
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.retrieve(ConfluenceCachingBandanaPersister.java:71)
        	at com.atlassian.bandana.DefaultBandanaManager.getValue(DefaultBandanaManager.java:32)
        	at com.atlassian.bandana.DefaultBandanaManager.getValue(DefaultBandanaManager.java:24)
        ...
        	at com.atlassian.sal.confluence.pluginsettings.ConfluencePluginSettings.get(ConfluencePluginSettings.java:67)
        	at com.atlassian.oauth.shared.sal.HashingLongPropertyKeysPluginSettings.get(HashingLongPropertyKeysPluginSettings.java:25)
        	at com.atlassian.oauth.shared.sal.PrefixingPluginSettings.get(PrefixingPluginSettings.java:17)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings$IsValidTokenReference.apply(PluginSettingsServiceProviderTokenStore.java:417)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings$IsValidTokenReference.apply(PluginSettingsServiceProviderTokenStore.java:409)
        ...
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getTokenKeySet(PluginSettingsServiceProviderTokenStore.java:383)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getTokenKeys(PluginSettingsServiceProviderTokenStore.java:320)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.addTokenKey(PluginSettingsServiceProviderTokenStore.java:328)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.put(PluginSettingsServiceProviderTokenStore.java:292)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore.put(PluginSettingsServiceProviderTokenStore.java:139)
        ...
        	at com.atlassian.oauth.serviceprovider.internal.DelegatingTokenStoreImpl.put(DelegatingTokenStoreImpl.java:31)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.RequestTokenServlet.doPostInTransaction(RequestTokenServlet.java:109)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.TransactionalServlet$2.serve(TransactionalServlet.java:48)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.TransactionalServlet$3.doInTransaction(TransactionalServlet.java:57)
        	at com.atlassian.oauth.serviceprovider.internal.servlet.TransactionalServlet$3.doInTransaction(TransactionalServlet.java:54)
        ...
        
        "http-nio-8101-exec-297" #9009 daemon prio=5 os_prio=0 tid=0x00007fb9e851a000 nid=0x8f1c runnable [0x00007fb9dc3bb000]
           java.lang.Thread.State: RUNNABLE
        	at java.net.SocketInputStream.socketRead0(Native Method)
        	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        	at java.net.SocketInputStream.read(SocketInputStream.java:170)
        	at java.net.SocketInputStream.read(SocketInputStream.java:141)
        	at oracle.net.ns.Packet.receive(Packet.java:311)
        	at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
        	at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
        	at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
        	at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
        	at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
        	at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        	at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        	at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
        ...
        	- locked <0x000000067f800000> (a oracle.jdbc.driver.T4CConnection)
        	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1361)
        ...
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister.lambda$retrieve$0(ConfluenceCachingBandanaPersister.java:80)
        	at com.atlassian.confluence.setup.bandana.ConfluenceCachingBandanaPersister$$Lambda$354/101785611.get(Unknown Source)
        ...
        	at com.atlassian.sal.confluence.pluginsettings.ConfluencePluginSettings.get(ConfluencePluginSettings.java:67)
        	at com.atlassian.oauth.shared.sal.HashingLongPropertyKeysPluginSettings.get(HashingLongPropertyKeysPluginSettings.java:25)
        	at com.atlassian.oauth.shared.sal.PrefixingPluginSettings.get(PrefixingPluginSettings.java:17)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getTokenKeySet(PluginSettingsServiceProviderTokenStore.java:368)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.getConsumerTokens(PluginSettingsServiceProviderTokenStore.java:352)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.addConsumerToken(PluginSettingsServiceProviderTokenStore.java:360)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore$Settings.put(PluginSettingsServiceProviderTokenStore.java:293)
        	at com.atlassian.oauth.serviceprovider.sal.PluginSettingsServiceProviderTokenStore.put(PluginSettingsServiceProviderTokenStore.java:139)
        ...
        

      Workaround

      As a workaround, we need to clear tokens stored in BANDANA table manually, since the Scheduled Job is not doing it for us. To do that:

      1. Shutdown Confluence.
      2. Connect to your database client.
      3. Take a backup from your Confluence database in case we may need to revert it to its pre-delete statement state.
      4. Execute below query:
        delete from BANDANA where BANDANAKEY like 'com.atlassian.oauth.serviceprovider.ServiceProviderTokenStore.token%' and bandanavalue like '%value="REQUEST"%'
        
      5. Restart the application.

      After this is done, you should no longer observe the symptoms caused by this issue, however, users may be prompted to re-authenticate against applications that were relying on the OAuth tokens previously stored in the database once the clean-up is completed.

      It is important to highlight that this workaround may need to be repeated from time to time until this bug is solved if symptoms re-occur. Instance administrators may want to setup a trigger in their database clients to execute this routine automatically based upon certain circumstances, like for example, BANDANA table going full of records if they intend to automate this task.

      Attachments

        Issue Links

          Activity

            People

              zzhou Zhenhuan Zhou (Inactive)
              mhorlle Marcelo Horlle
              Votes:
              13 Vote for this issue
              Watchers:
              46 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: