Uploaded image for project: 'Bamboo'
  1. Bamboo
  2. BAM-20292

Race condition when a branch is created while a plan is being deleted creates database inconsistencies that break the deletion service

    XMLWordPrintable

Details

    Description

      Issue Summary

      Race condition when a branch is created while a plan is being deleted creates database inconsistencies that break Bamboo's background deletion service.

      Steps to Reproduce

      Use PostgreSQL for reproduction – easy to lock the table in the steps.

      1. Create a plan
      2. Point it to a Bitbucket Server repository. Easier to reproduce with event based branch detection.
      3. Enable automatic branch detection on the plan
      4. Lock the BUILD table:
        @set autocommit off;
        LOCK TABLE BUILD IN ACCESS EXCLUSIVE MODE;
        SELECT count(BUILD_ID) FROM BUILD;
        
      5. Delete the plan:
             curl -k -u user:password \
             -H 'X-Atlassian-Token: no-check' \
             -H 'Accept: application/json' \
             -H 'Content-Type: application/x-www-form-urlencoded' \
             -d 'buildKey='PLAN-KEY'' \
             -X POST 'http://localhost:8085/ajax/deleteChain.action'
        
      6. Wait 5 seconds. Now push a new branch to the Bitbucket repository.
      7. Wait 5 seconds. Unlock the build table:
        commit transaction;
        

      Expected Results

      New branch is not created since plan is being marked for deletion.
      The plan is deleted by the deletion service.

      Actual Results

      New branch is created with MARKED_FOR_DELETION = false. The rest of the plan is marked for deletion.

      The below exception is thrown in the atlassian-bamboo.log file when the deletion service attempts to run:

      2019-02-11 14:36:36,536 ERROR [DelayedDeletionThread] [DeletionServiceHelper] Error occurred while deleting plan or job: BUG-BUG, continue removal of other plans
      2019-02-11 14:36:36,536 ERROR [DelayedDeletionThread] [DeletionServiceHelper] could not execute batch; SQL [delete from BUILD where BUILD_ID=?]; constraint [fk_jdvuiqpnjmsvf3llia8jvvg03]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute batch
      org.springframework.dao.DataIntegrityViolationException: could not execute batch; SQL [delete from BUILD where BUILD_ID=?]; constraint [fk_jdvuiqpnjmsvf3llia8jvvg03]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute batch
      	at org.springframework.orm.hibernate5.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:247)
      	at org.springframework.orm.hibernate5.HibernateTransactionManager.convertHibernateAccessException(HibernateTransactionManager.java:785)
      	at org.springframework.orm.hibernate5.HibernateTransactionManager.doCommit(HibernateTransactionManager.java:621)
      	at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:746)
      	at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:714)
      	at sun.reflect.GeneratedMethodAccessor206.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
      	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:206)
      	at com.sun.proxy.$Proxy109.commit(Unknown Source)
      	at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:532)
      	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:304)
      	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98)
      	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
      	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
      	at com.sun.proxy.$Proxy155.deletePlan(Unknown Source)
      	at com.atlassian.bamboo.deletion.DeletionServiceHelper.doDeletePlan(DeletionServiceHelper.java:226)
      	at com.atlassian.bamboo.deletion.DeletionServiceHelper.doDeletePlans(DeletionServiceHelper.java:199)
      	at com.atlassian.bamboo.deletion.DeletionServiceHelper.executeDelayedDeletionInternal(DeletionServiceHelper.java:177)
      	at com.atlassian.bamboo.deletion.DeletionServiceHelper.executeDelayedDeletion(DeletionServiceHelper.java:121)
      	at com.atlassian.bamboo.deletion.DeletionServiceImpl.executeDelayedDeletion(DeletionServiceImpl.java:411)
      ...
      Caused by: java.sql.BatchUpdateException: Batch entry 0 delete from BUILD where BUILD_ID=178389009 was aborted: ERROR: update or delete on table "build" violates foreign key constraint "fk_jdvuiqpnjmsvf3llia8jvvg03" on table "build"
        Detail: Key (build_id)=(178389009) is still referenced from table "build".  Call getNextException to see other errors in the batch.
      

      A number of other background threads (including attempting to load the dashboard) that operate on the plan cache fail with:

      2019-02-11 14:32:34,646 INFO [http-nio-8085-exec-1] [AccessLogFilter] charlie GET http://localhost:8085/allPlans.action 1173723kb
      2019-02-11 14:32:34,662 INFO [http-nio-8085-exec-1] [RetryingTaskExecutor] Task 'Retrieving BUG-BUG4' threw java.lang.NullPointerException: Master chain not found in plan cache, aborting.
      2019-02-11 14:32:34,666 ERROR [http-nio-8085-exec-1] [ExceptionMappingInterceptor] java.lang.NullPointerException: Master chain not found in plan cache
      java.lang.RuntimeException: java.lang.NullPointerException: Master chain not found in plan cache
      	at com.atlassian.bamboo.executor.RetryingTaskExecutor.rerun(RetryingTaskExecutor.java:129)
      	at com.atlassian.bamboo.executor.RetryingTaskExecutor.runTask(RetryingTaskExecutor.java:99)
      	at com.atlassian.bamboo.executor.RetryingTaskExecutor.retry(RetryingTaskExecutor.java:207)
      	at com.atlassian.bamboo.plan.cache.ImmutablePlanCacheServiceImpl$1.load(ImmutablePlanCacheServiceImpl.java:129)
      	at com.atlassian.bamboo.plan.cache.ImmutablePlanCacheServiceImpl$1.load(ImmutablePlanCacheServiceImpl.java:103)
      	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
      	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
      	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
      	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
      	at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
      	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
      	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
      	at com.atlassian.bamboo.utils.collections.AlwaysInvalidatingCacheDecorator.get(AlwaysInvalidatingCacheDecorator.java:65)
      	at com.atlassian.bamboo.utils.collections.AlwaysInvalidatingCacheDecorator.getUnchecked(AlwaysInvalidatingCacheDecorator.java:126)
      	at com.atlassian.bamboo.plan.cache.ImmutablePlanCacheServiceImpl.internalGetPlanByKey(ImmutablePlanCacheServiceImpl.java:487)
      	at com.atlassian.bamboo.plan.cache.ImmutablePlanCacheServiceImpl.lambda$internalGetAllChainsStream$11(ImmutablePlanCacheServiceImpl.java:571)
      	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
      	at java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3527)
      	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
      	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
      	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
      	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
      	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
      	at com.atlassian.bamboo.plan.cache.CachedPlanManagerImpl.sortedCopy(CachedPlanManagerImpl.java:293)
      	at com.atlassian.bamboo.plan.cache.CachedPlanManagerImpl.getPlans(CachedPlanManagerImpl.java:104)
      

      The NullPointerException seems to self correct at some point and a restart can also correct it but the deletion service will never recover.

      Affected plans continue to spam this warning into the logs:

      2019-02-11 16:35:05,240 WARN [scheduler_Worker-8] [ImmutablePlanCacheServiceImpl] Plan BUG-BUG2 scheduled for deletion but not hidden
      2019-02-11 16:35:05,240 WARN [scheduler_Worker-8] [ImmutablePlanCacheServiceImpl] Plan BUG-BUG scheduled for deletion but not hidden
      

      Workaround

      Restart Bamboo so that users will be able to access Bamboo again without error but the data inconsistency will still exist in database.

      There are ways to correct the data but it requires analysis.

      Raise a ticket with Atlassian Support referencing this bug and we'll help with the analysis and correction.

      Attachments

        Issue Links

          Activity

            People

              mgardias Marcin Gardias
              jowen@atlassian.com Jeremy Owen
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: