Uploaded image for project: 'Bamboo'
  1. Bamboo
  2. BAM-15026

LIST_POSITION column of CHAIN_STAGE table gets corrupted causing Bamboo startup to fail

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Duplicate
    • Affects Version/s: 5.2, 5.6.1, 5.9.4, 5.9.7
    • Fix Version/s: None
    • Component/s: Jobs, Stages, Tasks
    • Labels:
    • Last commented by user?:
      true
    • Comments:
      3
    • Regular Expression:
      threw java.lang.NullPointerException\, aborting.

      Description

      The procedure below requires direct changes into the database being used by Bamboo. Because of that, do not proceed without exporting your Bamboo data first. Own a valid database backup before proceeding.

      Summary

      Bamboo Server fails to start up due to a corruption on the LIST_POSITION column of CHAIN_STAGE table.

      Steps to Reproduce

      At this stage, the steps to reproduce this are not clear. It is known that this could be caused by editing a Build Plan (i.e. re-ordering the steps in a build plan) and restarting the instance, for example.

      Expected Results

      Bamboo shouldn't get into this database corruption state.

      Actual Results

      Due to the corruption, the below exceptions are thrown in the atlassian-bamboo.log file:

      2011-10-12 11:16:16,966 FATAL [main] [BambooContainer] Cannot start bamboo
      java.lang.ArrayIndexOutOfBoundsException: -1
      at java.util.ArrayList.set(ArrayList.java:339)
      at net.sf.hibernate.collection.List.readFrom(List.java:312)
      at net.sf.hibernate.loader.Loader.readCollectionElement(Loader.java:384)
      at net.sf.hibernate.loader.Loader.getRowFromResultSet(Loader.java:240)
      ......
      ......
      at $Proxy11.getAllPlans(Unknown Source)
      at com.atlassian.bamboo.container.BambooContainer.initialisePlans(BambooContainer.java:332)
      at com.atlassian.bamboo.container.BambooContainer.start(BambooContainer.java:243) 
      

      Or:

      2014-09-15 19:30:09,928 DEBUG [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint started.
      2014-09-15 19:30:09,934 DEBUG [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint done.
      2014-09-15 19:30:11,310 INFO [localhost-startStop-1] [RetryingTaskExecutor] Task 'Retrieving TESTONE-SAMPLEONE' threw java.lang.NullPointerException, aborting.
      2014-09-15 19:30:11,312 FATAL [localhost-startStop-1] [BambooContainer] Cannot start bamboo
      java.lang.RuntimeException: java.lang.NullPointerException
      	at com.atlassian.bamboo.executor.RetryingTaskExecutor.rerun(RetryingTaskExecutor.java:120)
      	at com.atlassian.bamboo.executor.RetryingTaskExecutor.runTask(RetryingTaskExecutor.java:88)
      	at com.atlassian.bamboo.executor.RetryingTaskExecutor.retry(RetryingTaskExecutor.java:191)
      	at com.atlassian.bamboo.plan.cache.ImmutablePlanManagerImpl.getPlanByKey(ImmutablePlanManagerImpl.java:129)
      	at sun.reflect.GeneratedMethodAccessor215.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      ....
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.NullPointerException
      	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
      	at com.atlassian.bamboo.deletion.NotDeletedPredicate.apply(NotDeletedPredicate.java:28)
      ....
      

      Workaround

      Identifying the problem:
      You should run the following query in order to find out the duplicates:

      select stage.*
        from chain_stage as stage
        join build as chain ON stage.build_id = chain.build_id
       where chain.full_key = 'PLAN-KEY-0'
       order by stage.list_position
      

      As an example, let's look at a bad result for the query above:

      STAGE_ID NAME BUILD_ID LIST_POSITION
      8389098 Stage 2 4063315 1
      8389099 Stage 2 4063368 1
      8389100 Stage 2 8422478 1
      8389101 Stage 3 4063315 1
      8389102 Stage 3 4063368 1
      8389103 Stage 3 8422478 1

      From this result, you can see that the 3 BUILD_ID: 4063315, 4063368 and 8422478 are duplicated as they appear twice on the table above.

      That means that each one of those BUILD_ID need to be fixed on the CHAIN_STAGE table.

      If this query had returned zero results, the corruption we are trying to fix wouldn't exist.

      How to fix the problem:
      If you run the following query specifying one of the duplicate BUILD_ID's reported above:

      SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=8422478 ORDER BY BUILD_ID, LIST_POSITION;
      

      You will get the following result which evidences the corruption on the CHAIN_STAGE table:

      STAGE_ID CREATED_DATE UPDATED_DATE NAME DESCRIPTION OPTIONAL_STAGE MARKED_FOR_DELETION BUILD_ID LIST_POSITION
      8389103 (null) (null) Stage 3   false false 8422478 1
      8389100 2015-10-24 13:00:38 2015-10-24 13:00:38 Stage 2   false true 8422478 1
      8389106 (null) (null) Stage 4   false false 8422478 2

      Therefore, in order to identify what needs to be fixed, 3 separate queries would have to be run against the db so you can see the corruption for each BUILD_ID. In this specific example, they are:

      SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=4063315 ORDER BY BUILD_ID, LIST_POSITION;
      SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=4063368 ORDER BY BUILD_ID, LIST_POSITION;
      SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=8422478 ORDER BY BUILD_ID, LIST_POSITION;
      

      But wait... Where is the corruption on the table above?
      Notice from the result above that for each row of CHAIN_STAGE the LIST_POSITION for the BUILD_ID=8422478 holds the values 1, 1 and 2 when it should be holding the values 0, 1 and 2. The values should always start off from 0, otherwise Bamboo won't start.

      That's what the corruption is about.

      How to fix it?
      In order to fix this, you will need to analyse the corruption for each one of the queries above and manually fix the problematic rows.

      • If you're running Bamboo with an external db ...
        Then just update the rows manually. As mentioned before, the result should have been 0, 1 and 2 and not 1, 1 and 2.
        So, in order to fix the problem in this specific corrupt output, you would need to update the first row only by running the following query:
        UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389103'
        

        In our example, after analysing the output of the SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=... ORDER BY BUILD_ID, LIST_POSITION;, the final set of queries to fix this is:

        UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389103'
        UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389101'
        UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389102'
        
      • If you're running Bamboo with an HSQLDB ...
        HSQLDB is the in-memory database for evaluation purposes only. If you're using this in production, well, you should really have migrated to an external database by now because we don't recommend production environments to be using HSQLDB.
        • How to detect issue and fix it:
          If you're in this scenario, though, here is how you can fix it:
          • Stop your Bamboo instance process.
          • In order to run the queries to identify the corruption, you will need to Connect to your HSQLDB using DBVisualizer and run the SELECT queries as described under the Identifying the problem section.
          • Make sure Bamboo is not running. This is very important, otherwise the values we update on the next step will be overwritten and void the fix.
          • Edit the <BAMBOO_HOME>/database/defaultdb.script. In there, you will find queries that populate the HSQLDB when Bamboo is started up. For the example we fixed here, you would have found:
            INSERT INTO CHAIN_STAGE VALUES(8389101,NULL,NULL,'Stage 3','',FALSE,FALSE,4063315,1)
            INSERT INTO CHAIN_STAGE VALUES(8389102,NULL,NULL,'Stage 3','',FALSE,FALSE,4063368,1)
            INSERT INTO CHAIN_STAGE VALUES(8389103,NULL,NULL,'Stage 3','',FALSE,FALSE,8422478,1)
            
          • You'd have to change that the INSERT queries above into the ones below so that they can start off at 0:
            INSERT INTO CHAIN_STAGE VALUES(8389101,NULL,NULL,'Stage 3','',FALSE,FALSE,4063315,0)
            INSERT INTO CHAIN_STAGE VALUES(8389102,NULL,NULL,'Stage 3','',FALSE,FALSE,4063368,0)
            INSERT INTO CHAIN_STAGE VALUES(8389103,NULL,NULL,'Stage 3','',FALSE,FALSE,8422478,0)
            

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                14 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Last commented:
                  1 year, 1 week, 3 days ago