Details
Description
The procedure below requires direct changes into the database being used by Bamboo. Because of that, do not proceed without exporting your Bamboo data first. Own a valid database backup before proceeding.
Summary
Bamboo Server fails to start up due to a corruption on the LIST_POSITION column of CHAIN_STAGE table.
Steps to Reproduce
At this stage, the steps to reproduce this are not clear. It is known that this could be caused by editing a Build Plan (i.e. re-ordering the steps in a build plan) and restarting the instance, for example.
Expected Results
Bamboo shouldn't get into this database corruption state.
Actual Results
Due to the corruption, the below exceptions are thrown in the atlassian-bamboo.log file:
2011-10-12 11:16:16,966 FATAL [main] [BambooContainer] Cannot start bamboo java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.ArrayList.set(ArrayList.java:339) at net.sf.hibernate.collection.List.readFrom(List.java:312) at net.sf.hibernate.loader.Loader.readCollectionElement(Loader.java:384) at net.sf.hibernate.loader.Loader.getRowFromResultSet(Loader.java:240) ...... ...... at $Proxy11.getAllPlans(Unknown Source) at com.atlassian.bamboo.container.BambooContainer.initialisePlans(BambooContainer.java:332) at com.atlassian.bamboo.container.BambooContainer.start(BambooContainer.java:243)
Or:
2014-09-15 19:30:09,928 DEBUG [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint started. 2014-09-15 19:30:09,934 DEBUG [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint done. 2014-09-15 19:30:11,310 INFO [localhost-startStop-1] [RetryingTaskExecutor] Task 'Retrieving TESTONE-SAMPLEONE' threw java.lang.NullPointerException, aborting. 2014-09-15 19:30:11,312 FATAL [localhost-startStop-1] [BambooContainer] Cannot start bamboo java.lang.RuntimeException: java.lang.NullPointerException at com.atlassian.bamboo.executor.RetryingTaskExecutor.rerun(RetryingTaskExecutor.java:120) at com.atlassian.bamboo.executor.RetryingTaskExecutor.runTask(RetryingTaskExecutor.java:88) at com.atlassian.bamboo.executor.RetryingTaskExecutor.retry(RetryingTaskExecutor.java:191) at com.atlassian.bamboo.plan.cache.ImmutablePlanManagerImpl.getPlanByKey(ImmutablePlanManagerImpl.java:129) at sun.reflect.GeneratedMethodAccessor215.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) .... at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.NullPointerException at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187) at com.atlassian.bamboo.deletion.NotDeletedPredicate.apply(NotDeletedPredicate.java:28) ....
Workaround
Identifying the problem:
You should run the following query in order to find out the duplicates:
select stage.* from chain_stage as stage join build as chain ON stage.build_id = chain.build_id where chain.full_key = 'PLAN-KEY-0' order by stage.list_position
As an example, let's look at a bad result for the query above:
STAGE_ID | NAME | BUILD_ID | LIST_POSITION |
---|---|---|---|
8389098 | Stage 2 | 4063315 | 1 |
8389099 | Stage 2 | 4063368 | 1 |
8389100 | Stage 2 | 8422478 | 1 |
8389101 | Stage 3 | 4063315 | 1 |
8389102 | Stage 3 | 4063368 | 1 |
8389103 | Stage 3 | 8422478 | 1 |
From this result, you can see that the 3 BUILD_ID: 4063315, 4063368 and 8422478 are duplicated as they appear twice on the table above.
That means that each one of those BUILD_ID need to be fixed on the CHAIN_STAGE table.
If this query had returned zero results, the corruption we are trying to fix wouldn't exist.
How to fix the problem:
If you run the following query specifying one of the duplicate BUILD_ID's reported above:
SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=8422478 ORDER BY BUILD_ID, LIST_POSITION;
You will get the following result which evidences the corruption on the CHAIN_STAGE table:
STAGE_ID | CREATED_DATE | UPDATED_DATE | NAME | DESCRIPTION | OPTIONAL_STAGE | MARKED_FOR_DELETION | BUILD_ID | LIST_POSITION |
---|---|---|---|---|---|---|---|---|
8389103 | (null) | (null) | Stage 3 | false | false | 8422478 | 1 | |
8389100 | 2015-10-24 13:00:38 | 2015-10-24 13:00:38 | Stage 2 | false | true | 8422478 | 1 | |
8389106 | (null) | (null) | Stage 4 | false | false | 8422478 | 2 |
Therefore, in order to identify what needs to be fixed, 3 separate queries would have to be run against the db so you can see the corruption for each BUILD_ID. In this specific example, they are:
SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=4063315 ORDER BY BUILD_ID, LIST_POSITION; SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=4063368 ORDER BY BUILD_ID, LIST_POSITION; SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=8422478 ORDER BY BUILD_ID, LIST_POSITION;
But wait... Where is the corruption on the table above?
Notice from the result above that for each row of CHAIN_STAGE the LIST_POSITION for the BUILD_ID=8422478 holds the values 1, 1 and 2 when it should be holding the values 0, 1 and 2. The values should always start off from 0, otherwise Bamboo won't start.
That's what the corruption is about.
How to fix it?
In order to fix this, you will need to analyse the corruption for each one of the queries above and manually fix the problematic rows.
- If you're running Bamboo with an external db ...
Then just update the rows manually. As mentioned before, the result should have been 0, 1 and 2 and not 1, 1 and 2.
So, in order to fix the problem in this specific corrupt output, you would need to update the first row only by running the following query:UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389103'
In our example, after analysing the output of the SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=... ORDER BY BUILD_ID, LIST_POSITION;, the final set of queries to fix this is:
UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389103' UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389101' UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389102'
- If you're running Bamboo with an HSQLDB ...
HSQLDB is the in-memory database for evaluation purposes only. If you're using this in production, well, you should really have migrated to an external database by now because we don't recommend production environments to be using HSQLDB.- How to detect issue and fix it:
If you're in this scenario, though, here is how you can fix it:- Stop your Bamboo instance process.
- In order to run the queries to identify the corruption, you will need to Connect to your HSQLDB using DBVisualizer and run the SELECT queries as described under the Identifying the problem section.
- If your database is big (consider big ~200MB), you might consider increasing the memory allocated for your DBVisualizer to 2048m
- Make sure Bamboo is not running. This is very important, otherwise the values we update on the next step will be overwritten and void the fix.
- Edit the <BAMBOO_HOME>/database/defaultdb.script. In there, you will find queries that populate the HSQLDB when Bamboo is started up. For the example we fixed here, you would have found:
INSERT INTO CHAIN_STAGE VALUES(8389101,NULL,NULL,'Stage 3','',FALSE,FALSE,4063315,1) INSERT INTO CHAIN_STAGE VALUES(8389102,NULL,NULL,'Stage 3','',FALSE,FALSE,4063368,1) INSERT INTO CHAIN_STAGE VALUES(8389103,NULL,NULL,'Stage 3','',FALSE,FALSE,8422478,1)
- You'd have to change that the INSERT queries above into the ones below so that they can start off at 0:
INSERT INTO CHAIN_STAGE VALUES(8389101,NULL,NULL,'Stage 3','',FALSE,FALSE,4063315,0) INSERT INTO CHAIN_STAGE VALUES(8389102,NULL,NULL,'Stage 3','',FALSE,FALSE,4063368,0) INSERT INTO CHAIN_STAGE VALUES(8389103,NULL,NULL,'Stage 3','',FALSE,FALSE,8422478,0)
- How to detect issue and fix it:
Attachments
Issue Links
- relates to
-
BAM-14518 DeletionServiceImpl fails to delete stage due to NPE
- Closed
-
BAM-12217 Corrupted stage order after reordering a job
- Closed
-
BDEV-5446 Loading...
-
BDEV-9928 Loading...
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...