Loading...

XML

Word

Printable

Type: Bug
Resolution: Duplicate
Priority: High
Fix Version/s: None
Affects Version/s: 5.2, 5.6.1, 5.9.4, 5.9.7
Component/s: Jobs, Stages, Tasks
Labels:
- buildeng
- support

The procedure below requires direct changes into the database being used by Bamboo. Because of that, do not proceed without exporting your Bamboo data first. Own a valid database backup before proceeding.

Summary

Bamboo Server fails to start up due to a corruption on the LIST_POSITION column of CHAIN_STAGE table.

Steps to Reproduce

At this stage, the steps to reproduce this are not clear. It is known that this could be caused by editing a Build Plan (i.e. re-ordering the steps in a build plan) and restarting the instance, for example.

Expected Results

Bamboo shouldn't get into this database corruption state.

Actual Results

Due to the corruption, the below exceptions are thrown in the atlassian-bamboo.log file:

2011-10-12 11:16:16,966 FATAL [main] [BambooContainer] Cannot start bamboo
java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.set(ArrayList.java:339)
at net.sf.hibernate.collection.List.readFrom(List.java:312)
at net.sf.hibernate.loader.Loader.readCollectionElement(Loader.java:384)
at net.sf.hibernate.loader.Loader.getRowFromResultSet(Loader.java:240)
......
......
at $Proxy11.getAllPlans(Unknown Source)
at com.atlassian.bamboo.container.BambooContainer.initialisePlans(BambooContainer.java:332)
at com.atlassian.bamboo.container.BambooContainer.start(BambooContainer.java:243)

Or:

2014-09-15 19:30:09,928 DEBUG [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint started.
2014-09-15 19:30:09,934 DEBUG [ActiveMQ Journal Checkpoint Worker] [MessageDatabase] Checkpoint done.
2014-09-15 19:30:11,310 INFO [localhost-startStop-1] [RetryingTaskExecutor] Task 'Retrieving TESTONE-SAMPLEONE' threw java.lang.NullPointerException, aborting.
2014-09-15 19:30:11,312 FATAL [localhost-startStop-1] [BambooContainer] Cannot start bamboo
java.lang.RuntimeException: java.lang.NullPointerException
	at com.atlassian.bamboo.executor.RetryingTaskExecutor.rerun(RetryingTaskExecutor.java:120)
	at com.atlassian.bamboo.executor.RetryingTaskExecutor.runTask(RetryingTaskExecutor.java:88)
	at com.atlassian.bamboo.executor.RetryingTaskExecutor.retry(RetryingTaskExecutor.java:191)
	at com.atlassian.bamboo.plan.cache.ImmutablePlanManagerImpl.getPlanByKey(ImmutablePlanManagerImpl.java:129)
	at sun.reflect.GeneratedMethodAccessor215.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
....
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
	at com.atlassian.bamboo.deletion.NotDeletedPredicate.apply(NotDeletedPredicate.java:28)
....

Workaround

Identifying the problem:
You should run the following query in order to find out the duplicates:

select stage.*
  from chain_stage as stage
  join build as chain ON stage.build_id = chain.build_id
 where chain.full_key = 'PLAN-KEY-0'
 order by stage.list_position

As an example, let's look at a bad result for the query above:

STAGE_ID	NAME	BUILD_ID	LIST_POSITION
8389098	Stage 2	4063315	1
8389099	Stage 2	4063368	1
8389100	Stage 2	8422478	1
8389101	Stage 3	4063315	1
8389102	Stage 3	4063368	1
8389103	Stage 3	8422478	1

From this result, you can see that the 3 BUILD_ID: 4063315, 4063368 and 8422478 are duplicated as they appear twice on the table above.

That means that each one of those BUILD_ID need to be fixed on the CHAIN_STAGE table.

If this query had returned zero results, the corruption we are trying to fix wouldn't exist.

How to fix the problem:
If you run the following query specifying one of the duplicate BUILD_ID's reported above:

SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=8422478 ORDER BY BUILD_ID, LIST_POSITION;

You will get the following result which evidences the corruption on the CHAIN_STAGE table:

STAGE_ID	CREATED_DATE	UPDATED_DATE	NAME	OPTIONAL_STAGE	MARKED_FOR_DELETION	BUILD_ID	LIST_POSITION
8389103	(null)	(null)	Stage 3	false	false	8422478	1
8389100	2015-10-24 13:00:38	2015-10-24 13:00:38	Stage 2	false	true	8422478	1
8389106	(null)	(null)	Stage 4	false	false	8422478	2

Therefore, in order to identify what needs to be fixed, 3 separate queries would have to be run against the db so you can see the corruption for each BUILD_ID. In this specific example, they are:

SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=4063315 ORDER BY BUILD_ID, LIST_POSITION;
SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=4063368 ORDER BY BUILD_ID, LIST_POSITION;
SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=8422478 ORDER BY BUILD_ID, LIST_POSITION;

But wait... Where is the corruption on the table above?
Notice from the result above that for each row of CHAIN_STAGE the LIST_POSITION for the BUILD_ID=8422478 holds the values 1, 1 and 2 when it should be holding the values 0, 1 and 2. The values should always start off from 0, otherwise Bamboo won't start.

That's what the corruption is about.

How to fix it?
In order to fix this, you will need to analyse the corruption for each one of the queries above and manually fix the problematic rows.

If you're running Bamboo with an external db ...
Then just update the rows manually. As mentioned before, the result should have been 0, 1 and 2 and not 1, 1 and 2.
So, in order to fix the problem in this specific corrupt output, you would need to update the first row only by running the following query:
```
UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389103'
```
In our example, after analysing the output of the SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=... ORDER BY BUILD_ID, LIST_POSITION;, the final set of queries to fix this is:
```
UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389103'
UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389101'
UPDATE CHAIN_STAGE SET LIST_POSITION = 0 WHERE STAGE_ID = '8389102'
```

If you're running Bamboo with an HSQLDB ...
HSQLDB is the in-memory database for evaluation purposes only. If you're using this in production, well, you should really have migrated to an external database by now because we don't recommend production environments to be using HSQLDB.
- How to detect issue and fix it:
  If you're in this scenario, though, here is how you can fix it:
  - Stop your Bamboo instance process.
  - In order to run the queries to identify the corruption, you will need to Connect to your HSQLDB using DBVisualizer and run the SELECT queries as described under the Identifying the problem section.
    - If your database is big (consider big ~200MB), you might consider increasing the memory allocated for your DBVisualizer to 2048m
  - Make sure Bamboo is not running. This is very important, otherwise the values we update on the next step will be overwritten and void the fix.
  - Edit the <BAMBOO_HOME>/database/defaultdb.script. In there, you will find queries that populate the HSQLDB when Bamboo is started up. For the example we fixed here, you would have found:
    INSERT INTO CHAIN_STAGE VALUES(8389101,NULL,NULL,'Stage 3','',FALSE,FALSE,4063315,1) INSERT INTO CHAIN_STAGE VALUES(8389102,NULL,NULL,'Stage 3','',FALSE,FALSE,4063368,1) INSERT INTO CHAIN_STAGE VALUES(8389103,NULL,NULL,'Stage 3','',FALSE,FALSE,8422478,1)
  - You'd have to change that the INSERT queries above into the ones below so that they can start off at 0:
    INSERT INTO CHAIN_STAGE VALUES(8389101,NULL,NULL,'Stage 3','',FALSE,FALSE,4063315,0) INSERT INTO CHAIN_STAGE VALUES(8389102,NULL,NULL,'Stage 3','',FALSE,FALSE,4063368,0) INSERT INTO CHAIN_STAGE VALUES(8389103,NULL,NULL,'Stage 3','',FALSE,FALSE,8422478,0)

relates to

BAM-14518 DeletionServiceImpl fails to delete stage due to NPE

Closed

BAM-12217 Corrupted stage order after reordering a job

Closed

BDEV-5446 Loading...

BDEV-9928 Loading...

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...

(68 mentioned in)

Assignee:: Unassigned
Reporter:: Carlen Benard (Inactive)
Votes:: 0 Vote for this issue
Watchers:: 14 Start watching this issue

Created:: 15/Sep/2014 4:13 PM
Updated:: 06/Mar/2024 6:50 AM
Resolved:: 21/Apr/2016 6:55 PM

Details

Description

Summary

Steps to Reproduce

Expected Results

Actual Results

Workaround

Attachments

Issue Links

Activity

People

Dates