Bamboo Server fails to start up due to a corruption on the LIST_POSITION column of CHAIN_STAGE table.
At this stage, the steps to reproduce this are not clear. It is known that this could be caused by editing a Build Plan (i.e. re-ordering the steps in a build plan) and restarting the instance, for example.
Bamboo shouldn't get into this database corruption state.
Due to the corruption, the below exceptions are thrown in the atlassian-bamboo.log file:
Identifying the problem:
You should run the following query in order to find out the duplicates:
As an example, let's look at a bad result for the query above:
From this result, you can see that the 3 BUILD_ID: 4063315, 4063368 and 8422478 are duplicated as they appear twice on the table above.
That means that each one of those BUILD_ID need to be fixed on the CHAIN_STAGE table.
If this query had returned zero results, the corruption we are trying to fix wouldn't exist.
How to fix the problem:
If you run the following query specifying one of the duplicate BUILD_ID's reported above:
You will get the following result which evidences the corruption on the CHAIN_STAGE table:
|8389100||2015-10-24 13:00:38||2015-10-24 13:00:38||Stage 2||false||true||8422478||1|
Therefore, in order to identify what needs to be fixed, 3 separate queries would have to be run against the db so you can see the corruption for each BUILD_ID. In this specific example, they are:
But wait... Where is the corruption on the table above?
Notice from the result above that for each row of CHAIN_STAGE the LIST_POSITION for the BUILD_ID=8422478 holds the values 1, 1 and 2 when it should be holding the values 0, 1 and 2. The values should always start off from 0, otherwise Bamboo won't start.
That's what the corruption is about.
How to fix it?
In order to fix this, you will need to analyse the corruption for each one of the queries above and manually fix the problematic rows.
- If you're running Bamboo with an external db ...
Then just update the rows manually. As mentioned before, the result should have been 0, 1 and 2 and not 1, 1 and 2.
So, in order to fix the problem in this specific corrupt output, you would need to update the first row only by running the following query:
In our example, after analysing the output of the SELECT * FROM CHAIN_STAGE WHERE BUILD_ID=... ORDER BY BUILD_ID, LIST_POSITION;, the final set of queries to fix this is:
- If you're running Bamboo with an HSQLDB ...
HSQLDB is the in-memory database for evaluation purposes only. If you're using this in production, well, you should really have migrated to an external database by now because we don't recommend production environments to be using HSQLDB.
- How to detect issue and fix it:
If you're in this scenario, though, here is how you can fix it:
- Stop your Bamboo instance process.
- In order to run the queries to identify the corruption, you will need to Connect to your HSQLDB using DBVisualizer and run the SELECT queries as described under the Identifying the problem section.
- If your database is big (consider big ~200MB), you might consider increasing the memory allocated for your DBVisualizer to 2048m
- Make sure Bamboo is not running. This is very important, otherwise the values we update on the next step will be overwritten and void the fix.
- Edit the <BAMBOO_HOME>/database/defaultdb.script. In there, you will find queries that populate the HSQLDB when Bamboo is started up. For the example we fixed here, you would have found:
- You'd have to change that the INSERT queries above into the ones below so that they can start off at 0:
- How to detect issue and fix it: