The addition of the index c_status_idx on the column content_status in the Table CONTENT in Confluence 5.8 and 5.9 causes content index administration (rebuilding the Content Index) to be come dramatically slower than previous indexing. Clients experiencing this issue will see as much as a 10x slower speed in overall content index administration. This does not now appear to be solely related to Oracle, thus the need to update this issue.
When Confluence is running on any database, query speed from the above SQL query can return up to 10x slower when using this table index, which causes overall Confluence search index rebuild speed to become dramatically slower. In one observed Confluence instance with 8 million pieces of content (1.5 million of which is current), the search index rebuild time went from 4 hours in Confluence 5.7 to 30+ hours in 5.8 when running Oracle. In an instance running Microsoft SQL Server with Confluence 5.9.x, the indexing went from 3 to 4 hours up to 9 to 12 hours. In another instance running PostgreSQL, the indexing went from 3 hours to 8 to 10 hours. Similar reports have been made about MySQL, and given the facts of the index's low cardinality (only two possible values exist: current or deleted), it would make sense to remove this index to improve performance.
Confirmed in Microsoft SQL Server and Oracle databases. Not confirmed yet for PostgreSQL, not confirmed in MySQL, though there are numerous slow indexing issues in MySQL.
Any Confluence instance with content at scale is likely to experience this issue on Confluence 5.8 or 5.9. An instance with more than 250000 pages should be sufficient to expose the issue
- Build an instance at scale with 250,000 pages and deploy on Confluence 5.7.4 or greater
- Rebuild the Content Index using the UI
- Observe the time to index
- Upgrade the instance to Confluence 5.9
- Rebuild the Content Index using the UI
- Observe the much slower time to index the content
I would expect no difference to indexing performance at a minimum, and with the improved performance of the application, I would expect performance to include improvements in indexing speed.
The indexing is actually much slower, causing much frustration on the part of clients, as the index rebuild is far slower than it was before the upgrade to Confluence.
In Confluence 5.8, a table index was added onto the content_status column of the CONTENT table called c_status_idx. This index is used in the following SQL query frequently when Confluence's search index is being rebuilt:
Particularly, it is due to this section of the query, which was not present during index rebuild for the 5.7 or prior Confluence versions:
We know that the table index is at play, because if we give a NO_INDEX hint to Oracle while running the above query, which explicitly asks the DB not to use the index on c_status_idx, the results return very swiftly, e.g.:
The column "content_status" is responsible for tracking whether a page is current, or deleted and sitting in a Space's trash. There are only two possible values for this column: "current" or "deleted". Due to the low cardinality, and that the vast majority of the values are "current", it may not make sense that we have a b-tree table index on this column (especially if it can negatively impact performance).
PLEASE READ: It is recommended that you follow best practices for Change Management and will test and validate these settings in a Test/Development and Staging environment prior to rolling any changes into a Production environment. You must test and validate these changes to insure that they will function will within your infrastructure prior to placing these changes in production.
If you suspect that you are impacted by this issue, you should test and evaluate dropping the index on the content_status column of the CONTENT table in a Test or development instance, then test the search index rebuild speed. You may need to restart your instance before you retest the Content Index Rebuild to verify the performance improvement.