Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-41010

Table index "c_status_idx" on CONTENT.content_status can cause slow search index rebuilding

    XMLWordPrintable

Details

    Description

      Summary

      The addition of the index c_status_idx on the column content_status in the Table CONTENT in Confluence 5.8 and 5.9 causes content index administration (rebuilding the Content Index) to be come dramatically slower than previous indexing. Clients experiencing this issue will see as much as a 10x slower speed in overall content index administration. This does not now appear to be solely related to Oracle, thus the need to update this issue.

      When Confluence is running on any database, query speed from the above SQL query can return up to 10x slower when using this table index, which causes overall Confluence search index rebuild speed to become dramatically slower. In one observed Confluence instance with 8 million pieces of content (1.5 million of which is current), the search index rebuild time went from 4 hours in Confluence 5.7 to 30+ hours in 5.8 when running Oracle. In an instance running Microsoft SQL Server with Confluence 5.9.x, the indexing went from 3 to 4 hours up to 9 to 12 hours. In another instance running PostgreSQL, the indexing went from 3 hours to 8 to 10 hours. Similar reports have been made about MySQL, and given the facts of the index's low cardinality (only two possible values exist: current or deleted), it would make sense to remove this index to improve performance.

      Environments

      Confirmed in Microsoft SQL Server and Oracle databases. Not confirmed yet for PostgreSQL, not confirmed in MySQL, though there are numerous slow indexing issues in MySQL.

      Steps to Reproduce

      Any Confluence instance with content at scale is likely to experience this issue on Confluence 5.8 or 5.9. An instance with more than 250000 pages should be sufficient to expose the issue

      1. Build an instance at scale with 250,000 pages and deploy on Confluence 5.7.4 or greater
      2. Rebuild the Content Index using the UI
      3. Observe the time to index
      4. Upgrade the instance to Confluence 5.9
      5. Rebuild the Content Index using the UI
      6. Observe the much slower time to index the content

      Expected Results

      I would expect no difference to indexing performance at a minimum, and with the improved performance of the application, I would expect performance to include improvements in indexing speed.

      Actual Results

      The indexing is actually much slower, causing much frustration on the part of clients, as the index rebuild is far slower than it was before the upgrade to Confluence.

      Notes

      In Confluence 5.8, a table index was added onto the content_status column of the CONTENT table called c_status_idx. This index is used in the following SQL query frequently when Confluence's search index is being rebuilt:

      select confluence1_.user_key as user_key0_, confluence1_.user_key as user_key1_, confluence1_.username as username0_, confluence1_.lower_username as lower_us3_0_, confluence1_.username as username1_, confluence1_.lower_username as lower_us3_1_, contentent0_.CONTENTID as x0_0_, contentent0_.VERSION as x1_0_, confluence1_.user_key as x2_0_, contentent0_.LASTMODDATE as x3_0_, contentent0_.VERSIONCOMMENT as x4_0_ from CONTENT contentent0_ left outer join user_mapping confluence1_ on contentent0_.LASTMODIFIER=confluence1_.user_key where (contentent0_.PREVVER=? )or(contentent0_.CONTENTID=? )and(contentent0_.CONTENT_STATUS='current' ) order by  contentent0_.VERSION desc
      

      Particularly, it is due to this section of the query, which was not present during index rebuild for the 5.7 or prior Confluence versions:

      contentent0_.CONTENT_STATUS='current' 
      

      We know that the table index is at play, because if we give a NO_INDEX hint to Oracle while running the above query, which explicitly asks the DB not to use the index on c_status_idx, the results return very swiftly, e.g.:

      select /*+ NO_INDEX(contentent0_ c_status_idx) */ confluence1_.user_key as user_key0_, confluence1_.user_key as user_key1_, confluence1_.username as username0_, confluence1_.lower_username as lower_us3_0_, confluence1_.username as username1_, confluence1_.lower_username as lower_us3_1_, contentent0_.CONTENTID as x0_0_, contentent0_.VERSION as x1_0_, confluence1_.user_key as x2_0_, contentent0_.LASTMODDATE as x3_0_, contentent0_.VERSIONCOMMENT as x4_0_ from CONTENT contentent0_ left outer join user_mapping confluence1_ on contentent0_.LASTMODIFIER=confluence1_.user_key where (contentent0_.PREVVER=949 )or(contentent0_.CONTENTID=949 )and(contentent0_.CONTENT_STATUS='current' ) order by  contentent0_.VERSION desc
      

      The column "content_status" is responsible for tracking whether a page is current, or deleted and sitting in a Space's trash. There are only two possible values for this column: "current" or "deleted". Due to the low cardinality, and that the vast majority of the values are "current", it may not make sense that we have a b-tree table index on this column (especially if it can negatively impact performance).

      Workaround

      PLEASE READ: It is recommended that you follow best practices for Change Management and will test and validate these settings in a Test/Development and Staging environment prior to rolling any changes into a Production environment. You must test and validate these changes to insure that they will function will within your infrastructure prior to placing these changes in production.

      If you suspect that you are impacted by this issue, you should test and evaluate dropping the index on the content_status column of the CONTENT table in a Test or development instance, then test the search index rebuild speed. You may need to restart your instance before you retest the Content Index Rebuild to verify the performance improvement.

      Attachments

        Issue Links

          Activity

            People

              zzhou Zhenhuan Zhou (Inactive)
              rchang Robert Chang
              Votes:
              10 Vote for this issue
              Watchers:
              38 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: