Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-37804

Inefficient "Find Orphaned Pages" function when using MysQL

    XMLWordPrintable

Details

    Description

      Confluence has a functionality to locate orphaned pages on a space-by-space basis in Space Tools. Orphaned pages are defined as current (non-historical) pages that are not under a parent page, and do not have incoming link references, thus are less likely to be discovered.

      When using MySQL, it has been observed in some instances that this functionality can take a very long time to load, or eventually hit a timeout. This is due to the underlying SQL query itself taking a long time, because MySQL sometimes does not choose the most optimal index to use. Using SQL Debug logging, we can see that the query for locating orphaned pages essentially boils down to:

      SELECT s.SPACENAME, s.SPACEKEY, s.SPACEID, c.TITLE, c.CONTENTID
      FROM   CONTENT c,
             SPACES s
      WHERE  c.CONTENTTYPE = 'PAGE'
             AND ( ( c.PREVVER IS NULL )
                   AND ( c.CONTENT_STATUS = 'current' )
                   AND ( Lower(s.SPACEKEY) = 'SPACE_KEY'
                         AND c.SPACEID = s.SPACEID )
                   AND ( ( s.HOMEPAGE != c.CONTENTID
                           AND c.SPACEID = s.SPACEID )
                          OR ( s.HOMEPAGE IS NULL
                               AND c.SPACEID = s.SPACEID ) )
                   AND ( c.PARENTID IS NULL )
                   AND ( NOT EXISTS(SELECT l.LINKID
                                    FROM   LINKS l,
                                           SPACES s2,
                                           CONTENT c2
                                    WHERE  (
                                   l.DESTSPACEKEY = s2.SPACEKEY
                                   AND
                                   c.SPACEID = s2.SPACEID )
                                           AND
                                   ( Lower(l.DESTPAGETITLE) =
                                     Lower(c.TITLE) )
                                           AND ( Lower(c2.TITLE) !=
                                                 Lower(c.TITLE)
                                                 AND l.CONTENTID =
                                                     c2.CONTENTID )) ) );
      

      Using EXPLAIN, it has been observed in some cases that MySQL will choose to perform an index_merge on the intersection of two indexes on the CONTENT table, c_prevver_idx and c_parentid_idx. In a large environment, this index_merge can result in hundreds of thousands of rows due to the nature of the CONTENT table, which stores metadata for all content in Confluence. This is the case even when looking for orphaned pages in a very small space.

      A potential workaround is to force MySQL to use the c_spaceid_idx index by applying an Index Hint. This isolates the amount of rows needed to be scanned to the specific space, which can greatly reduce lookup time. In one case, we the time to execute went from 6+ minutes to less than half a second. The syntax is as below (note the "USE INDEX (c_spaceid_idx)" portion):

      SELECT s.SPACENAME, s.SPACEKEY, s.SPACEID, c.TITLE, c.CONTENTID
      FROM   CONTENT c USE INDEX (c_spaceid_idx),
             SPACES s
      WHERE  c.CONTENTTYPE = 'PAGE'
             AND ( ( c.PREVVER IS NULL )
                   AND ( c.CONTENT_STATUS = 'current' )
                   AND ( Lower(s.SPACEKEY) = 'SPACE_KEY'
                         AND c.SPACEID = s.SPACEID )
                   AND ( ( s.HOMEPAGE != c.CONTENTID
                           AND c.SPACEID = s.SPACEID )
                          OR ( s.HOMEPAGE IS NULL
                               AND c.SPACEID = s.SPACEID ) )
                   AND ( c.PARENTID IS NULL )
                   AND ( NOT EXISTS(SELECT l.LINKID
                                    FROM   LINKS l,
                                           SPACES s2,
                                           CONTENT c2
                                    WHERE  (
                                   l.DESTSPACEKEY = s2.SPACEKEY
                                   AND
                                   c.SPACEID = s2.SPACEID )
                                           AND
                                   ( Lower(l.DESTPAGETITLE) =
                                     Lower(c.TITLE) )
                                           AND ( Lower(c2.TITLE) !=
                                                 Lower(c.TITLE)
                                                 AND l.CONTENTID =
                                                     c2.CONTENTID )) ) );
      

      This bug report is to see if we can build this into the product such that List Orphaned Pages would always use the most efficient index in MySQL.

      Attachments

        Activity

          People

            Unassigned Unassigned
            rchang Robert Chang
            Votes:
            4 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: