Details
-
Bug
-
Resolution: Unresolved
-
Low
-
None
-
5.7.4
-
2
-
Severity 3 - Minor
-
0
-
Description
Confluence has a functionality to locate orphaned pages on a space-by-space basis in Space Tools. Orphaned pages are defined as current (non-historical) pages that are not under a parent page, and do not have incoming link references, thus are less likely to be discovered.
When using MySQL, it has been observed in some instances that this functionality can take a very long time to load, or eventually hit a timeout. This is due to the underlying SQL query itself taking a long time, because MySQL sometimes does not choose the most optimal index to use. Using SQL Debug logging, we can see that the query for locating orphaned pages essentially boils down to:
SELECT s.SPACENAME, s.SPACEKEY, s.SPACEID, c.TITLE, c.CONTENTID FROM CONTENT c, SPACES s WHERE c.CONTENTTYPE = 'PAGE' AND ( ( c.PREVVER IS NULL ) AND ( c.CONTENT_STATUS = 'current' ) AND ( Lower(s.SPACEKEY) = 'SPACE_KEY' AND c.SPACEID = s.SPACEID ) AND ( ( s.HOMEPAGE != c.CONTENTID AND c.SPACEID = s.SPACEID ) OR ( s.HOMEPAGE IS NULL AND c.SPACEID = s.SPACEID ) ) AND ( c.PARENTID IS NULL ) AND ( NOT EXISTS(SELECT l.LINKID FROM LINKS l, SPACES s2, CONTENT c2 WHERE ( l.DESTSPACEKEY = s2.SPACEKEY AND c.SPACEID = s2.SPACEID ) AND ( Lower(l.DESTPAGETITLE) = Lower(c.TITLE) ) AND ( Lower(c2.TITLE) != Lower(c.TITLE) AND l.CONTENTID = c2.CONTENTID )) ) );
Using EXPLAIN, it has been observed in some cases that MySQL will choose to perform an index_merge on the intersection of two indexes on the CONTENT table, c_prevver_idx and c_parentid_idx. In a large environment, this index_merge can result in hundreds of thousands of rows due to the nature of the CONTENT table, which stores metadata for all content in Confluence. This is the case even when looking for orphaned pages in a very small space.
A potential workaround is to force MySQL to use the c_spaceid_idx index by applying an Index Hint. This isolates the amount of rows needed to be scanned to the specific space, which can greatly reduce lookup time. In one case, we the time to execute went from 6+ minutes to less than half a second. The syntax is as below (note the "USE INDEX (c_spaceid_idx)" portion):
SELECT s.SPACENAME, s.SPACEKEY, s.SPACEID, c.TITLE, c.CONTENTID FROM CONTENT c USE INDEX (c_spaceid_idx), SPACES s WHERE c.CONTENTTYPE = 'PAGE' AND ( ( c.PREVVER IS NULL ) AND ( c.CONTENT_STATUS = 'current' ) AND ( Lower(s.SPACEKEY) = 'SPACE_KEY' AND c.SPACEID = s.SPACEID ) AND ( ( s.HOMEPAGE != c.CONTENTID AND c.SPACEID = s.SPACEID ) OR ( s.HOMEPAGE IS NULL AND c.SPACEID = s.SPACEID ) ) AND ( c.PARENTID IS NULL ) AND ( NOT EXISTS(SELECT l.LINKID FROM LINKS l, SPACES s2, CONTENT c2 WHERE ( l.DESTSPACEKEY = s2.SPACEKEY AND c.SPACEID = s2.SPACEID ) AND ( Lower(l.DESTPAGETITLE) = Lower(c.TITLE) ) AND ( Lower(c2.TITLE) != Lower(c.TITLE) AND l.CONTENTID = c2.CONTENTID )) ) );
This bug report is to see if we can build this into the product such that List Orphaned Pages would always use the most efficient index in MySQL.