In an environment where there are thousands of pages, the PageNotFound action is extremely CPU intensive when the alternative page search hits many pages.

      One example is the space home page in an environment with thousands of spaces. If the home page of a space (EXAMPLE) is changed from "Home" to some other page and the URL mywiki/display/EXAMPLE/Home is visited, the alternative page search will hit every other "Home" page in all of the other spaces. Stack traces suggest that this operation spends much of its time in the permission-checking code path, triggering constant Hibernate session flushes to the DB.

      Such operations may last tens of minutes in production environments and are often exacerbated when frustrated users constantly click refresh in an attempt to elicit a response from the application.

      The options are to limit the number of alternatives we propose (useful in general anyway, since 2000 alternatives are not really helpful) and/or reduce the cost of permission checks in this case.
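The first option can be illustrated with a minimal sketch. It is not Confluence code: the function name, the `can_view` callback, and both limits are hypothetical, chosen only to show how capping the result count (and the number of candidates examined) bounds the number of permission checks.

```python
from itertools import islice
from typing import Callable, Iterable, List

MAX_ALTERNATIVES = 10   # hypothetical cap; the issue does not name a value
MAX_CHECKS = 100        # hypothetical bound on permission checks per request


def visible_alternatives(
    candidates: Iterable[str],
    can_view: Callable[[str], bool],
    limit: int = MAX_ALTERNATIVES,
    max_checks: int = MAX_CHECKS,
) -> List[str]:
    """Return at most `limit` candidates that pass the permission check.

    Candidates are consumed lazily, so `can_view` is called no more than
    `max_checks` times and stops being called once `limit` visible pages
    have been collected.
    """
    examined = islice(candidates, max_checks)
    permitted = (page for page in examined if can_view(page))
    return list(islice(permitted, limit))
```

With 2000 candidate "Home" pages that are all visible and `limit=5`, this performs five permission checks instead of 2000. The `max_checks` bound covers the opposite case, where almost none of the candidates are visible to the user.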

      Patch available

      To apply the patch (tested in Confluence 2.6 and later):

      1. Shut down Confluence
      2. Find the file alternativepages.vm in the pages/ directory of your Confluence application. Replace it with the attached version.
      3. Find the file xwork.xml in the WEB-INF/classes/ directory of your Confluence application. Find the following lines in the file:

      <action name="pagenotfound" class="com.atlassian.confluence.pages.actions.PageNotFoundAction">
          <result name="error" type="dispatcher">/fourohfour.action</result>
          <result name="success" type="velocity">/pages/alternativepages.vm</result>
      </action>
      

      Change the class in the first line to "com.atlassian.confluence.core.ConfluenceActionSupport" as shown below:

      <action name="pagenotfound" class="com.atlassian.confluence.core.ConfluenceActionSupport">
          <result name="error" type="dispatcher">/fourohfour.action</result>
          <result name="success" type="velocity">/pages/alternativepages.vm</result>
      </action>
      

      4. Start Confluence again. The "Page not found" page should now load quickly, irrespective of how many spaces are in the instance.
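The xwork.xml edit in step 3 could also be scripted. The following is only a sketch: it assumes the file is UTF-8 text and that the opening tag appears exactly as quoted above, and the function name and the ".bak" backup suffix are illustrative choices, not part of the patch. Run it only while Confluence is shut down.

```python
import shutil
from pathlib import Path

OLD_CLASS = "com.atlassian.confluence.pages.actions.PageNotFoundAction"
NEW_CLASS = "com.atlassian.confluence.core.ConfluenceActionSupport"


def patch_xwork(xwork_path: Path) -> bool:
    """Swap the pagenotfound action class in place; return True if changed."""
    text = xwork_path.read_text(encoding="utf-8")  # assumption: UTF-8 file
    old_tag = f'<action name="pagenotfound" class="{OLD_CLASS}">'
    new_tag = f'<action name="pagenotfound" class="{NEW_CLASS}">'
    if old_tag not in text:
        # Already patched, or the file differs from the one described above.
        return False
    # Keep a copy of the original so the change can be reverted.
    shutil.copy2(xwork_path, xwork_path.with_name(xwork_path.name + ".bak"))
    xwork_path.write_text(text.replace(old_tag, new_tag), encoding="utf-8")
    return True
```

Calling `patch_xwork(Path("WEB-INF/classes/xwork.xml"))` from the Confluence application directory returns False and leaves the file untouched if the expected line is not found, which is the safer outcome when the file layout differs between versions.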

        1. alternativepages.vm
          1 kB
          Jeremy Largman
        2. lockup query.sql
          2 kB
          Chloe Sowers

            [CONFSERVER-12864] PageNotFound action can render Confluence inoperable

            Chloe Sowers added a comment - Change the space key to match a large space in your DB. The results are cached, so 2nd run may be quick. Change page title to another random string to witness lockup again. Query takes 15m-16m on MSSQL when the problem manifests.

            Chloe Sowers added a comment - This was not fixed in version 3.1. We have 3.4.8, and it still uses PageNotFound.action. You have to extract the xwork.xml from the confluence-3.x.x.jar file, then edit it.

            We had this problem and I believe it was due to corrupted or inefficient indices in the database (aside from the monstrous query). We were using MSSQL, which itself is not a paragon of speed. We have a very large DB with many spaces and pages. We pinpointed the query, attached, and it would take 15m to run in MSSQL Studio!

            Rebuilding all indices may solve the problem, but I cannot tell for sure since we also switched the primary and mirror at the same time.

            You can also add a DB timeout, which will display an error screen, but that is better than crashing.
            https://confluence.atlassian.com/display/CONFKB/Configuring+a+Database+Query+Timeout
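For the 3.x case described in this comment, where xwork.xml lives inside the Confluence jar, the extraction step might look like the sketch below. A jar is a zip archive, so Python's standard zipfile module can read it. The function name is hypothetical, and the location of xwork.xml inside the jar is an assumption, so the sketch matches any entry with that file name. Where the edited copy must then be placed is not covered here.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_xwork(jar_path: Path, dest_dir: Path) -> List[Path]:
    """Extract every entry named xwork.xml from the jar into dest_dir."""
    extracted = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Match the file at the archive root or in any subdirectory.
            if name == "xwork.xml" or name.endswith("/xwork.xml"):
                extracted.append(Path(jar.extract(name, path=dest_dir)))
    return extracted
```

The function returns the paths of the extracted copies and leaves the jar itself unmodified.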

            Jeremy Largman added a comment - Changing title to better reflect the severity of the problem.

            Anatoli added a comment - Looked through the changes that were made to address review comments.

            Gurleen Anand [Atlassian] added a comment - The details for locating/editing the xwork.xml file are here.

            andrew m. boardman added a comment - Could the patch instructions be updated for 3.0? I may yet find it, but nothing like the referenced xwork.xml appears in my 3.0.1 install.

            This is now a major problem for us; a couple of attempts of this type on our production wiki, and the server is redline and unresponsive for several hours.

            Paul Curren added a comment - This issue won't make it into 3.0. It will be given strong consideration for the release beyond that.

            Per Fragemann [Atlassian] added a comment - According to Igor's comment this bug is made even worse by large trash content (which you can't get rid of easily due to bug CONF-15233).

            Matt Ryall added a comment - The database queries are still issued if you just change the Velocity file as documented above. You need to also make a one-line change to WEB-INF/classes/xwork.xml to prevent the search for alternatives being done on the backend. I've documented this in the description above.

            Per Fragemann [Atlassian] added a comment - I guess this one has not been scheduled yet. Might want to do it together with the BF team?

              dtaylor David Taylor (Inactive)
              christopher.owen@atlassian.com Christopher Owen [Atlassian]