[CONFSERVER-81888] GZip Transfer Encoding truncating smaller robots.txt files

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Low
    • Affects Version/s: 7.13.8, 8.0.3

      Issue Summary

      With Compress HTTP Responses enabled, Confluence will truncate the end of an uploaded robots.txt file, affecting its ability to set proper deny/allow rules for search engine crawlers. This is important as Confluence can be severely impacted by search engine services such as Google Search Appliance, as noted in CONFSERVER-8749.

      This appears to only affect smaller robots.txt files, as we don't see the issue occur when using the "aggressive" robots.txt mentioned in CONFSERVER-8749. However, it's fairly common to have a smaller robots.txt file as only two lines are needed to deny all web crawler requests:

      User-agent: *
      Disallow: /
      

      What we're seeing, in this case, is that the second line, Disallow: /, is truncated to just a "D", so the deny rule is never actually set.

      We don't see the issue occur when Compress HTTP Responses is disabled. However, this option is enabled by default and is useful: it can help Confluence over slow or congested Internet links and reduces the amount of bandwidth consumed by a Confluence server.
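
      To observe the difference directly, the sketch below requests robots.txt twice, once asking the server to skip compression and once allowing gzip, then compares the two bodies. This is a minimal sketch assuming Python 3 and a hypothetical BASE_URL placeholder that you would replace with your own Confluence base URL; only standard-library modules are used.

      import urllib.request
      import zlib

      BASE_URL = "http://localhost:8090"  # hypothetical; replace with your Confluence base URL

      def fetch(accept_encoding):
          """Fetch /robots.txt while advertising the given Accept-Encoding."""
          req = urllib.request.Request(
              BASE_URL + "/robots.txt",
              headers={"Accept-Encoding": accept_encoding},
          )
          with urllib.request.urlopen(req) as resp:
              body = resp.read()
              if resp.headers.get("Content-Encoding") == "gzip":
                  # decompressobj tolerates a truncated stream, unlike
                  # gzip.decompress(), which raises EOFError when the
                  # end-of-stream marker is missing.
                  body = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS).decompress(body)
          return body.decode("utf-8", errors="replace")

      plain = fetch("identity")  # ask the server to skip compression
      gzipped = fetch("gzip")    # allow gzip compression

      if plain != gzipped:
          print("Compressed response is truncated:")
          print(gzipped)

      Since the issue only occurs with Compress HTTP Responses enabled, the identity response should contain the full file while the gzip response ends at "D".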

      Steps to Reproduce

      Reproducible on Data Center?: Yes

      1. Upload a robots.txt file to <Confluence-Install-Directory>/confluence with the following text:

         User-agent: *
         Disallow: /

      2. Ensure that the file has the correct file permissions and that the dedicated user account that runs Confluence has access to the file
      3. Restart Confluence

      Expected Results

      When visiting http://<CONFLUENCE-BASE-URL>/robots.txt, you should see the complete contents of the file.

      Actual Results

      When visiting http://<CONFLUENCE-BASE-URL>/robots.txt, you instead get the following truncated content:

      User-agent: *
      D
      

      Workaround

      The current workaround is to add a large comment at the end of the file. Adding a commented-out paragraph of Lorem Ipsum text should be enough.
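
      For illustration, the padded file might look like the following; the comment text itself is arbitrary, since robots.txt treats lines starting with # as comments:

      User-agent: *
      Disallow: /
      # Padding so the compressed response is not truncated:
      # Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
      # eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
      # enim ad minim veniam, quis nostrud exercitation ullamco laboris
      # nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
      # in reprehenderit in voluptate velit esse cillum dolore eu
      # fugiat nulla pariatur.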

      Unfortunately, our documentation on excluding items from HTTP compression is fairly old and doesn't work as described any longer. We'll update this workaround should that get updated.
