  1. Confluence Data Center
  2. CONFSERVER-58644

Exporting a page to PDF via sandbox produces documents with broken encoding


    • Type: Bug
    • Resolution: Fixed
    • Priority: Low
    • Fix Version/s: 7.17.5, 7.18.2
    • Affects Version/s: 6.12.0, 6.13.0, 6.14.0, 6.15.1, 7.4.4, 7.13.1
    • Component/s: Data Center - Core
    • Labels: None

      Issue Summary

      Exporting a page to PDF on Data Center (i.e. via the sandbox process) produces a document with unreadable text because of broken encoding.

      Environment

      Data Center

      Steps to Reproduce

      1. Ensure that appropriate fonts are installed before exporting to PDF
      2. Create a page with non-English text
      3. From the page menu select "Export to PDF"

      Expected Results

      A PDF file is produced and all non-English text is readable.

      Actual Results

      A PDF is produced, but all non-English text is garbled. UTF-8 is misinterpreted as Windows-1251 (on a Windows platform) or as some other incorrect encoding.

      Example of a broken encoding:

      ЗаÐ3Ð3⁄4лÐ3⁄4Ð2Ð3⁄4Ðo
      ТÐμÐoÑ�Ñ‚ пÐ3⁄4-руÑ�Ñ�ÐoÐ ̧.
      English text.
      ç3⁄4Žå›1⁄2马里å...°å·ž

      The name of the file contains underscores.
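
The garbling shown above is what results when UTF-8 bytes are decoded with a single-byte code page. A minimal Python sketch of the effect, assuming Windows-1251 as the wrong code page (the one reported for Windows); the function names `garble` and `repair` are illustrative only and are not part of Confluence:

```python
def garble(text: str, wrong_encoding: str = "cp1251") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong code page."""
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


def repair(garbled: str, wrong_encoding: str = "cp1251") -> str:
    """Reverse garble(); only works if no byte was replaced while decoding."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    # Cyrillic text is garbled, ASCII text passes through unchanged.
    for sample in ("Тест", "English text."):
        print(repr(sample), "->", repr(garble(sample)))
```

ASCII characters have the same byte values in both encodings, which is why "English text." stays readable in the broken output while the non-English lines do not.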

      Workaround #1

      Use the conversion.sandbox.java.options system property to propagate the encoding settings to the sandbox process.

      First, discover the Java properties used by the sandbox process:

      1. Go to "Confluence administration" -> "Logging and Profiling" and add the package com.atlassian.confluence.impl.util.sandbox to the logging settings at INFO level. Press the "Save" button at the bottom of the page
      2. Ensure that no sandbox processes are running. If any are, kill them
      3. Export a page to PDF
      4. Search for the message in the Confluence logs:
        [impl.util.sandbox.SandboxProcess] start Sandbox 0: Starting sandbox process: ...
      5. Copy and save all the command line parameters that start with -D
        You may remove the com.atlassian.confluence.impl.util.sandbox logging now.
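
Step 5 can be done by hand, but a small script helps when the command line is long. The sketch below is only an illustration: the sample command line is hypothetical (the real log line is elided above), the function name `extract_d_options` is made up for this example, and it assumes POSIX-style quoting of arguments.

```python
import shlex


def extract_d_options(command_line: str) -> list[str]:
    """Return every argument that starts with -D, in original order."""
    # shlex.split honours quotes, so a quoted value containing spaces
    # stays a single argument (assumes POSIX-style quoting).
    return [arg for arg in shlex.split(command_line) if arg.startswith("-D")]


if __name__ == "__main__":
    # Hypothetical command line, NOT copied from a real Confluence log.
    sample = (
        "java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 "
        "-Xmx512m -cp app.jar Main"
    )
    for option in extract_d_options(sample):
        print(option)
```

Paste the text that follows "Starting sandbox process:" in your own log in place of the sample string.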

      Now override the sandbox parameters with the ones just collected, plus the encoding settings:

      1. Specify the conversion.sandbox.java.options system property for Confluence:
        -Dconversion.sandbox.java.options=<all Java -D options copied from the log, separated by commas>,-Dsun.jnu.encoding=UTF-8,-Dfile.encoding=UTF-8
        -Dsun.jnu.encoding=UTF-8
        -Dfile.encoding=UTF-8
        

        Example (your actual parameters may be different):

        -Dconversion.sandbox.java.options=-Dhttp.proxyHost=proxy.example.com,-Dhttp.proxyPort=8080,-Dsun.jnu.encoding=UTF-8,-Dfile.encoding=UTF-8
        -Dsun.jnu.encoding=UTF-8
        -Dfile.encoding=UTF-8
        
      2. Restart Confluence
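
As an illustration of how the value in step 1 is assembled, here is a minimal Python sketch that joins the copied options with commas and appends the two encoding options. The helper name `build_sandbox_options` is hypothetical; dropping any encoding options already present in the copied list is a choice made here to avoid duplicates, not something the workaround prescribes.

```python
ENCODING_OPTIONS = ["-Dsun.jnu.encoding=UTF-8", "-Dfile.encoding=UTF-8"]
ENCODING_PREFIXES = ("-Dsun.jnu.encoding=", "-Dfile.encoding=")


def build_sandbox_options(copied_options: list[str]) -> str:
    """Build the -Dconversion.sandbox.java.options=... argument."""
    # Skip encoding options that were already on the sandbox command line,
    # so the UTF-8 settings appended below appear exactly once.
    kept = [opt for opt in copied_options if not opt.startswith(ENCODING_PREFIXES)]
    return "-Dconversion.sandbox.java.options=" + ",".join(kept + ENCODING_OPTIONS)


if __name__ == "__main__":
    print(build_sandbox_options([
        "-Dhttp.proxyHost=proxy.example.com",
        "-Dhttp.proxyPort=8080",
    ]))
```

With the two proxy options from the example above, the result matches the first line of that example.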

      Caveat: When conversion.sandbox.java.options is specified, the system does not pass Confluence networking properties to the sandbox process. If you later change those properties (proxy host, proxy port, etc.), you must update conversion.sandbox.java.options as well.

      Workaround #2

      Warning: this workaround reduces the stability of the node and disables isolation for PDF exports. It is not recommended for a production environment.
      Disable the sandbox process for PDF export altogether:

      1. Specify the Confluence system property: -Dpdf.export.sandbox.disable=true
      2. Restart Confluence

              ablekhman@atlassian.com Alex Blekhman (Inactive)