  1. Confluence Data Center
  2. CONFSERVER-94830

Running parallel Space PDF exports does not maximize the utilization of the Sandbox processes in the pool

      Issue Summary

      When running N concurrent space PDF exports, the number of sandbox processes working in parallel for serving these requests is lower than N.

      This is reproducible on Data Center: yes

      Steps to Reproduce

      1. Provision a fresh Confluence instance
      2. Increase the Sandbox pool to 22 processes (for example) in your setenv.sh:
        -Dconversion.sandbox.pool.size=22
        
      3. Increase the number of PDF exports that can be executed concurrently in your setenv.sh:
        -Dconfluence.pdfexport.permits.size=20
        
      4. Restart Confluence
      5. Import a Space whose PDF export takes several minutes to complete (ideally more than ~15 min)
      6. Run multiple PDF Space exports simultaneously

      Expected Results

      There are as many Sandbox processes executing requests as there are PDF exports running concurrently.

      Actual Results

      According to our documentation, Recognized System Properties:

      conversion.sandbox.pool.size: Use this property to increase the number of processes (sandboxes) in the external process pool. More processes means more tasks can be executed in parallel, but will consume more memory and CPU resources on each node.

      However, monitoring the sandbox workers that are actively executing export requests shows that the expected degree of parallelism is not achieved.

      Our code (SandboxLocalProcessPool.java) uses the following logic to select the Sandbox process responsible for executing a task:

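          public <T, R> R execute(SandboxRequest<T, R> request) {
              // Select a process from the hash code of the request input, modulo the pool size
              final int index = IntMath.mod(request.getInput().hashCode(), configuration.getConcurrencyLevel());
              final SandboxProcess process = processes[index];
              // ... (rest of the method not shown in this excerpt)
          }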

      The code takes the hash code of the request input modulo the number of Sandbox processes in the pool (getConcurrencyLevel()), without checking whether the selected sandbox is free (at least, from what I can observe in this code).

      If two requests map to the same sandbox process (because their hash codes give the same modulus), the second request queues behind the first, while other sandboxes in the pool may remain idle.
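
      To illustrate the effect described above, here is a minimal, self-contained sketch. It is not Confluence code: the class and method names are hypothetical, Math.floorMod stands in for Guava's IntMath.mod, and seeded random integers stand in for the real hash codes of export requests. It counts how many distinct pool slots a batch of requests lands on, and computes how many slots would be busy on average if the hash codes were uniformly distributed.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical demo class; not part of Confluence.
class SandboxAssignmentDemo {

    // Same selection rule as the excerpt: hash code modulo pool size.
    // Math.floorMod is used in place of Guava's IntMath.mod (assumption:
    // both give a non-negative result for a positive modulus).
    static int slotFor(int hashCode, int poolSize) {
        return Math.floorMod(hashCode, poolSize);
    }

    // Number of distinct sandbox slots hit by a batch of request hash codes.
    static int distinctSlots(int[] hashCodes, int poolSize) {
        Set<Integer> used = new HashSet<>();
        for (int h : hashCodes) {
            used.add(slotFor(h, poolSize));
        }
        return used.size();
    }

    // Expected number of busy slots if hash codes were uniformly distributed:
    // poolSize * (1 - (1 - 1/poolSize)^requests).
    static double expectedBusySlots(int requests, int poolSize) {
        return poolSize * (1.0 - Math.pow(1.0 - 1.0 / poolSize, requests));
    }

    public static void main(String[] args) {
        int poolSize = 22; // conversion.sandbox.pool.size from the steps above
        int requests = 20; // confluence.pdfexport.permits.size from the steps above

        // Arbitrary seed; random ints are a stand-in for real input hash codes.
        Random random = new Random(42);
        int[] hashCodes = new int[requests];
        for (int i = 0; i < requests; i++) {
            hashCodes[i] = random.nextInt();
        }

        System.out.printf("distinct sandboxes used: %d of %d%n",
                distinctSlots(hashCodes, poolSize), poolSize);
        System.out.printf("expected under uniform hashing: %.1f%n",
                expectedBusySlots(requests, poolSize));
    }
}
```

      Under the uniform-hashing assumption, 20 concurrent requests against a pool of 22 keep only about 13.3 sandboxes busy on average, so some exports would be expected to wait behind others even though idle sandboxes exist. The real distribution depends on how the request input implements hashCode(), which this sketch does not model.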

      Workaround

      Currently, there is no known workaround for this behavior. A workaround will be added here when available.

        1. Confluence-space-export-PDFTES.zip
          3.15 MB
        2. SandboxLocalProcessPool.java
          11 kB
        3. SandboxProcess.java
          13 kB


              Unassigned Unassigned
              d8a006ac9dc7 Iker Alonso