Confluence Data Center / CONFSERVER-40392

PDF Index: Can't determine the width of the space error is logged on Windows

Details

    Description

      Environment

      • Only affects Confluence 5.9.x.
      • Only occurs on Windows. OS X and Linux are not affected.

      Diagnosis

      The following error appears in the logs:

      2015-10-28 19:14:07,724 ERROR [Indexer: 1] [pdfbox.pdmodel.font.PDSimpleFont] getSpaceWidth Can't determine the width of the space character using 250 as default
       -- referer: http://localhost:8090/admin/search-indexes.action | url: /admin/reindex.action | userName: admin | action: reindex
      java.lang.IllegalArgumentException: name
      	at sun.misc.URLClassPath$Loader.findResource(Unknown Source)
      	at sun.misc.URLClassPath.findResource(Unknown Source)
      	at java.net.URLClassLoader$2.run(Unknown Source)
      	at java.net.URLClassLoader$2.run(Unknown Source)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findResource(Unknown Source)
      	at org.apache.catalina.loader.WebappClassLoaderBase.findResource(WebappClassLoaderBase.java:946)
      	at org.apache.catalina.loader.WebappClassLoaderBase.getResourceAsStream(WebappClassLoaderBase.java:1117)
      	at org.apache.fontbox.util.ResourceLoader.loadResource(ResourceLoader.java:62)
      	at org.apache.fontbox.util.FontManager.findTTFont(FontManager.java:331)
      	at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getTTFFont(PDTrueTypeFont.java:638)
      

      Cause

      The affected PDF attachments probably contain special characters or uncommon fonts that PDFBox cannot resolve.
      Reference: https://issues.apache.org/jira/browse/PDFBOX-1706
      (Note that the attachment on that issue does not reproduce the error.)

      Reason

      The error occurs when Apache PDFBox indexes PDF attachments.

      During text extraction, PDFBox tries to render the text and look up the matching fonts. Strictly speaking, this is unnecessary: the purpose of indexing is to extract raw text, and Confluence has no use for the visual representation of that text.

      Despite the error, indexing still works and the contents of PDF attachments remain fully searchable.

      Workaround to suppress the error

      To stop the error from appearing in the logs, edit <Confluence Installation>\confluence\WEB-INF\classes\log4j.properties and set the log level for this particular class to FATAL:

      log4j.logger.org.apache.pdfbox.pdmodel.font.PDSimpleFont=FATAL
      

      Please note that after this change, any other message below FATAL logged by this class will also be hidden from the logs.
      Alternatively, you can simply ignore the error.
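
      To confirm that the override is actually present in a given log4j.properties, a small check along the following lines may help. This is only a sketch: the class LoggerOverrideCheck and its methods are hypothetical helpers, not part of Confluence, log4j, or PDFBox. It uses only java.util.Properties and assumes the log4j 1.x convention that a logger value starts with the level, optionally followed by appender names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper for illustration; not shipped with Confluence or PDFBox.
final class LoggerOverrideCheck {
    // Logger key taken from the workaround described above.
    static final String LOGGER_KEY =
            "log4j.logger.org.apache.pdfbox.pdmodel.font.PDSimpleFont";

    // Returns the level configured for PDSimpleFont in upper case,
    // or null if the properties contain no override for that logger.
    static String configuredLevel(Reader log4jProperties) {
        Properties props = new Properties();
        try {
            props.load(log4jProperties);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(LOGGER_KEY);
        if (value == null) {
            return null;
        }
        // Assumed value format: "LEVEL[, appender...]"; keep only the level part.
        int comma = value.indexOf(',');
        String level = comma >= 0 ? value.substring(0, comma) : value;
        return level.trim().toUpperCase(Locale.ROOT);
    }

    // True only when the PDSimpleFont logger is set to FATAL.
    static boolean isSuppressed(Reader log4jProperties) {
        return "FATAL".equals(configuredLevel(log4jProperties));
    }
}
```

      To run it against a real installation, pass a reader opened on the log4j.properties file mentioned above (for example via java.nio.file.Files.newBufferedReader). Confluence must be restarted, or the logging level changed at runtime, before an edited file takes effect; the check only inspects the file contents.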

            People

              fxu Feng Xu (Inactive)
              vteoh Victor Teoh (Inactive)