Confluence Data Center / CONFSERVER-12072

Search not working for PDF files containing Japanese text


      Search does not work for PDF files containing Japanese text; the attached test file reproduces the problem. The problem persists after rebuilding the Confluence index.

      This appears to be a problem with the extractor used in Confluence 2.6.0 and 2.7.1.

      Update: If you rename the Japanese PDF so that its file name uses Roman characters, search finds the file by name, but the file contents shown in the results look corrupted, as in the attached screenshot. This is expected: the extractor cannot parse the Japanese PDF, so the text it writes to the Lucene index is corrupted as well.
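      One rough way to confirm that extracted text is corrupted, rather than judging it by eye, is to measure how much of it falls in the Japanese Unicode ranges. The sketch below is only an illustration: `japanese_ratio`, `looks_garbled`, and the 0.5 threshold are hypothetical names and values chosen here, not part of Confluence or Lucene, and the check assumes the source document is predominantly Japanese.

```python
def japanese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are kana or CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0

    def is_japanese(c: str) -> bool:
        cp = ord(c)
        return (
            0x3040 <= cp <= 0x309F      # Hiragana
            or 0x30A0 <= cp <= 0x30FF   # Katakana (includes the long-vowel mark)
            or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        )

    return sum(is_japanese(c) for c in chars) / len(chars)


def looks_garbled(text: str, min_ratio: float = 0.5) -> bool:
    """Heuristic (assumed threshold): flag text that contains the Unicode
    replacement character or has too few Japanese characters."""
    return "\ufffd" in text or japanese_ratio(text) < min_ratio


if __name__ == "__main__":
    # The attachment's file name stem, used here as known-good Japanese text.
    print(looks_garbled("グリーンシップ募集要項"))
    # A typical mojibake-style string, standing in for corrupted extractor output.
    print(looks_garbled("ƒOƒŠ[ƒ“ƒVƒbƒv"))
```

      Running the same check on the text returned for the attachment in search results would show whether the index holds real Japanese text or corrupted characters.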

        1. グリーンシップ募集要項.pdf
          26 kB
          Neeraj Jhanji
        2. PDF_Search.png
          83 kB
          Neeraj Jhanji

            shaffenden Steve Haffenden (Inactive)
            jhanji@imahima.com Neeraj Jhanji
            Votes: 6
            Watchers: 5
