PDF Extractor unable to index Chinese and Japanese characters


      Confluence's Lucene-based search cannot find Chinese characters (traditional or simplified) in PDF files.
      The same characters are indexed correctly in Word (.doc) files.

      It appears that the Confluence PDF Extractor fails to extract the Chinese characters (see picture). Latin letters can be searched without any problem.
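      One way to confirm that the characters are lost at extraction time, rather than at indexing or query time, is to compare the text known to be in the PDF with the extractor's output and check whether any CJK characters survived. The sketch below assumes you already have both strings (how the extractor output is obtained is not covered here); the helper names count_cjk_chars and extraction_lost_cjk are hypothetical, and the check relies only on Python's standard unicodedata character names.

```python
import unicodedata

# Hypothetical helper: Unicode name prefixes for the Chinese and Japanese
# scripts mentioned in this issue (Han ideographs, Hiragana, Katakana).
CJK_NAME_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
    "HIRAGANA",
    "KATAKANA",
)


def count_cjk_chars(text: str) -> int:
    """Count characters whose Unicode name marks them as Chinese or Japanese script."""
    # unicodedata.name() raises ValueError for unnamed code points unless a
    # default is supplied, so "" is passed to treat them as non-CJK.
    return sum(
        1 for ch in text if unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES)
    )


def extraction_lost_cjk(source_text: str, extracted_text: str) -> bool:
    """True if the source contained CJK characters but the extracted text has none."""
    return count_cjk_chars(source_text) > 0 and count_cjk_chars(extracted_text) == 0


if __name__ == "__main__":
    # Simulated extractor output that kept the Latin text but dropped the Chinese.
    print(extraction_lost_cjk("中文 test", " test"))
```

      A result of True for a PDF while the equivalent .doc text passes would point at the PDF extraction step rather than at Lucene's tokenization.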

        1. characters_encoding_test.PNG (46 kB, Roy Hartono [Atlassian])
        2. chinesechars_pdf_fails.PNG (46 kB, Roy Hartono [Atlassian])
        3. search_chinese.PNG (40 kB, Roy Hartono [Atlassian])
        4. test.doc (62 kB, Roy Hartono [Atlassian])
        5. test.pdf (33 kB, Roy Hartono [Atlassian])

            Assignee:
            Katrina Walser (Inactive)
            Reporter:
            Roy Hartono [Atlassian]
            Votes:
            4
            Watchers:
            5
