Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 3.4-m2
Affects Version/s: 2.7, 2.7.1
Component/s: Search - Core
Labels:

Confluence's Lucene-based search cannot find Chinese characters (either traditional or simplified) in PDF files.
The same characters are indexed correctly in a Word DOC file.

It appears that the Confluence PDF extractor fails to extract the Chinese characters (see picture). Alphabetic text can be searched without any problem.
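
One way to confirm the symptom outside Confluence is to compare the text an extractor returns against the text known to be in the PDF. The following is only a minimal sketch under stated assumptions: the extracted text is already available as a string from whichever PDF extractor is in use, the function names `contains_han` and `missing_han` are hypothetical and not part of Confluence or Lucene, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is checked.

```python
# Assumption: only the basic CJK Unified Ideographs block is considered.
HAN_START, HAN_END = 0x4E00, 0x9FFF


def is_han(ch: str) -> bool:
    """Return True if ch falls in the basic CJK Unified Ideographs block."""
    return HAN_START <= ord(ch) <= HAN_END


def contains_han(text: str) -> bool:
    """Return True if the text has at least one Han character."""
    return any(is_han(ch) for ch in text)


def missing_han(expected: str, extracted: str) -> list:
    """List Han characters in `expected` that do not appear in `extracted`."""
    return [ch for ch in expected if is_han(ch) and ch not in extracted]
```

If `missing_han` returns a non-empty list for text that is visibly present in the PDF, the characters were lost during extraction rather than at query time, which matches the behaviour reported here.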

characters_encoding_test.PNG, 05/Mar/2008 5:14 AM, 46 kB, Roy Hartono [Atlassian]
chinesechars_pdf_fails.PNG, 05/Mar/2008 4:09 AM, 46 kB, Roy Hartono [Atlassian]
search_chinese.PNG, 05/Mar/2008 5:14 AM, 40 kB, Roy Hartono [Atlassian]
test.doc, 05/Mar/2008 4:09 AM, 62 kB, Roy Hartono [Atlassian]
test.pdf, 05/Mar/2008 4:09 AM, 33 kB, Roy Hartono [Atlassian]

duplicates: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)
is caused by: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)
is incorporated by: CONFSERVER-16525 Errors indexing PDF documents (Closed)
is related to: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)

Assignee: Katrina Walser (Inactive)
Reporter: Roy Hartono [Atlassian]
Votes: 4
Watchers: 5

Created: 05/Mar/2008 4:09 AM
Updated: 11/Oct/2018 8:58 AM
Resolved: 12/Oct/2010 1:01 AM
