Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 3.4-m2
Affects Version/s: 2.7, 2.7.1
Component/s: Search - Core
Labels:

Confluence's Lucene-based search cannot find Chinese characters (both Traditional and Simplified) in PDF files.
The same characters are indexed correctly from a Word DOC file.

It appears that the Confluence PDF extractor fails to extract the Chinese characters (see the attached screenshots). Text in Latin letters can be searched without any problem.

characters_encoding_test.PNG, 46 kB, 05/Mar/2008 5:14 AM
chinesechars_pdf_fails.PNG, 46 kB, 05/Mar/2008 4:09 AM
search_chinese.PNG, 40 kB, 05/Mar/2008 5:14 AM
test.doc, 62 kB, 05/Mar/2008 4:09 AM
test.pdf, 33 kB, 05/Mar/2008 4:09 AM

duplicates: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)
is caused by: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)
is incorporated by: CONFSERVER-16525 Errors indexing PDF documents (Closed)
is related to: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)

Assignee: Katrina Walser (Inactive)
Reporter: Roy Hartono [Atlassian]
Votes: 4
Watchers: 5

Created: 05/Mar/2008 4:09 AM
Updated: 11/Oct/2018 8:58 AM
Resolved: 12/Oct/2010 1:01 AM
