Details

Type: Bug
Resolution: Fixed
Priority: Low
Fix Version/s: 3.4-m2
Affects Version/s: 2.7, 2.7.1
Component/s: Search - Core
Labels:

Description

Confluence's Lucene-based search cannot find Chinese characters (both Traditional and Simplified) in PDF files.
The same characters are indexed correctly when they appear in a Word DOC file.

It appears that the Confluence PDF extractor fails to extract the Chinese characters (see the attached screenshots). Alphabetic text in the same PDF can be searched without any problem.
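
A minimal diagnostic sketch of the symptom, assuming you already have the text that some extractor produced from the PDF. It is not Confluence's extractor code; the names `contains_han` and `extraction_looks_lossy` are hypothetical helpers introduced here for illustration. It only checks whether a Chinese phrase known to be in the document survived extraction.

```python
# Code point ranges covering the most common Han (Chinese) ideographs.
# This list is an illustrative assumption and is not exhaustive.
HAN_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)


def contains_han(text: str) -> bool:
    """Return True if any character in text falls in a Han range."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in HAN_RANGES)


def extraction_looks_lossy(expected_phrase: str, extracted_text: str) -> bool:
    """Return True if a Chinese phrase known to be in the source document
    is missing from the extracted text (hypothetical check)."""
    return contains_han(expected_phrase) and expected_phrase not in extracted_text
```

Running the same check on the text extracted from the DOC and the PDF versions of a document would show whether the characters are lost at extraction time rather than at search time.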

Attachments

- characters_encoding_test.PNG (46 kB, 05/Mar/2008 5:14 AM)
- chinesechars_pdf_fails.PNG (46 kB, 05/Mar/2008 4:09 AM)
- search_chinese.PNG (40 kB, 05/Mar/2008 5:14 AM)
- test.doc (62 kB, 05/Mar/2008 4:09 AM)
- test.pdf (33 kB, 05/Mar/2008 4:09 AM)

Issue Links

- duplicates: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)
- is caused by: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)
- is incorporated by: CONFSERVER-16525 Errors indexing PDF documents (Closed)
- is related to: CONFSERVER-4747 Not all Chinese PDFs are indexing correctly (Closed)

People

Assignee: Katrina Walser (Inactive)
Reporter: Roy Hartono [Atlassian]
Votes: 4
Watchers: 5

Dates

Created: 05/Mar/2008 4:09 AM
Updated: 11/Oct/2018 8:58 AM
Resolved: 12/Oct/2010 1:01 AM