[CONFSERVER-9258] Incorrect search results for single and double-byte Japanese strings

Type: Suggestion
Resolution: Fixed
Fix Version/s: 2.6.2
Component/s: None
Labels:
- affects-server
- internationalisation
Environment:
All environments with internationalization settings configured correctly and CJK indexing enabled

Hello,

We noticed incorrect behavior in how Confluence searches for single-byte and double-byte Japanese strings. A search for Roman letters returns both single-byte and double-byte matches, as it should. Japanese characters need the same behavior: for example, a search for single-byte katakana or numeric characters should return both single-byte and double-byte occurrences, but currently only double-byte matches are retrieved. Please see the attached Excel sheets, which summarize the current and the required behavior for single-byte and double-byte Japanese strings. Could you please investigate this and incorporate the improvement in a future release?

Thanks,

Neeraj
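
For illustration only (not part of the original report): a minimal Java sketch of why exact term matching misses these cases. It assumes the "single-byte" and "double-byte" forms in the report correspond to Unicode half-width and full-width characters, which are distinct code points; the class name `WidthMismatchDemo` and the sample characters are hypothetical choices for the example.

```java
public class WidthMismatchDemo {
    // Formats every code point of a string as a "U+XXXX" label, space separated.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String halfWidthKa = "\uFF76";  // HALFWIDTH KATAKANA LETTER KA
        String fullWidthKa = "\u30AB";  // KATAKANA LETTER KA
        String asciiOne = "1";          // DIGIT ONE
        String fullWidthOne = "\uFF11"; // FULLWIDTH DIGIT ONE

        // The two forms of each character look alike to a reader but do not
        // compare equal, so a search that compares raw terms treats them as
        // different words.
        System.out.println(codePoints(halfWidthKa) + " vs " + codePoints(fullWidthKa)
                + " equal=" + halfWidthKa.equals(fullWidthKa));
        System.out.println(codePoints(asciiOne) + " vs " + codePoints(fullWidthOne)
                + " equal=" + asciiOne.equals(fullWidthOne));
    }
}
```

Any search that should treat the two forms as the same term therefore has to fold them to a common form at index time and at query time.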

Attachments:
- Character+requirement+(single).xls (17 kB, 15/Oct/2007 7:44 AM)
- Character requirement (double).xls (17 kB, 22/Aug/2007 10:40 AM)
- Character requirement (single).xls (17 kB, 15/Oct/2007 7:44 AM)
- Character requirement (single).xls (17 kB, 22/Aug/2007 10:40 AM)
- Screenshot-Site Search - Confluence - Mozilla Firefox.png (147 kB, 22/Oct/2007 4:17 AM)

is incorporated by

CONFSERVER-6379 Search for Japanese strings should not include partial matches

Closed

is related to

CONFSERVER-9833 Search not working for Powerpoint or PDF files containing Japanese text

Closed

CONFSERVER-9834 Secondary search bar does not work correctly for non Latin characters

Closed

relates to

CONFSERVER-9514 Attachments with Japanese filenames not matched in Confluence search results

Closed

CONFSERVER-10530 Search does not show result right when searching with 3 Korean word

Closed

CONFSERVER-6934 Highlighting of matched terms in search results does not work in Japanese

Closed

Sean Osawa (Inactive) added a comment - 01/Feb/2010 5:43 PM

systeminfo.png was removed at the customer's request.

Sean Osawa (Inactive) added a comment - 27/Jan/2010 3:47 PM

Removed a test result file, which had been attached to this issue, as requested by the customer.

Paul Curren added a comment - 08/Nov/2007 11:57 PM

Agnes, could you please review these changes prior to 2.6.2?

Thanks.

Neeraj Jhanji added a comment - 29/Oct/2007 6:10 AM

Yes, it would be great if you could provide a patch for 2.6.0 that addresses CONF-9258, CONF-9833 and CONF-9834, which are all major issues from a Japanese QA perspective.

Andrew Lynch (Inactive) added a comment - 29/Oct/2007 12:55 AM

Hi Neeraj,

The changes were not too numerous, so I should be able to provide a patch for 2.6.0 if essential.

The search tab has not yet been fixed. I have created another issue (CONF-9834), which should be watched for updates.

Neeraj Jhanji added a comment - 26/Oct/2007 2:43 AM - edited 27/Oct/2007 4:43 AM

We are focused on releasing Confluence 2.6 to Japanese customers. This is a major upgrade from the previous JP release 2.4.3 since it fixes a major issue surrounding PDF export. If possible, we'd like to include the search fix in this release as well since the next Japanese release will be further out (possibly next year). Is it possible to get a patch for 2.6?

Also, to confirm: will search for half-width katakana and numeric characters work properly from the top search bar as well as from the search tab?

Andrew Lynch (Inactive) added a comment - 26/Oct/2007 1:44 AM

Hi Neeraj,

My apologies, I provided the wrong issue number. It should be LUCENE-1032.

The custom Japanese analyzer will be available as an option in the Content Indexing tab from 2.6.2 onwards.

Regards,

Andrew Lynch

Neeraj Jhanji added a comment - 26/Oct/2007 1:41 AM

Hi Andrew,

1. Where can I get the Custom Japanese Analyzer and what are the installation instructions for it?

2. Regarding the Lucene support issue you mention above, I did not see any mention of the problems with half-width Japanese characters.

Please clarify.

regards,

Neeraj

Andrew Lynch (Inactive) added a comment - 26/Oct/2007 1:08 AM

Fixed by use of custom Analyzer.

Andrew Lynch (Inactive) added a comment - 26/Oct/2007 1:07 AM - edited 26/Oct/2007 1:45 AM

A quick update, Neeraj:

Lucene's CJKAnalyzer is definitely not indexing half-width characters correctly. I have raised an issue (http://issues.apache.org/jira/browse/LUCENE-1032) to address this.
We considered creating a patch ourselves, but the simplest implementation would require Java 6's Normalizer class. To work around this, I have created our own analyzer, the Custom Japanese Analyzer. Unfortunately, it only works on Sun JDKs, so it will not be incorporated into Lucene and may not work for all customers.

Customers who are experiencing problems such as the ones you outlined should use this Analyzer in place of CJKAnalyzer until the issue with Lucene is resolved, assuming they have a Sun JDK.
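
For illustration only (this is not the Custom Japanese Analyzer itself, whose code is not shown in this issue): a minimal sketch of the normalization idea, assuming the Normalizer mentioned above is the public `java.text.Normalizer` class and that NFKC is an acceptable folding form. The class `WidthFoldingSketch` and the helpers `foldWidth` and `bigrams` are hypothetical names; the bigram step is only a rough stand-in for a tokenizer and does not use any Lucene API.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class WidthFoldingSketch {
    // Folds Unicode compatibility variants to a common form with NFKC:
    // half-width katakana become standard katakana, and full-width digits
    // and Latin letters become their ASCII counterparts.
    static String foldWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    // Hypothetical tokenizer: folds the text, then emits overlapping
    // two-char terms. It works on Java chars, so it ignores surrogate pairs.
    static List<String> bigrams(String text) {
        String folded = foldWidth(text);
        List<String> terms = new ArrayList<>();
        if (folded.length() == 1) {
            terms.add(folded);
            return terms;
        }
        for (int i = 0; i + 1 < folded.length(); i++) {
            terms.add(folded.substring(i, i + 2));
        }
        return terms;
    }

    public static void main(String[] args) {
        String halfWidth = "\uFF83\uFF7D\uFF84"; // "tesuto" in half-width katakana
        String fullWidth = "\u30C6\u30B9\u30C8"; // "tesuto" in standard katakana

        // Applying the same folding to indexed text and to the query makes
        // both widths produce the same terms.
        System.out.println(bigrams(halfWidth).equals(bigrams(fullWidth)));
        System.out.println(foldWidth("\uFF11\uFF12\uFF13")); // full-width digits
    }
}
```

The key design point is that the folding has to run on both sides: on content as it is indexed and on the query string before it is matched. Note that NFKC also folds other compatibility characters (ligatures, for example), which may or may not be wanted in a given index.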

Assignee: Agnes Ro
Reporter: Neeraj Jhanji
Votes: 0
Watchers: 3
Created: 22/Aug/2007 10:40 AM
Updated: 19/Sep/2019 5:23 AM
Resolved: 12/Nov/2007 3:24 AM
