
Key: JRA-13866
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Alexey Efimov
Votes: 1
Watchers: 4

Webwork FastByteArrayOutputStream encoding error

Created: 01/Nov/07 05:39 AM   Updated: 05/Aug/08 08:25 PM
Component/s: Web interface
Affects Version/s: 3.7.2
Fix Version/s: 3.13

Time Tracking:
Original Estimate: Not Specified
Remaining Estimate: 0h
Time Spent: 13h

File Attachments: 1. Zip Archive test.zip (3.25 MB)
2. Java Archive File webwork-16May06-jiratld.jar (367 kB)
3. Java Archive File webwork-30Apr07-jiratld.jar (363 kB)
4. Zip Archive webwork_patch_src.zip (2 kB)

Environment: standalone, tomcat, ubuntu

Participants: Alexey Efimov, Anton Mazkovoi [Atlassian], Brad Baker [Atlassian - JIRA BugMaster] and StarWind
Since last comment: 1 year, 23 weeks ago
Resolution Date: 22/Jan/08 10:10 PM
To be done by: Single developer
Labels: webwork patch


Description
User Symptoms

When using a language that requires non-ASCII character support (e.g. Cyrillic), on some screens the non-ASCII characters are replaced with question marks. The problem does not occur in all screens. The problem appears in different locations of the Issue Browser for different issues.

If you change one of the characters in the affected string from a two-byte character to a single-byte character, the problem may disappear for that screen.

The problem affects non-ASCII characters encoded as UTF-8, for example when using the Russian locale.
FastByteArrayOutputStream calls new String(bytes, 0, length, "UTF-8") separately for each chunk of buffered bytes, and this is an error. A UTF-8 character can occupy more than one byte, so this conversion causes trouble at the ends of the byte arrays, where a character may be cut in two. Please see the attached patch for this class.
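
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the WebWork code: the class name SplitDecodeDemo and the method decodePerChunk are hypothetical, and a chunk size of 3 bytes stands in for the real buffer size so that the boundary falls inside a two-byte character.

```java
import java.nio.charset.StandardCharsets;

public class SplitDecodeDemo {
    // Hypothetical helper: decodes every chunk on its own, the way the
    // description says each buffered byte array is converted to a String.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three Cyrillic 'а' characters (U+0430), two bytes each in UTF-8.
        String text = "\u0430\u0430\u0430";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // Decoding the whole array at once round-trips correctly.
        String whole = new String(bytes, StandardCharsets.UTF_8);
        // A 3-byte chunk size cuts the second character in half.
        String split = decodePerChunk(bytes, 3);

        System.out.println("whole round-trips: " + whole.equals(text));
        System.out.println("split round-trips: " + split.equals(text));
        System.out.println("split contains U+FFFD: " + (split.indexOf('\uFFFD') >= 0));
    }
}
```

With an even chunk size the same input decodes cleanly, which matches the symptom that changing one two-byte character to a single-byte character can make the problem disappear on a given screen.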



Comments
Anton Mazkovoi [Atlassian] added a comment - 02/Nov/07 12:24 PM
Hi Alexey,

Thank you for the patch!

As far as I can see, the problem is that we get an array of bytes, and if the last byte falls somewhere in the middle of a multi-byte character, the conversion does not work. Is this so? Does this happen even if JIRA is run with UTF-8 encoding (in Administration -> General Configuration -> Character Encoding)?

I am sorry, but I am traveling and I do not have webwork source readily available. May I ask if you have changed:

new String(bytes, 0, length, "UTF-8")

to:

new String(bytes, 0, length)

Does this mean that the code is dependent on the default encoding of the operating system? And will only work if JIRA is also run with the same encoding (in Administration -> General Configuration -> Character Encoding) as the operating system?

Please accept my apologies for not being able to compare your fix with the current webwork source. I am just really interested in this bug and would really appreciate if you could answer the above questions.

Cheers,
Anton


Alexey Efimov added a comment - 02/Nov/07 12:37 PM
Anton,

No. The problem is not a character in the middle of the array. It is actually a problem of splitting a big byte array into buffers (you will see the LinkedList of buffers in this class). The big buffer can be converted via new String(bigbuffer, encoding) with no problems, but once it is split into several small buffers (8192 bytes each), you cannot simply call new String(smallbuffer, encoding) on each small buffer. That is an error.

In other words:

byte[] bigbuffer = bigText.getBytes("UTF-8");
List<byte[]> buffersPer8192Bytes = splitBuffer(bigbuffer);

So, the expression:

out.write(new String(bigbuffer, "UTF-8"));

is not the same as:

out.write(new String(buffersPer8192Bytes.get(0), "UTF-8"));
out.write(new String(buffersPer8192Bytes.get(1), "UTF-8"));
...
out.write(new String(buffersPer8192Bytes.get(buffersPer8192Bytes.size() - 1), "UTF-8"));

About your question: I changed only the lines that contained code like:

new String(bytes, 0, length, encoding)

Thanks!
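
One way to avoid per-buffer conversion is to let a single decoder read across all buffers, so that a character split between two of them is reassembled before decoding. The sketch below illustrates that idea only; it is not the attached patch, whose contents are not shown in this thread. The class name ChunkedUtf8Writer and the method writeChunks are hypothetical, and the chunk list stands in for buffersPer8192Bytes from the comment above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ChunkedUtf8Writer {
    // Hypothetical helper: presents all chunks as one continuous byte stream
    // and decodes it with a single InputStreamReader, which keeps incomplete
    // trailing bytes until the rest of the character arrives.
    static void writeChunks(List<byte[]> chunks, Writer out) throws IOException {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] chunk : chunks) {
            streams.add(new ByteArrayInputStream(chunk));
        }
        InputStream joined = new SequenceInputStream(Collections.enumeration(streams));
        try (Reader reader = new InputStreamReader(joined, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Three two-byte characters split 3 + 3, so the cut lands mid-character.
        byte[] bytes = "\u0430\u0430\u0430".getBytes(StandardCharsets.UTF_8);
        List<byte[]> chunks = Arrays.asList(
                Arrays.copyOfRange(bytes, 0, 3),
                Arrays.copyOfRange(bytes, 3, 6));
        StringWriter out = new StringWriter();
        writeChunks(chunks, out);
        System.out.println("round-trips: " + out.toString().equals("\u0430\u0430\u0430"));
    }
}
```

This trades a little stream overhead for not having to track leftover bytes by hand. Carrying the incomplete bytes over manually with a java.nio.charset.CharsetDecoder would be another option.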


Anton Mazkovoi [Atlassian] added a comment - 02/Nov/07 05:11 PM
Hi Alexey,

Thanks for the feedback! I think we mean exactly the same thing.

When you changed:

new String(bytes, 0, length, encoding)

what did you change this to?

Cheers,
Anton


StarWind added a comment - 04/Dec/07 11:18 AM
Please fix it in the next release.

We spent 2 working days tracking down this bug.


Alexey Efimov added a comment - 04/Dec/07 12:28 PM
Patch for version 3.11.

Brad Baker [Atlassian - JIRA BugMaster] added a comment - 09/Jan/08 12:25 AM
Thanks for the work you have put into this issue. I can verify that the issue exists.

It will affect people using double-byte characters AND where certain WebWork tags output more than 8K.

Can you please give us some "places" where you see this invalid encoding happening? We want to ensure we have covered all the areas where this can happen.

We think it mainly happens when people input > 8K of data into, say, a text field, but we want to make sure we haven't missed any other areas that may be affected.
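
Whether a given piece of output is at risk depends on where the buffer boundaries fall in its UTF-8 bytes. As a rough aid for finding affected places, the sketch below checks whether any boundary lands inside a multi-byte character. The class name BoundaryCheck and both methods are hypothetical, and the 8192-byte chunk size is taken from the discussion above rather than from the WebWork source.

```java
import java.nio.charset.StandardCharsets;

public class BoundaryCheck {
    // True when the byte at `offset` is a UTF-8 continuation byte (10xxxxxx),
    // meaning a cut made just before `offset` would split a character.
    static boolean splitsCharacter(byte[] utf8, int offset) {
        return offset > 0 && offset < utf8.length && (utf8[offset] & 0xC0) == 0x80;
    }

    // Checks every chunk boundary of the UTF-8 encoding of `text`.
    static boolean anyBoundarySplits(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        for (int off = chunkSize; off < utf8.length; off += chunkSize) {
            if (splitsCharacter(utf8, off)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder cyrillic = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            cyrillic.append('\u0430'); // two bytes each, 10000 bytes in total
        }
        // Only two-byte characters: the 8192 boundary is aligned.
        System.out.println(anyBoundarySplits(cyrillic.toString(), 8192));
        // One leading ASCII byte shifts every character by one byte.
        System.out.println(anyBoundarySplits("x" + cyrillic, 8192));
    }
}
```

The second case shows why the symptom moves around: a single one-byte character earlier in the output is enough to shift the boundary into or out of a multi-byte character.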


Brad Baker [Atlassian - JIRA BugMaster] added a comment - 09/Jan/08 09:57 PM

StarWind added a comment - 10/Jan/08 04:53 AM
Attached is an example application (test.zip) that is affected by this bug.
It is a Struts 2 application, but it also uses the FastByteArrayOutputStream class.
There are 2 JSPs: test.jsp and test.inc, which is included into test.jsp via <s:include value="test.inc">.
Struts's include tag uses FastByteArrayOutputStream to copy the content of test.inc into the response stream.
When you request a page such as http://localhost:8080/test/Test.do, you may notice garbage symbols at the boundaries of the 8 KB arrays:
ааааааааааааа��ааааааааааааааа

P.S. Alexey's patch really fixes this bug.


StarWind added a comment - 10/Jan/08 05:21 AM

Brad Baker [Atlassian - JIRA BugMaster] added a comment - 22/Jan/08 10:10 PM
WebWork has been updated to include the patched code, and JIRA now depends on the new code.