[CLOV-1835] Unicode 0x200B (zero-width whitespace) causes instrumentation failure

Michael Andreacchio made changes - 13/Aug/2019 6:52 AM

Resolution		New: Won't Fix [ 2 ]
Status	Original: Open [ 1 ]	New: Closed [ 6 ]

jonah (Inactive) made changes - 26/Aug/2016 4:11 AM

Symptom Severity

New: Minor [ 14432 ]

Owen made changes - 07/Jul/2016 5:15 AM

Workflow

Original: New Clover Workflow [ 983440 ]

New: New Clover Workflow - Restricted [ 1474581 ]

Grzegorz Lewandowski made changes - 10/Nov/2015 1:52 PM

Remote Link

New: This issue links to "Clover › All JDK Tests › CLOV-1835-zero-width-space-instrumentation-escape (devtools-bamboo)" [ 138842 ]

Grzegorz Lewandowski made changes - 10/Nov/2015 1:52 PM

Remote Link

New: This issue links to "Clover › All Ant Groovy Tests › CLOV-1835-zero-width-space-instrumentation-escape (devtools-bamboo)" [ 138982 ]

Grzegorz Lewandowski made changes - 10/Nov/2015 1:52 PM

Remote Link

New: This issue links to "Clover › Default › CLOV-1835-zero-width-space-instrumentation-escape (devtools-bamboo)" [ 138841 ]

Marek Parfianowicz added a comment - 05/Nov/2015 9:01 AM

How javac (JDK7) treats control characters:

public class A {
    // Unicode characters are inserted after the first word of each method name:

    //void no BreakSpace00A0() {} - not allowed, compilation error
    void word⁠Joiner2060() {} // allowed
    void zeroWidthNoBreakSpaceFEFF() {} // allowed
    void zeroWidthSpace200B() {} // allowed
    void zero‌WidthNonJoiner200C() {} // allowed
    void zero‍WidthJointer200D() {} // allowed
    void softHyphen00AD() {} // allowed
    // void hyphenation‧Point2027() {} - not allowed, compilation error

    // line 2028 and paragraph 2029 separators
    // int a=1; not allowed
    // int b=2; not allowed
}
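
The results above can be cross-checked against the JDK's own character classification. The Java Language Specification defines identifier characters in terms of Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, and the latter accepts any character for which Character.isIdentifierIgnorable returns true. The sketch below is only an illustration: the class name IdentifierCharProbe and its classify helper are made up for this example, it says nothing about how Clover's parser works, and the answers depend on the Unicode version of the JDK that runs it.

```java
// Illustrative probe (hypothetical class name): asks the running JDK how it
// classifies the code points tested in the comment above.
final class IdentifierCharProbe {

    /** Returns a short, human-readable classification for one code point. */
    static String classify(int codePoint) {
        // Format characters (and some ISO control characters) are ignorable in identifiers.
        if (Character.isIdentifierIgnorable(codePoint)) {
            return "ignorable inside identifiers";
        }
        if (Character.isJavaIdentifierPart(codePoint)) {
            return "regular identifier part";
        }
        // isWhitespace excludes non-breaking spaces; isSpaceChar covers them.
        if (Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint)) {
            return "Unicode space or separator, not an identifier part";
        }
        return "not an identifier part";
    }

    public static void main(String[] args) {
        int[] codePoints = {
            0x00A0, 0x2060, 0xFEFF, 0x200B, 0x200C, 0x200D, 0x00AD, 0x2027, 0x2028, 0x2029
        };
        for (int cp : codePoints) {
            System.out.printf("U+%04X: %s%n", cp, classify(cp));
        }
    }
}
```

Running the probe on both a JDK 6 and a JDK 7+ installation would show whether the classification of 0x200B differs between them, which is the behaviour change this issue is about.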

Marek Parfianowicz made changes - 05/Nov/2015 8:21 AM

Description

Original: JDK 6 implements Unicode 4.0. In this version of Unicode, the zero-width space character (0x200B) is treated as whitespace.

JDK 7 implements Unicode 6.0. In this version of Unicode, the zero-width space has been reclassified into the 'format character' group (other characters in this group include, for example, the left-to-right and right-to-left text direction marks).

Thus, the Java compiler in JDK 6 allows 0x200B to be used as a normal whitespace character, e.g. to separate symbols.

Since JDK 7, the Java compiler silently ignores 0x200B, which means that it can no longer be used to separate symbols. However, you can put this character virtually anywhere, e.g.:

{code:java}
void this<200B>Is<200B>MyMethod();
{code}

Clover fails when parsing the 200B character:

{noformat}
Xyz.java:287:90:unexpected char: 0x200B
at com.atlassian.clover.instr.java.Instrumenter.instrument(Instrumenter.java:166)
at com.atlassian.clover.CloverInstr.execute(CloverInstr.java:76)
at com.atlassian.clover.CloverInstr.mainImpl(CloverInstr.java:54)
at ...
{noformat}

*Planned fix:*

* Ignore 200B characters in Java 7+. Treat the 200B character as a space in Java 6.
* Question: Should this be based on the source level setting or on the detected JDK?
* Question: Which other control characters need to be ignored by Clover?
* Question: Which other whitespace characters (other than space, \t, \n, \r) should be recognized by the Clover parser?

*Workaround:*

Remove all 200B characters from the source code.

New: JDK 6 implements Unicode 4.0. In this version of Unicode, the zero-width space character (0x200B) is treated as whitespace.

JDK 7 implements Unicode 6.0. In this version of Unicode, the zero-width space has been reclassified into the 'format character' group (other characters in this group include, for example, the left-to-right and right-to-left text direction marks).

Thus, the Java compiler in JDK 6 allows 0x200B to be used as a normal whitespace character, e.g. to separate symbols.

Since JDK 7, the Java compiler silently ignores 0x200B, which means that it can no longer be used to separate symbols. However, you can put this character virtually anywhere, e.g.:

{code:java}
void this<200B>Is<200B>MyMethod();
{code}

Clover fails when parsing the 200B character:

{noformat}
Xyz.java:287:90:unexpected char: 0x200B
at com.atlassian.clover.instr.java.Instrumenter.instrument(Instrumenter.java:166)
at com.atlassian.clover.CloverInstr.execute(CloverInstr.java:76)
at com.atlassian.clover.CloverInstr.mainImpl(CloverInstr.java:54)
at ...
{noformat}

*Planned fix:*

* Ignore 200B characters in Java 7+. Treat the 200B character as a space in Java 6.
* Question: Should this be based on the source level setting or on the detected JDK?
* Question: Which other control characters need to be ignored by Clover?
* Question: Which other whitespace characters (other than space, \t, \n, \r) should be recognized by the Clover parser?

*Workaround:*

Remove all occurrences of the 200B character from the source code.
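
The planned fix and the workaround can be sketched as a small pre-processing step applied to the source text before it is parsed. This is a hypothetical illustration only, not Clover's implementation: the class ZeroWidthSpaceFilter, its method names and the integer sourceLevel parameter are assumptions, and keying the behaviour on the source level (rather than the detected JDK) is just one of the two options raised in the open question above. The locate helper reports where the character occurs, assuming that only \n ends a line and that each char counts as one column, which may not match the columns Clover prints.

```java
// Hypothetical sketch of the planned fix; not taken from the Clover code base.
final class ZeroWidthSpaceFilter {

    static final char ZWSP = '\u200B';

    /**
     * Drops 0x200B for source level 7 and above (the compiler ignores it there),
     * and replaces it with a plain space for older levels (it is whitespace there).
     * sourceLevel is the major Java version, e.g. 6, 7 or 8 (an assumption of this sketch).
     */
    static String applyPlannedFix(String source, int sourceLevel) {
        if (sourceLevel >= 7) {
            return source.replace(String.valueOf(ZWSP), "");
        }
        return source.replace(ZWSP, ' ');
    }

    /** Lists every 0x200B as "line:column" (both 1-based), one entry per output line. */
    static String locate(String source) {
        StringBuilder report = new StringBuilder();
        int line = 1;
        int column = 1;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == ZWSP) {
                report.append(line).append(':').append(column).append('\n');
            }
            if (c == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String source = "void this\u200BIs\u200BMyMethod();\n";
        System.out.print(locate(source));
        System.out.print(applyPlannedFix(source, 7));
        System.out.print(applyPlannedFix(source, 6));
    }
}
```

The workaround corresponds to the Java 7+ branch: stripping the character from the files themselves. For code that is still compiled with JDK 6, where 0x200B may be separating two tokens, replacing it with a space is the safer choice.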

Marek Parfianowicz made changes - 05/Nov/2015 8:20 AM

Description

Original: JDK 6 implements Unicode 4.0. In this version of Unicode, the zero-width space character (0x200B) is treated as whitespace.

JDK 7 implements Unicode 6.0. In this version of Unicode, the zero-width space has been reclassified into the 'format character' group (other characters in this group include, for example, the left-to-right and right-to-left text direction marks).

Thus, the Java compiler in JDK 6 allows 0x200B to be used as a normal whitespace character, e.g. to separate symbols.

Since JDK 7, the Java compiler silently ignores 0x200B, which means that it can no longer be used to separate symbols. However, you can put this character virtually anywhere, e.g.:

{code:java}
void this<200B>Is<200B>MyMethod();
{code}

Clover fails when parsing the 200B character:

{noformat}
Xyz.java:287:90:unexpected char: 0x200B
at com.atlassian.clover.instr.java.Instrumenter.instrument(Instrumenter.java:166)
at com.atlassian.clover.CloverInstr.execute(CloverInstr.java:76)
at com.atlassian.clover.CloverInstr.mainImpl(CloverInstr.java:54)
at ...
{noformat}

*Possible fix:*

Ignore 200B characters in Java 7+. Treat the 200B character as a space in Java 6.

Q: Should this be based on the source level setting or on the detected JDK?

*Workaround:*

Remove all 200B characters from the source code.

New: JDK 6 implements Unicode 4.0. In this version of Unicode, the zero-width space character (0x200B) is treated as whitespace.

JDK 7 implements Unicode 6.0. In this version of Unicode, the zero-width space has been reclassified into the 'format character' group (other characters in this group include, for example, the left-to-right and right-to-left text direction marks).

Thus, the Java compiler in JDK 6 allows 0x200B to be used as a normal whitespace character, e.g. to separate symbols.

Since JDK 7, the Java compiler silently ignores 0x200B, which means that it can no longer be used to separate symbols. However, you can put this character virtually anywhere, e.g.:

{code:java}
void this<200B>Is<200B>MyMethod();
{code}

Clover fails when parsing the 200B character:

{noformat}
Xyz.java:287:90:unexpected char: 0x200B
at com.atlassian.clover.instr.java.Instrumenter.instrument(Instrumenter.java:166)
at com.atlassian.clover.CloverInstr.execute(CloverInstr.java:76)
at com.atlassian.clover.CloverInstr.mainImpl(CloverInstr.java:54)
at ...
{noformat}

*Planned fix:*

* Ignore 200B characters in Java 7+. Treat the 200B character as a space in Java 6.
* Question: Should this be based on the source level setting or on the detected JDK?
* Question: Which other control characters need to be ignored by Clover?
* Question: Which other whitespace characters (other than space, \t, \n, \r) should be recognized by the Clover parser?

*Workaround:*

Remove all 200B characters from the source code.

Marek Parfianowicz made changes - 05/Nov/2015 8:17 AM

Rank

New: Ranked higher
