
Unicode 0x200B (zero-width whitespace) causes instrumentation failure

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Low
    • open-source
    • None
    • Instrumentation
    • None
    • Severity 3 - Minor

      JDK 6 implements Unicode 4.0, which classifies the zero-width space character (0x200B) as whitespace.

      JDK 7 implements Unicode 6.0, in which the zero-width space has been reclassified into the 'format character' group (other characters in this group include, for example, the left-to-right and right-to-left text direction markers).

      Thus, the Java compiler in JDK 6 allows 0x200B to be used as an ordinary whitespace character, e.g. to separate symbols.

      Since JDK 7, the Java compiler silently ignores 0x200B, so it can no longer be used to separate symbols. However, the character can be placed virtually anywhere, e.g.:

      void this<200B>Is<200B>MyMethod();
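The classification described above can be checked against the running JDK with the standard `java.lang.Character` API. This is a minimal sketch, assuming a JDK 7 or later runtime; the class name `ZeroWidthSpaceCategory` and its helper methods are made up for illustration and are not part of Clover.

```java
// Hypothetical helper (not part of Clover): reports how the running JDK
// classifies a code point, using only java.lang.Character.
class ZeroWidthSpaceCategory {
    static final int ZWSP = 0x200B; // zero-width space

    // True if the running JDK's Unicode tables put the code point in the
    // 'format character' (Cf) general category.
    static boolean isFormatCharacter(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    static String describe(int codePoint) {
        return String.format("U+%04X whitespace=%b format=%b identifierIgnorable=%b",
                codePoint,
                Character.isWhitespace(codePoint),
                isFormatCharacter(codePoint),
                Character.isIdentifierIgnorable(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(ZWSP));
    }
}
```

On JDK 7 and later this should report the character as a format character that is ignorable inside identifiers and is not whitespace, which matches the compiler behaviour described above.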
      

      Clover fails to parse the 200B character:

      Xyz.java:287:90:unexpected char: 0x200B
      at com.atlassian.clover.instr.java.Instrumenter.instrument(Instrumenter.java:166)
      at com.atlassian.clover.CloverInstr.execute(CloverInstr.java:76)
      at com.atlassian.clover.CloverInstr.mainImpl(CloverInstr.java:54)
      at ...
      

      Planned fix:

      • Ignore 200B characters in Java 7+. Treat the 200B character as a space in Java 6.
      • Question: Should this be based on the source level setting or on the detected JDK?
      • Question: Which other control characters need to be ignored by Clover?
      • Question: Which other whitespace characters (other than space, \t, \n, \r) should be recognized by the Clover parser?
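The first bullet of the planned fix could look roughly like the sketch below. It is not Clover code: the class `ZeroWidthSpaceFilter`, the method `normalize` and the integer `sourceLevel` parameter are all hypothetical, and the sketch simply assumes that the decision is driven by the source level, which is one of the questions left open above.

```java
// Hypothetical pre-processing step (not Clover's implementation): applies the
// planned rule to source text before it reaches the parser.
class ZeroWidthSpaceFilter {
    static final char ZWSP = (char) 0x200B; // zero-width space

    // sourceLevel is an assumed integer encoding of the Java source level,
    // e.g. 6 for Java 6 and 7 for Java 7.
    static String normalize(String source, int sourceLevel) {
        if (sourceLevel <= 6) {
            // Java 6: the character separates symbols, so treat it as a space.
            return source.replace(ZWSP, ' ');
        }
        // Java 7+: the compiler ignores the character, so drop it.
        return source.replace(String.valueOf(ZWSP), "");
    }

    public static void main(String[] args) {
        String code = "void this" + ZWSP + "Is" + ZWSP + "MyMethod();";
        System.out.println(normalize(code, 7)); // void thisIsMyMethod();
    }
}
```

Dropping characters shifts the column positions of everything after them on the same line, so a real fix would more likely skip the character inside the lexer than rewrite the text beforehand.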

      Workaround:

      Remove all 200B character occurrences from the source code.
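The workaround can be automated with a small stand-alone tool. The sketch below is only an illustration: the class name `ZeroWidthSpaceStripper` is made up, and it assumes the source files are encoded in UTF-8 and are passed as command-line arguments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical clean-up tool (not shipped with Clover): removes every
// zero-width space (0x200B) from the files given on the command line.
class ZeroWidthSpaceStripper {
    static final String ZWSP = String.valueOf((char) 0x200B);

    // Returns the text with all 0x200B characters removed.
    static String strip(String text) {
        return text.replace(ZWSP, "");
    }

    // Rewrites the file in place; returns true if anything was removed.
    // Assumption: the file is UTF-8 encoded.
    static boolean stripFile(Path file) throws IOException {
        String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String cleaned = strip(original);
        if (cleaned.equals(original)) {
            return false;
        }
        Files.write(file, cleaned.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            if (stripFile(file)) {
                System.out.println("cleaned " + file);
            }
        }
    }
}
```

For code compiled at the Java 6 level, where 0x200B may be separating two symbols, replacing the character with a space would be the safer variant.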

            [CLOV-1835] Unicode 0x200B (zero-width whitespace) causes instrumentation failure

            Michael Andreacchio made changes -
            Resolution New: Won't Fix [ 2 ]
            Status Original: Open [ 1 ] New: Closed [ 6 ]
            jonah (Inactive) made changes -
            Symptom Severity New: Minor [ 14432 ]
            Owen made changes -
            Workflow Original: New Clover Workflow [ 983440 ] New: New Clover Workflow - Restricted [ 1474581 ]
            Grzegorz Lewandowski made changes -
            Remote Link New: This issue links to "Clover › All JDK Tests › CLOV-1835-zero-width-space-instrumentation-escape (devtools-bamboo)" [ 138842 ]
            Grzegorz Lewandowski made changes -
            Remote Link New: This issue links to "Clover › All Ant Groovy Tests › CLOV-1835-zero-width-space-instrumentation-escape (devtools-bamboo)" [ 138982 ]
            Grzegorz Lewandowski made changes -
            Remote Link New: This issue links to "Clover › Default › CLOV-1835-zero-width-space-instrumentation-escape (devtools-bamboo)" [ 138841 ]

            How javac (JDK7) treats control characters:

            public class A {
                // a Unicode character is inserted after the first word of each method name:
            
                //void no BreakSpace00A0() {} - not allowed, compilation error
                void word⁠Joiner2060() {} // allowed
                void zeroWidthNoBreakSpaceFEFF() {} // allowed
                void zero​WidthSpace200B() {} // allowed
                void zero‌WidthNonJoiner200C() {} // allowed
                void zero‍WidthJoiner200D() {} // allowed
                void soft­Hyphen00AD() {} // allowed
                // void hyphenation‧Point2027() {} - not allowed, compilation error
            
                // line (2028) and paragraph (2029) separators:
                // int<2028>a=1; - not allowed
                // int<2029>b=2; - not allowed
            }
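The results above can be compared with what `Character.isJavaIdentifierPart` reports for the same code points. This is a rough cross-check, assuming a JDK 7 or later runtime; the class `IdentifierCharTable` is made up for illustration, and javac's lexer is not guaranteed to agree with this API in every case.

```java
// Hypothetical cross-check (not part of Clover): asks java.lang.Character
// whether each code point from the comment above may appear inside an identifier.
class IdentifierCharTable {
    static final int[] CODE_POINTS = {
        0x00A0, 0x2060, 0xFEFF, 0x200B, 0x200C,
        0x200D, 0x00AD, 0x2027, 0x2028, 0x2029
    };

    // isJavaIdentifierPart is also true for identifier-ignorable (format) characters.
    static boolean allowedInsideIdentifier(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        for (int codePoint : CODE_POINTS) {
            System.out.printf("U+%04X %s%n", codePoint,
                    allowedInsideIdentifier(codePoint) ? "allowed" : "not allowed");
        }
    }
}
```

On a JDK 7+ runtime the format characters (2060, FEFF, 200B, 200C, 200D, 00AD) should come out as allowed and the remaining ones (00A0, 2027, 2028, 2029) as not allowed, in line with the javac results listed above.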
            

            Marek Parfianowicz added the comment above.
            Marek Parfianowicz made changes -
            Description: the workaround was reworded from "Remove all 200B character from the source code." to "Remove all 200B character occurrences from the source code."; the rest of the description is unchanged.
            Marek Parfianowicz made changes -
            Description: the "Possible fix" section was renamed to "Planned fix" and turned into a bullet list, and two questions were added: which other control characters need to be ignored by Clover, and which other whitespace characters (other than space, \t, \n, \r) should be recognized by the Clover parser. The rest of the description is unchanged.
            Marek Parfianowicz made changes -
            Rank New: Ranked higher

              Assignee: Unassigned
              Reporter: Marek Parfianowicz
              Affected customers: 0
              Watchers: 2