Header anchors do not work in Firefox with non-ASCII characters

    • Type: Bug
    • Resolution: Fixed
    • Priority: High
    • 2.5.1
    • 2.4.4
    • None
    • standalone

      We noticed that the way header anchors are generated has changed, and our anchor links are now broken.
      In v2.2.9 the generated markup was:
      <h3><a name="H3H4-UTF8notASCIIstring"></a>UTF8 not ASCII string</h3>
      This works properly in every browser.

      In v2.4.4 it is:
      <h3 id="H3H4-UTF8notASCIIstring">UTF8 not ASCII string</h3>
      This works properly in IE and Opera.
      Firefox, however, cannot resolve an anchor by id if the id contains a non-ASCII string (e.g. Russian).

      I suppose this is a Firefox problem rather than a Confluence bug.
      Still, it would be great if you could find a workaround, e.g. rolling back header anchor generation to the v2.2.9 behaviour.

            [CONFSERVER-8238] Header anchors do not work in Firefox with non-ASCII characters

            Mingyi Liu added a comment -

            Hmm, so I guess the existing links to document fragments are hard-coded, which is why you can't change the encoding now.

            But anyway, reverting is probably better, given that it's always a little worrisome that those elements (like <h2>) are forced to have an id, which could interfere with other macros (if any) that also need to modify element ids.

            Tom Davies added a comment -

            didn't make 2.5 cutoff


            Tom Davies added a comment -

            We have reverted to the original behaviour, i.e. a named anchor.


            Christopher Owen [Atlassian] added a comment -

            It isn't Firefox's fault. It is simply following the standards on HTML ID character sets. We could work around it by changing the encoding standard, but that has ramifications for existing links with fragment identifiers. Pity that.

            David Peterson added a comment -

            By the way, I understand the motivation for the original change. It's a pity Firefox doesn't play along; the 'id' arrangement is definitely much neater.

            David Peterson added a comment -

            While you're fixing the problem, could you possibly change the anchor generation algorithm so that it does not drop out instantly when it encounters a non-alphanumeric character? If you have a macro in a title (eg. ), all your anchors end up with the same name, i.e. 'PageName-'... Not terribly helpful.
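The collision described in this comment can be illustrated with a small Python sketch. It is only a model built from the behaviour reported in this thread (spaces dropped, scanning stopped at the first non-alphanumeric character), not Confluence's actual renderer code. The function names and the `{macro}` placeholder headings are hypothetical.

```python
def anchor_stop_at_first(page: str, heading: str) -> str:
    """Model of the reported behaviour: drop spaces, stop at the first non-alphanumeric."""
    kept = []
    for ch in heading.replace(" ", ""):
        if not ch.isalnum():
            break  # everything after this character is lost
        kept.append(ch)
    return f"{page}-{''.join(kept)}"


def anchor_skip_others(page: str, heading: str) -> str:
    """Requested behaviour: skip non-alphanumerics but keep scanning the heading."""
    return f"{page}-{''.join(ch for ch in heading if ch.isalnum())}"


# Hypothetical headings that both start with a macro.
headings = ["{macro} Introduction", "{macro} Summary"]
print([anchor_stop_at_first("PageName", h) for h in headings])
print([anchor_skip_others("PageName", h) for h in headings])
```

With the first function both headings collapse to the same anchor, so only one of them can be linked to. The second keeps them distinct.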

            Christopher Owen [Atlassian] added a comment -

            We have been discussing this internally here at Atlassian. Given the limited character set that may be placed in the id of an element, it is now likely that we will revert to using an empty anchor with a name. This is the only way we can proceed while preserving existing links to document fragments.

            Mingyi Liu added a comment -

            Hmm, spoke too soon. It seems this is still a Confluence bug. I checked further and found that Firefox's implementation actually does conform to what's described at http://www.w3.org/TR/html401/struct/links.html. The character set allowed for HTML element IDs (http://www.w3.org/TR/html401/types.html#type-name) is only [A-Za-z0-9_.:-]. Firefox supports all of them. Additionally, Firefox supports ',', '%' etc. (in fact, most printable characters).

            So it's really not a Firefox problem, as it does conform to the standards.
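As a rough way to check a generated id against the character set quoted above, here is a minimal Python sketch. The helper name is hypothetical. The pattern also applies the HTML 4.01 rule that ID and NAME tokens begin with a letter, which the comment above does not mention.

```python
import re

# HTML 4.01 ID/NAME token: a letter, then letters, digits, "-", "_", ":" or ".".
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_.:-]*")


def is_valid_html4_id(value: str) -> bool:
    """Return True if the whole string is a valid HTML 4.01 ID token."""
    return HTML4_ID.fullmatch(value) is not None


for candidate in ["H3H4-UTF8notASCIIstring", "H3H4-Заголовок", "H3H4-%D0%97"]:
    print(candidate, is_valid_html4_id(candidate))
```

Under this check neither a raw non-ASCII id nor a percent-escaped one is a valid token, because '%' is outside the allowed set.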

            Upon further inspection, I found that some links do not work in Firefox because of the escaping in the IDs. For example, the following works in Firefox:

            <a href="#:">test</a>
            ...
            <h2 id=":">first heading</h2>

            So does:

            <a href="#%3A">test</a>
            ...
            <h2 id=":">first heading</h2>

            But not:

            <a href="#%3A">test</a>
            ...
            <h2 id="%3A">first heading</h2>

            This suggests that Firefox unescapes only the URI in <a> (correct behavior) and not the ID. I believe that is correct behavior too: why should browsers unescape characters in IDs? The Unicode argument does not apply here. In fact, it's surprising that the other browsers, including Opera as the other user suggested, would all behave incorrectly.
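The three cases above can be reproduced with a small Python model of the behaviour described in this comment: the fragment of the link is percent-decoded, while the id is compared verbatim. This is a sketch of the observation only, not Firefox's actual implementation, and the function name is hypothetical.

```python
from urllib.parse import unquote


def fragment_matches_id(href: str, element_id: str) -> bool:
    """Decode the href fragment, then compare it with the id exactly as written."""
    fragment = href.split("#", 1)[1] if "#" in href else ""
    return unquote(fragment) == element_id


print(fragment_matches_id("#:", ":"))        # first case
print(fragment_matches_id("#%3A", ":"))      # second case
print(fragment_matches_id("#%3A", "%3A"))    # third case, the one that fails
```

In this model the third case fails because the decoded fragment is ":" while the id is still the literal string "%3A".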

            So here are my suggestions for addressing this problem:

            1. One could change the code in com.atlassian.confluence.renderer.NoAnchorHeadingBlockRenderer to escape the string in <a> but not in <h2 id... etc. This, however, runs the risk that some characters MUST be escaped. For example, "

            2. One could escape the string in both <a> and <h2 id... BUT strip the '%' after escaping. This leaves a unique, standards-conforming string for both link and ID, and it would work in all browsers.

            What's more, I do not understand the logic of escaping everything except the space character, which is simply discarded. It seems to me that if the space is not regarded as important for uniqueness, then neither is any of the punctuation, all of which your renderer escapes after removing spaces. So method 2 above should be applied to the space character too, and the result would be guaranteed unique in any situation. Instead of removing spaces and then escaping, the procedure should become: keep the spaces, escape, then remove the % characters.
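The procedure proposed here (keep the spaces, escape, then remove the % characters) could look roughly like the following Python sketch. It uses `urllib.parse.quote` as a stand-in for whatever escaping the renderer performs, which is an assumption, and the function name is hypothetical. Because `quote` leaves '~' unescaped and '~' is outside the allowed ID character set, the sketch escapes it explicitly.

```python
from urllib.parse import quote


def anchor_token(heading: str) -> str:
    """Percent-encode the heading (spaces included), then drop every '%'."""
    escaped = quote(heading, safe="")        # space -> %20, non-ASCII -> UTF-8 %XX
    escaped = escaped.replace("~", "%7E")    # quote() never escapes '~'
    return escaped.replace("%", "")


print(anchor_token("first heading"))
print(anchor_token(":"))
print(anchor_token("Привет"))
```

The result contains only letters, digits, '_', '.' and '-', so the same string can be used in both the link and the id. One caveat worth checking before relying on the uniqueness claim: once the '%' is removed, "a b" and "a20b" both produce "a20b", so distinct headings can still collide in rare cases.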

            BTW, I also noticed that your renderer for some reason was using the deprecated ' in <a href='uri'> instead of <a href="uri">. " should be used instead of '.


            Mingyi Liu added a comment -

            More and more people in my company are taking up Firefox, so this is becoming a big issue for us. Based on what's described here (http://www.w3.org/TR/html401/struct/links.html), what you guys did was the correct thing. Ideally, Firefox should fix their bug. I'll file a bug report there, but based on my experience with an Ajax bug in the Mozilla family, it could take years before it gets fixed. In the meantime, I hope you guys can find a better way.


            David Peterson added a comment -

            This is apparently also a problem with commas, question marks and other non-alphanumeric characters.

              Assignee: Christopher Owen [Atlassian]
              Reporter: Sergey Zakharov