Confluence Data Center / CONFSERVER-45966

Links containing umlaut or Cyrillic characters become invalid

      Summary

      When you insert a link that contains umlaut or Cyrillic characters via Insert more content > Link > Web link or the shortcuts ⌘+K / CTRL+K, Confluence re-encodes the percent sign (%) of each already-encoded character as %25. This makes the link unusable.

      For example:

      1. Correct link inserted into a Confluence page: https://de.wikipedia.org/wiki/Die_L%C3%B6wen
      2. Incorrect link after saving the page and clicking on the link: https://de.wikipedia.org/wiki/Die_L%25C3%25B6wen

      As you can see, each % symbol in the original link has been replaced with %25.
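The effect can be reproduced outside Confluence. The following is a minimal Python sketch, not Confluence's actual code: it assumes the editor behaves like a generic percent-encoder that treats `%` as an unsafe character, and it uses the example URL from this report.

```python
from urllib.parse import quote, unquote

# The correctly encoded link from the example above ("ö" is already %C3%B6).
original = "https://de.wikipedia.org/wiki/Die_L%C3%B6wen"

# Assumption: the link dialog percent-encodes the address a second time.
# Because "%" is not in the safe set, every "%" becomes "%25".
double_encoded = quote(original, safe=":/")
print(double_encoded)

# Decoding once only undoes the second pass and returns the original link.
print(unquote(double_encoded))
```

Decoding the double-encoded form once gives back the working link, which is why replacing `%25` with `%` repairs an affected URL.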

      Steps to Reproduce

      Option 1

      1. Create a new page and insert a web link via one of the methods below:
        • Insert more content > Link > Web link
        • ⌘+K / CTRL+K
      2. Insert this URL into the 'Address' field: https://de.wikipedia.org/wiki/Die_L%C3%B6wen
      3. Insert any text into the 'Link text' field.
      4. Click on "Insert".
      5. Save the page.
      6. Try to open the link.

      Option 2

      1. Create a new page and paste a web link.
      2. Edit the link and see that the 'Address' and 'Link text' fields have not changed.
      3. Change the 'Link text' field to new text.
      4. Save the link.
      5. Edit the link again.
      6. The URL in the 'Address' field has now been changed.

      Expected Results

      The link on the page should direct you to the correct page: https://de.wikipedia.org/wiki/Die_L%C3%B6wen

      Actual Results

      The link directs you to an invalid page: https://de.wikipedia.org/wiki/Die_L%25C3%25B6wen

      Notes

      Tested in Confluence 5.9.14, where the functionality works as expected.

      Workaround

      Option 1: Replace all % signs in links with spaces.

      1. Take the URL: https://de.wikipedia.org/wiki/Die_L%C3%B6wen
      2. Change it to: https://de.wikipedia.org/wiki/Die_L C3 B6wen

      Option 2: Paste the link into the editor instead of using the insert link dialog.

      • With this method you cannot modify the link text; it is set automatically based on the URL.

      Option 3: Use square brackets to insert the link. [ ]

      1. Type "[".
      2. Type the text you want for your link in the editor, for example "My Link".
      3. Type a "|".
      4. Type the URL.
      5. Type "]" to complete the link.
      6. Save or preview the page to confirm it works.
      7. Important! Don't open the link dialog for this link. If you do, the link will be double-encoded.

      For step 4, you could also try pasting the URL, but this is less likely to succeed: the URL is converted to a link automatically, so you may first have to undo that conversion, which in turn may cause the closing "]" to have no effect. See Option 4 instead.

      Option 4: Paste the URL and wrap it in square brackets.

      1. Paste the URL into the editor.
      2. Press undo to undo any automatic link creation/extraction so you see the original URL.
      3. If the URL is still a link, click it, and click "Unlink" to turn it back into regular text.
      4. At the beginning of the link, type [ followed by `Escape` to exit the Link Autocomplete.
      5. Type the text to display for the link followed by "|".
      6. Your link should now look like this, for example: [My Link|https://de.wikipedia.org/wiki/Die_L%C3%B6wen
      7. Click at the end of the link and type "]" to complete the link. This should auto-convert it to a link with the desired text.
      8. Save or preview the page to confirm it works.
      9. Important! Don't open the link dialog for this link. If you do, the link will be double-encoded.

      Option 5: Use the Source Editor to change links after adding them.

      1. Install the Confluence Source Editor (free add-on)
      2. Edit the page
      3. Click on the <> icon in the top right
      4. Find the href= attribute that contains the '%25' in the URL
      5. Replace it with just '%'
      6. Apply the changes and save the page
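The replacement in steps 4 and 5 can be sketched as a small text transformation. This is only an illustration in Python, not part of Confluence or the Source Editor: the function name `fix_double_encoded_hrefs` is hypothetical, and it assumes the page source has been copied out as a string. It only touches `%25` followed by two hex digits inside `href="..."` attributes. URLs that are intentionally double-encoded would also be changed, so review the result before saving.

```python
import re

# Matches an href attribute and captures its value separately.
HREF_RE = re.compile(r'(href=")([^"]*)(")')
# Matches a double-encoded escape such as %25C3 and captures the hex digits.
DOUBLE_ENCODED_RE = re.compile(r'%25([0-9A-Fa-f]{2})')


def fix_double_encoded_hrefs(markup: str) -> str:
    """Turn %25XX back into %XX, but only inside href attribute values."""
    def fix_href(match: re.Match) -> str:
        prefix, url, suffix = match.groups()
        return prefix + DOUBLE_ENCODED_RE.sub(r'%\1', url) + suffix

    return HREF_RE.sub(fix_href, markup)


source = '<a href="https://de.wikipedia.org/wiki/Die_L%25C3%25B6wen">Die Loewen</a>'
print(fix_double_encoded_hrefs(source))
```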


            Mostafa Gamil added a comment - Can we have this fix as a patch for version 6.0.8?

            Niraj Bhawnani added a comment -

            Hi martin.boehme. I don't think CQL indexes the attributes within links, so it shouldn't be possible to query them via CQL. However, you could write some SQL as follows (this version only works with PostgreSQL):

            select c.contentid, s.spacename, c.title, c.contenttype
            from bodycontent bc
            inner join content c on c.contentid = bc.contentid
            inner join spaces s on c.spaceid = s.spaceid
            where bc.body ~* '<a href="[^"]+%25[0-9a-f]{2}'
            and c.lastmoddate > '01-10-2016'
            and c.content_status = 'current'
            and c.contenttype != 'ATTACHMENT'
            and s.spacestatus = 'CURRENT'

            That will return the IDs, space names, titles and content types for all content whose body has double-escaped links. Note that some products/tools generate URLs that are meant to be double-escaped, so this can return false positives. The query should also run on MySQL if `~*` is changed to "REGEXP". Similar variants may exist for other databases.

            It can also take a long time to run against a large database, so I wouldn't recommend running it on a busy production instance.

            Hope this helps!
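The same pattern can be tried on page bodies that have already been exported, without querying the database. The sketch below is an assumption-laden illustration in Python: the function name and the `pages` mapping (page title to body markup) are hypothetical, and `re.IGNORECASE` is used to mirror the case-insensitive `~*` operator. The false-positive caveat for intentionally double-escaped URLs applies here as well.

```python
import re
from typing import Dict, List

# Same regular expression as in the SQL query above.
DOUBLE_ESCAPED_LINK = re.compile(r'<a href="[^"]+%25[0-9a-f]{2}', re.IGNORECASE)


def pages_with_double_escaped_links(pages: Dict[str, str]) -> List[str]:
    """Return the titles of pages whose body contains a double-escaped href."""
    return [title for title, body in pages.items()
            if DOUBLE_ESCAPED_LINK.search(body)]


# Hypothetical sample data: one broken page, one healthy page.
pages = {
    "Broken": '<p><a href="https://de.wikipedia.org/wiki/Die_L%25C3%25B6wen">x</a></p>',
    "Healthy": '<p><a href="https://de.wikipedia.org/wiki/Die_L%C3%B6wen">x</a></p>',
}
print(pages_with_double_escaped_links(pages))
```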

            Martin Boehme added a comment -

            Okay, so the bug will be fixed in 6.2.1.

            But what happens to all the corrupted links that have already been saved? How can I correct them without manually going through each page, opening every link, checking whether it works and correcting it?

            Does anyone know how to write a CQL search query that finds all pages with corrupted links?

            Minh Tran added a comment -

            ashaleev There is no release planned for 6.1.x, so I will remove 6.1.3 from the fixVersions.
            cc alyakovlev

            Anton Shaleev added a comment -

            The fix for 6.2.1 works correctly. The fix for 6.1.3 corrupts the links:

            Steps to Reproduce

            1. Insert a link containing special symbols into a Confluence 6.1.3 page: https://demo.sharepoint.com/_layouts/15/guestaccess.aspx?guestaccesstoken=RRlghWG8CEbBW3QWuwXTeYmn1sHxH9pWe5KnprpGnEs%3d&folderid=2_1c6a8e4645679462286663f7f48a2819c&rev=1
            2. Publish/Update the page.

            Expected Results

            The link points to the page: https://demo.sharepoint.com/_layouts/15/guestaccess.aspx?guestaccesstoken=RRlghWG8CEbBW3QWuwXTeYmn1sHxH9pWe5KnprpGnEs%3d&folderid=2_1c6a8e4645679462286663f7f48a2819c&rev=1

            Actual Results

            The link points to the page: https://demo.sharepoint.com/_layouts/15/guestaccess.aspx?guestaccesstoken=RRlghWG8CEbBW3QWuwXTeYmn1sHxH9pWe5KnprpGnEs%253d&folderid=2_1c6a8e4645679462286663f7f48a2819c&rev=1

            Alex Yakovlev (Inactive) added a comment -

            A new fix for this issue is now available for Confluence Server customers. The updated fix is part of CONFSERVER-52420 and takes care of the cases not covered here. For your information, we now only encode those characters which are known to be problematic in URLs, such as ', [, ], \, and `, and do not encode other characters. This way we won't encode %, so any characters in the URL that are already encoded, such as %20 or %2f, will no longer be double-encoded.

            You can upgrade here, or check the release notes.

            Please note that any existing links that are in the broken state will need to be fixed manually. I apologise for this inconvenience.
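The approach described in this comment can be illustrated with a short Python sketch. It is not the Confluence implementation: the function name is hypothetical, and the character set is limited to the five examples named in the comment, which may not be the full list. The point it shows is that leaving `%` untouched keeps existing escapes such as %20 intact.

```python
from urllib.parse import quote

# Assumption: only the characters listed in the comment are treated as problematic.
PROBLEMATIC_CHARS = "'[]\\`"


def encode_problematic_only(url: str) -> str:
    """Percent-encode only the problematic characters and leave the rest as-is."""
    return "".join(
        quote(ch, safe="") if ch in PROBLEMATIC_CHARS else ch
        for ch in url
    )


# The existing %20 escape is preserved; only the brackets are encoded.
print(encode_problematic_only("https://example.com/a%20b[1]"))
```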

            Alex Yakovlev (Inactive) added a comment -

            All, I have opened a new ticket, which I am working on now, specifically about the doubly-encoded special characters "%3B %2C %2F %3F %3A %40 %26 %3D %2B %24": https://jira.atlassian.com/browse/CONFSERVER-52420

            I will update this ticket as well once that one is resolved. Please be advised that it should completely fix any double-encoding issue in Confluence links.

            The inability to save URLs that use "file:" is a known but separate issue, which can be tracked here: https://jira.atlassian.com/browse/CONFSERVER-46039

            Gemma Haenen added a comment - I would rather have a bugfix than a workaround.

            Gemma Haenen added a comment - The Source Editor is an unsupported plugin!!

            aozerov2002378919 added a comment -

            Hi, we have problems with links that contain Cyrillic symbols.

            Confluence changes them to URL encoding, but the links do not work. Links to local files also do not work if we add them with Ctrl+K.

            After the page is saved, the links become plain text.

            This happens in Confluence Server versions 6.1.1 and 6.1.3.

            We can only use the old method of writing [ label | link ]. But if we then try to change the link, it stops working.

              qpham@atlassian.com Quan Pham
              azolkefli Athirah Zolkefli