Confluence Data Center: CONFSERVER-9392

StackOverflowError in ConfluenceLinkResolver.extractLinkTextList

      When a really long, complex URL is present in a wiki page, Confluence throws java.lang.StackOverflowError after the page is submitted.

      The exception is thrown while the URLs are being extracted from the page content via ConfluenceLinkResolver.extractLinkTextList().

      I traced the problem to JDK bug 6337993.

      Until the JDK bug is fixed, there are two ways to work around this issue in Confluence.

      The Xss Approach

      Increase the thread stack size with the -Xss option, for example:

      -Xss512k
      

      The optimal stack size is platform- and JVM-dependent, so some research or consulting is needed before changing this value. A larger stack makes it possible to resolve longer URLs, but only until the next limit is hit.
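-Xss changes the default stack size of every thread the JVM starts. A narrower variant of the same idea, offered only as a minimal sketch and not as anything Confluence does, is to run just the regex work on a dedicated thread created with the four-argument Thread constructor. Its stackSize argument requests a larger stack for that one thread. The Thread Javadoc says some platforms ignore this value, so the "only until the next limit" caveat applies here too. BigStackRunner and runWithStack are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class BigStackRunner {

    /**
     * Runs the task on a new thread that requests stackBytes of stack.
     * The JVM may ignore the request, so this is a mitigation, not a guarantee.
     */
    static <T> T runWithStack(Supplier<T> task, long stackBytes) {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(task.get());
            } catch (Throwable caught) {      // includes StackOverflowError
                failure.set(caught);
            }
        }, "big-stack-worker", stackBytes);
        worker.start();
        try {
            worker.join();                    // join() makes the worker's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        Throwable err = failure.get();
        // Rethrow on the caller's thread so existing error handling still sees it.
        if (err instanceof Error) throw (Error) err;
        if (err instanceof RuntimeException) throw (RuntimeException) err;
        if (err != null) throw new IllegalStateException(err);
        return result.get();
    }

    public static void main(String[] args) {
        // Requests 16 MB for this one thread; other threads keep the -Xss default.
        Integer length = runWithStack(() -> "http://example.com/".length(), 16L * 1024 * 1024);
        System.out.println(length);
    }
}
```

main prints 19. Because the worker's errors are rethrown, a StackOverflowError still surfaces to the caller if even the larger stack is not enough.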

      The Regex Pattern Approach

      The Confluence code can be modified to prevent the issue from occurring by simplifying the URL pattern.

      Instead of the current pattern in confluence/confluence/src/java/com/atlassian/confluence/renderer/radeox/filters/UrlFilter.java:

      PURE_URL_PATTERN = "((" + protocols + ")(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";
      

      you could use

      PURE_URL_PATTERN = "((" + protocols + ")\\S+)";
      

      or a similarly simple pattern.

      The downside of this approach is that matching is less strict than before, which might break something else; the Atlassian developers should be able to determine this.
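The strictness trade-off can be seen by compiling both patterns side by side. This is a minimal sketch, not Confluence code. The protocol list is copied from the URL_PATTERN quoted later in this issue, and Confluence's real "protocols" value may differ. UrlPatternCompare and firstUrl are hypothetical names. On text such as "<http://example.com/a?b=1>", the strict pattern stops before the ">", while the \S+ variant swallows it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPatternCompare {
    // Assumed protocol list (copied from the URL_PATTERN quoted in this issue).
    static final String PROTOCOLS =
        "http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:";

    // Original strict pattern: a %XX escape or one allowed character, repeated.
    static final Pattern STRICT = Pattern.compile(
        "((" + PROTOCOLS + ")(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    // Simplified pattern: a protocol followed by any run of non-whitespace.
    static final Pattern LOOSE = Pattern.compile("((" + PROTOCOLS + ")\\S+)");

    /** Returns the first URL found in text (capture group 1), or null if there is none. */
    static String firstUrl(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "see <http://example.com/a?b=1> for details";
        System.out.println(firstUrl(STRICT, text)); // http://example.com/a?b=1
        System.out.println(firstUrl(LOOSE, text));  // http://example.com/a?b=1>
    }
}
```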

      I'm attaching a simple Java app that I wrote to simulate what happens inside Confluence when this error occurs. When you uncomment the second pattern, the exception is not thrown and the URL is matched.

      A stack trace captured when the exception was thrown is attached as well.
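The attached LongUrl.java is not reproduced in this issue. The sketch below is a rough stand-in written for illustration, not the attachment. It matches a long synthetic URL against two patterns: the quantified-alternation shape of PURE_URL_PATTERN, and a variant whose repeated group is a single character class that also admits "%". The class name, the "a%41" filler, the shortened protocol list and the 512 KB thread stack are all assumptions. In java.util.regex, at least in the JDK versions discussed here, a greedy quantifier over a group containing an alternation is matched by recursion, roughly one level per repetition. A group that always consumes exactly one character is matched in a loop instead. Whether the first pattern overflows therefore depends on URL length and stack size, so the sketch reports the outcome rather than assuming it.

```java
import java.util.regex.Pattern;

public class LongUrlSketch {
    static final String PROTOCOLS = "http://|https://|ftp://";  // assumed subset

    // Quantified alternation (shape of the original PURE_URL_PATTERN).
    static final Pattern ALTERNATION = Pattern.compile(
        "((" + PROTOCOLS + ")(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    // Single character class that also admits '%' (no alternation inside the repeated group).
    static final Pattern SINGLE_CLASS = Pattern.compile(
        "((" + PROTOCOLS + ")([-_.!~*';/?:%@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    /** Builds "http://example.com/" followed by reps copies of "a%41". */
    static String longUrl(int reps) {
        StringBuilder sb = new StringBuilder("http://example.com/");
        for (int i = 0; i < reps; i++) sb.append("a%41");
        return sb.toString();
    }

    /** Runs matches() on a thread that requests stackBytes of stack and reports the outcome. */
    static String outcome(Pattern p, String input, long stackBytes) {
        String[] result = {"not run"};
        Thread t = new Thread(null, () -> {
            try {
                result[0] = p.matcher(input).matches() ? "matched" : "no match";
            } catch (StackOverflowError e) {
                result[0] = "StackOverflowError";
            }
        }, "regex-worker", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
        return result[0];
    }

    public static void main(String[] args) {
        String url = longUrl(50_000);
        long stack = 512L * 1024;
        System.out.println("alternation:  " + outcome(ALTERNATION, url, stack));
        System.out.println("single class: " + outcome(SINGLE_CLASS, url, stack));
    }
}
```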

        1. LongUrl.java (1 kB, Igor Minar)
        2. LongUrlStackTrace.log (47 kB, Igor Minar)
        3. url-stackoverflow-fix.patch (1 kB, Igor Minar)

            Igor Minar added a comment -

            I realized that I forgot to post the patch. Doh!

            Here you go: I've attached a simple patch that works around the problem. It's not an ideal solution, but missing a few links in the page info view is better than not being able to save a page at all.

            Don Willis added a comment -

            Hi Igor,

            I'll be sure to check it out.
            We are always thrilled to receive patches on issues.

            Cheers,
            Don

            Igor Minar added a comment -

            Hi Don,

            Actually you don't need to worry about this unless you want to provide a workaround for other users.

            One of the many URLs that were giving us a hard time is this one:

            http://wikis.sun.com/display/CommSuite/Communications+Suite+6+Component+Products+Release+Notes#CommunicationsSuite6ComponentProductsReleaseNotes-RequirementsforC6
            

            I created a patch that implements a workaround by catching the StackOverflowError. More info is on my blog.

            We already have a nice collection of patches and will soon be evaluating which ones are suitable for contribution to Atlassian. I expect the patch for this issue to be among them.

            cheers,
            Igor
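The attached url-stackoverflow-fix.patch is not reproduced in this issue. The following is a minimal sketch of the idea described in the comment above. It assumes that link extraction is a loop over regex matches, catches StackOverflowError around that loop, and keeps whatever links were found before the overflow so the page can still be saved. SafeLinkExtractor and extractLinks are hypothetical stand-ins, not the real signature of ConfluenceLinkResolver.extractLinkTextList(). Catching StackOverflowError is normally discouraged. The argument for tolerating it here is that the overflow happens inside the regex engine's own recursion on a local Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SafeLinkExtractor {
    // Quantified-alternation URL pattern (shape of the original PURE_URL_PATTERN, shortened protocol list).
    static final Pattern URL = Pattern.compile(
        "((http://|https://|ftp://)(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    /**
     * Hypothetical stand-in for link extraction: returns every URL in the text,
     * or only those found before the regex engine overflowed the stack.
     */
    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = URL.matcher(text);
        try {
            while (m.find()) {
                links.add(m.group(1));
            }
        } catch (StackOverflowError e) {
            // Give up on the rest of the text: a few missing links in the page
            // info view beat a page that cannot be saved at all.
            System.err.println("URL extraction aborted after " + links.size() + " link(s): " + e);
        }
        return links;
    }

    public static void main(String[] args) {
        // Prints [http://example.com/a, ftp://example.org/b]
        System.out.println(extractLinks("see http://example.com/a and ftp://example.org/b"));
    }
}
```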

            Don Willis added a comment -

            Hi Igor,

            Reopening is certainly a possibility, although opening a new bug with an extra "really" in the description might be better, since I assume Matt's fix has reduced the number of URLs that cause this problem. That way, the fix trail showing which versions the improvements landed in is clearer.

            Have you tried increasing the -Xss parameter to the JVM? What's it currently set to?
            Do you have a new example URL we could use as a pathological test case?

            Cheers,
            Don

            Igor Minar added a comment -

            Hi Matt,

            We are still seeing this problem and as our content grows, the error occurs more and more frequently. Is it possible to reopen the issue and have the solution reevaluated?

            cheers,
            Igor

            Matt Ryall added a comment (edited) -

            Fixed for Confluence 2.6.1.

            Matt Ryall added a comment -

            Thanks, Igor.

            I think there's a suitable fix for this that doesn't degrade the quality of the regex match much but still avoids the quantified alternation that causes the stack overflow. The following works in your test program:

            String URL_PATTERN = "([^\"\\[\\|'!]|^)((http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:)([-_.!~*';/?:%@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";
            

            It merely allows URLs to include percent signs that aren't part of a valid URL-encoded character (i.e. it would match all of "http://www.example%.com", where the current pattern stops at the "%"). I think that's a suitable trade-off.
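A quick way to check the trade-off described above is to compile the proposed URL_PATTERN verbatim next to the original strict form and look for the first URL in each. The strict form is given the same protocol list, which is an assumption, and PercentTradeOff and find are hypothetical names. In the proposed pattern the URL is capture group 2, because group 1 is the leading-context group that excludes quotes, "[", "|", "'" and "!".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentTradeOff {
    // The proposed pattern, copied from the comment above; the URL is group 2.
    static final String URL_PATTERN =
        "([^\"\\[\\|'!]|^)((http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:)([-_.!~*';/?:%@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";

    // The original strict form for comparison (same protocol list assumed); the URL is group 1.
    static final String STRICT_PATTERN =
        "((http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:)(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";

    /** Returns the given capture group of the first match in text, or null if nothing matches. */
    static String find(String regex, int group, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String text = "http://www.example%.com";
        System.out.println(find(STRICT_PATTERN, 1, text)); // http://www.example
        System.out.println(find(URL_PATTERN, 2, text));    // http://www.example%.com
    }
}
```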

            Igor Minar added a comment -

            Hi Matt,

            I'm sorry for the delayed reply.

            Here is a real-world URL that fails for me at wikis.sun.com:

            http://maps.google.com/maps?f=d&hl=en&view=map&geocode=&time=&date=&ttype=&saddr=Highway+50+%26+Upper+T,+South+Lake+Tahoe,+california,+united+states&daddr=4200+Oak+Grove+Drive,+Santa+Clara,+California,+united+states&sll=38.406254,-121.140747&sspn=2.272765,3.790283&ie=UTF8&ll=38.367502,-121.140747&spn=2.273983,3.790283&z=8&om=1

            Every time I tried to insert this URL in a test page in our sandbox space, I got a StackOverflowError. Considering that we have already bumped up the Xss size, this is becoming a really annoying issue that we can't resolve without a code change.

            Also if you look at: http://wikis.sun.com/display/FreeWiFi/Free+Wi-Fi+Space+on+the+Road, you'll find:

            Unable to render content due to system error: null

            in the comments. This error was caused by SOE when rendering a page with another url:

            http://www.google.com/maps?f=q&hl=en&geocode=&q=panera+bread,+santa+fe,+littleton,+co&ie=UTF8&ll=39.651698,-105.014191&spn=0.194552,0.452499&z=11&iwloc=A&om=1

            In the description I said that the error is caused by long URLs, but that is not precise. The error is caused by complex URLs: URLs that contain special characters (e.g. &, =, +, -, ?, /) that are matched by the regular expression.

            Is there any chance that you can simplify the url regex?

            Igor Minar added a comment -

            Yeah, I had a feeling that such a simple pattern might not work, but since we now know exactly what the problem is, we can find a solution, right?

            The problem occurs approximately once a day in production. I'll try to isolate some real-world URLs that are causing the problem.

            Also keep in mind that the short program I wrote to simulate the problem doesn't account for the fact that much of the stack is already used by all the calls the container makes while processing a request. That means that in the real world, a much shorter URL than the one in the example program is enough to trigger the exception.

            Matt Ryall added a comment -

            Hi Igor,

            Thanks for the detailed bug report.

            Unfortunately, changing the URL matching regex is not something we can do so simply. It affects the rendering of all URLs in Confluence. URLs are not simply a protocol followed by non-space characters; they have a fixed set of valid characters, which this regular expression tries to match. Your suggestion would make many non-URLs match where they didn't previously.

            I think a better solution would be to limit the size of the links which we parse using this regex.
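The following is a minimal sketch of what limiting the size of parsed links could look like, assuming extraction can run token by token. The URL character class contains no whitespace, so no match can span whitespace, and splitting on whitespace first loses nothing. Any token longer than a cap is skipped before the regex sees it. MAX_LINK_LENGTH = 2048 and LengthLimitedLinks are hypothetical; the issue does not say what limit, if any, was chosen. The recursion depth grows with the length of the matched run, so the cap bounds it. The cap still has to leave headroom for the stack the container already uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LengthLimitedLinks {
    // Hypothetical cap on the length of text handed to the URL regex.
    static final int MAX_LINK_LENGTH = 2048;

    // Quantified-alternation URL pattern (shape of the original PURE_URL_PATTERN, shortened protocol list).
    static final Pattern URL = Pattern.compile(
        "((http://|https://|ftp://)(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    /** Runs the URL regex only on whitespace-separated tokens no longer than the cap. */
    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.length() > MAX_LINK_LENGTH) {
                continue; // too long to match safely; leave it as plain text
            }
            Matcher m = URL.matcher(token);
            while (m.find()) {
                links.add(m.group(1));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        StringBuilder huge = new StringBuilder("http://example.com/");
        for (int i = 0; i < 100_000; i++) huge.append("a%41");
        // The oversized token is skipped, so only the short link is reported.
        System.out.println(extractLinks("ok http://short.example/ " + huge));
    }
}
```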

            How large is a "really long" URL? Does this happen during normal operations, or only with your security testing tools?

            Regards,
            Matt

              Assignee: Unassigned
              Reporter: Igor Minar