|
Yeah, I had a feeling that such a simple pattern might not work, but since we now what exactly is the problem we can find a solution, right?
The problem occurs approximately once a day in production. I'll try to isolate some urls from the real world that are causing the problem. Also keep in mind that the short program I wrote to simulate the problem doesn't take into consideration the fact that a lot of the stack is already used by all the calls that occurs in the container while processing a request. That means that in real world you need much shorter url, than the one in the example program, to get the exception. Hi Matt,
I'm sorry for the delayed reply. Here is a real world URL that fails for me at wikis.sun.com Every time I tried to insert this url in a test page in our sandbox space I got StackOverflowError. Considering that we already bumped up the Xss size, this starts to be really annoying issue that we can't resolve without a change in the code. Also if you look at: http://wikis.sun.com/display/FreeWiFi/Free+Wi-Fi+Space+on+the+Road
in the comments. This error was caused by SOE when rendering a page with another url: In the description I mentioned that the error is caused by long urls, but that is not precise. The error is caused by complex URLs - urls that contain special characters (e.g. &, =, +, -, ?, /, etc) that are matched by the regular expression. Is there any chance that you can simplify the url regex? Thanks, Igor.
I think there's suitable fix for this that doesn't degrade the quality of the regex match much, but still avoids the quantified alternation which causes the stack overflow. The following works in your test program: String URL_PATTERN = "([^\"\\[\\|'!]|^)((http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:)([-_.!~*';/?:%@#&=+$,\\p{Alnum}\\[\\]\\\\])+)"; It merely allows URLs to include percentages which aren't part of a valid URL-encoded character (i.e. it would match "http://www.example%.com", which doesn't currently match). I think it's a suitable trade-off. Fixed for Confluence 2.6.1.
Hi Matt,
We are still seeing this problem and as our content grows, the error occurs more and more frequently. Is it possible to reopen the issue and have the solution reevaluated? cheers, Hi Igor,
Reopening is certainly a possibility, although possibly opening a new bug with an extra "really" in the description might be better, since I assume Matt's fix has decreased the number of URLS that cause this problem. That way the fix trail of which versions the improvements happened in is clearer. Have you tried increasing the -Xss parameter to the JVM? What's it currently set to? Cheers, Hi Don,
Actually you don't need to worry about this unless you want to provide a workaround for other users. One of the many urls that was giving us a hard time is this one: http://wikis.sun.com/display/CommSuite/Communications+Suite+6+Component+Products+Release+Notes#CommunicationsSuite6ComponentProductsReleaseNotes-RequirementsforC6 I created a patch that implements a workaround by catching the StackOverflowError. More info on my blog We've got a nice collection of patches already and will be soon evaluating, which ones are suitable for contribution to Atlassian. I expect the patch for this issue to be among them. cheers, Hi Igor,
I'll be sure to check it out. Cheers, |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Thanks for the detailed bug report.
Unfortunately, changing the URL matching regex is not something we can do so simply. It affects the rendering of all URLs in Confluence. URLs are not simply a protocol followed by non-space character, they have a fixed number of valid characters which this regular expression tries to match. Your suggestion would make many non-URLs match where they didn't previously.
I think a better solution would be to limit the size of the links which we parse using this regex.
How large is a "really long" URL? Does this happen during normal operations, or only with your security testing tools?
Regards,
Matt