
Key: CONF-9392
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Igor Minar
Votes: 0
Watchers: 1
Confluence

StackOverflowError in ConfluenceLinkResolver.extractLinkTextList

Created: 05/Sep/07 11:39 PM   Updated: 05/Jun/08 09:15 PM
Component/s: Linking, Pages, Renderer, Usability, WIKI / HTML
Affects Version/s: 2.5.6, 2.5.7, 2.5.8
Fix Version/s: 2.6.1

Time Tracking:
Not Specified

File Attachments: 1. Java Source File LongUrl.java (1 kB)
2. Text File LongUrlStackTrace.log (47 kB)

Environment: JDK5 and JDK6, Confluence war, MySql 4

Participants: Don Willis [Atlassian], Igor Minar and Matt Ryall [Atlassian]
Since last comment: 12 weeks ago
Internal Complexity: 2
Resolution Date: 18/Oct/07 07:48 PM
Internal Value: 4
Labels:


Description
When a very long, complex URL is present in a wiki page, Confluence throws java.lang.StackOverflowError after the page is submitted.

The exception is thrown while the URLs are being extracted from the page content via ConfluenceLinkResolver.extractLinkTextList().

I traced the problem to JDK bug 6337993.

Until the JDK bug is fixed there are two ways to resolve this issue in Confluence.

The Xss Approach

Increase the thread stack size with the -Xss JVM option, for example:

-Xss512k

The optimal stack size is platform and JVM dependent, so some research is needed before changing this value. Increasing the stack size makes it possible to resolve longer URLs, but it only works until the next limit is hit.
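The effect of a larger stack can also be tried on a single thread rather than JVM-wide. The following is a minimal sketch, not Confluence code: the class and method names are hypothetical, the protocol list is an assumed subset, and the JVM is free to round or ignore the requested stack size.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.regex.Pattern;

// Hypothetical helper: runs a regex search on a thread with a requested stack size.
final class BigStackMatch {
    // Same shape as PURE_URL_PATTERN in the report; the protocol list is an assumption.
    static final Pattern URL = Pattern.compile(
        "((http://|https://)(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]"
        + "|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    // Returns true if a URL was found. If the worker thread dies with a
    // StackOverflowError, the result stays false.
    static boolean findWithStack(String text, long stackBytes) {
        AtomicBoolean found = new AtomicBoolean(false);
        // The fourth constructor argument is the requested stack size in bytes;
        // it is only a hint to the JVM.
        Thread worker = new Thread(null,
            () -> found.set(URL.matcher(text).find()), "url-match", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return found.get();
    }

    public static void main(String[] args) {
        String url = "http://www.google.com/maps?f=q&hl=en&geocode=&q=panera+bread";
        System.out.println(findWithStack(url, 16L * 1024 * 1024));
    }
}
```

Like -Xss, this only moves the limit: a sufficiently long URL can still exhaust whatever stack the thread was given.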

Regex Pattern Approach

The Confluence code can be modified to prevent the issue from occurring by simplifying the URL pattern.

Instead of the current pattern (in confluence/confluence/src/java/com/atlassian/confluence/renderer/radeox/filters/UrlFilter.java)

PURE_URL_PATTERN = "((" + protocols + ")(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";

you could use

PURE_URL_PATTERN = "((" + protocols + ")\\S+)";

or some similar, simple pattern.

The downside of this approach is that the matching is not as strict as it used to be, which might or might not break something else. Atlassian guys, you should be able to determine this.

I'm attaching a simple java app that I wrote to simulate what happens inside Confluence when this error occurs. When you uncomment the second pattern the exception is not thrown and the url is matched.

A stack trace captured when the exception was thrown is attached as well.
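The attached LongUrl.java is not reproduced on this page. A hypothetical reconstruction of such a test program might look like the sketch below; the class name, the generated URL, the parameter count and the reduced protocol list are all assumptions, and whether the first pattern actually overflows depends on the JVM and its stack size.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction; the attached LongUrl.java may differ.
final class LongUrlSketch {
    static final String PROTOCOLS = "http://|https://"; // assumed subset

    // Quantified alternation, as in the current PURE_URL_PATTERN.
    static final String ORIGINAL = "((" + PROTOCOLS + ")(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]"
        + "|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";

    // Simplified pattern suggested in the description.
    static final String SIMPLE = "((" + PROTOCOLS + ")\\S+)";

    // Builds a URL whose query string repeats "&k=v" the given number of times.
    static String longUrl(int params) {
        StringBuilder sb = new StringBuilder("http://example.com/?a=b");
        for (int i = 0; i < params; i++) {
            sb.append("&k=v");
        }
        return sb.toString();
    }

    // Returns "matched", "no match", or "StackOverflowError".
    static String tryPattern(String regex, String input) {
        try {
            return Pattern.compile(regex).matcher(input).find() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        String url = longUrl(50_000);
        System.out.println("original: " + tryPattern(ORIGINAL, url));
        System.out.println("simple:   " + tryPattern(SIMPLE, url));
    }
}
```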



Comments
Matt Ryall [Atlassian] added a comment - 13/Sep/07 02:03 AM
Hi Igor,

Thanks for the detailed bug report.

Unfortunately, changing the URL matching regex is not something we can do so simply. It affects the rendering of all URLs in Confluence. URLs are not simply a protocol followed by non-space characters; they are made up of a fixed set of valid characters, which this regular expression tries to match. Your suggestion would make many non-URLs match where they didn't previously.

I think a better solution would be to limit the size of the links which we parse using this regex.
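One way such a limit could be sketched is to skip any whitespace-delimited token above a length threshold before the regex ever sees it. This is only an illustration: the class name, the threshold value and the tokenising strategy are assumptions, not how Confluence handles it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of length-limited link extraction.
final class BoundedLinkExtractor {
    static final int MAX_LINK_LENGTH = 200; // hypothetical threshold

    // Same shape as PURE_URL_PATTERN in the report; the protocol list is an assumption.
    static final Pattern URL = Pattern.compile(
        "((http://|https://)(%[\\p{Digit}A-Fa-f][\\p{Digit}A-Fa-f]"
        + "|[-_.!~*';/?:@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    static List<String> extract(String content) {
        List<String> links = new ArrayList<>();
        for (String token : content.split("\\s+")) {
            if (token.length() > MAX_LINK_LENGTH) {
                continue; // too long to hand to the recursive matcher
            }
            Matcher m = URL.matcher(token);
            while (m.find()) {
                links.add(m.group(1));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(extract("see http://example.com/a?b=c now"));
    }
}
```

The trade-off is that over-long URLs are silently not treated as links, so the threshold would need to be chosen with real content in mind.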

How large is a "really long" URL? Does this happen during normal operations, or only with your security testing tools?

Regards,
Matt


Igor Minar added a comment - 13/Sep/07 10:26 AM
Yeah, I had a feeling that such a simple pattern might not work, but since we now know exactly what the problem is, we can find a solution, right?

The problem occurs approximately once a day in production. I'll try to isolate some real-world URLs that are causing the problem.

Also keep in mind that the short program I wrote to simulate the problem doesn't take into account that much of the stack is already used by the calls that occur in the container while processing a request. That means that in the real world a much shorter URL than the one in the example program is enough to trigger the exception.


Igor Minar added a comment - 12/Oct/07 03:41 PM
Hi Matt,

I'm sorry for the delayed reply.

Here is a real-world URL that fails for me at wikis.sun.com:

http://maps.google.com/maps?f=d&hl=en&view=map&geocode=&time=&date=&ttype=&saddr=Highway+50+%26+Upper+T,+South+Lake+Tahoe,+california,+united+states&daddr=4200+Oak+Grove+Drive,+Santa+Clara,+California,+united+states&sll=38.406254,-121.140747&sspn=2.272765,3.790283&ie=UTF8&ll=38.367502,-121.140747&spn=2.273983,3.790283&z=8&om=1

Every time I tried to insert this URL into a test page in our sandbox space I got a StackOverflowError. Considering that we have already bumped up the Xss size, this is becoming a really annoying issue that we can't resolve without a change in the code.

Also if you look at: http://wikis.sun.com/display/FreeWiFi/Free+Wi-Fi+Space+on+the+Road, you'll find:

Unable to render content due to system error: null

in the comments. This error was caused by a StackOverflowError when rendering a page with another URL:

http://www.google.com/maps?f=q&hl=en&geocode=&q=panera+bread,+santa+fe,+littleton,+co&ie=UTF8&ll=39.651698,-105.014191&spn=0.194552,0.452499&z=11&iwloc=A&om=1

In the description I said that the error is caused by long URLs, but that is not precise. The error is caused by complex URLs: URLs that contain special characters (e.g. &, =, +, -, ?, /) that are matched by the regular expression.

Is there any chance that you can simplify the URL regex?


Matt Ryall [Atlassian] added a comment - 15/Oct/07 01:18 AM
Thanks, Igor.

I think there's a suitable fix for this that doesn't degrade the quality of the regex match much, but still avoids the quantified alternation which causes the stack overflow. The following works in your test program:

String URL_PATTERN = "([^\"\\[\\|'!]|^)((http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:)([-_.!~*';/?:%@#&=+$,\\p{Alnum}\\[\\]\\\\])+)";

It merely allows URLs to include percent signs which aren't part of a valid URL-encoded character (i.e. it would match "http://www.example%.com", which doesn't currently match). I think it's a suitable trade-off.
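The revised pattern can be checked against the URLs quoted in this issue with a small harness like the sketch below. The pattern string is copied from the comment above; the class name, the helper method and the sample sentence are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness for the revised pattern.
final class RevisedPatternCheck {
    // Pattern string as proposed above: a single character class replaces
    // the quantified alternation, and '%' is now allowed on its own.
    static final Pattern URL_PATTERN = Pattern.compile(
        "([^\"\\[\\|'!]|^)((http://|https://|ftp://|ftps://|mailto:|nntp://|news://|irc://|file:)"
        + "([-_.!~*';/?:%@#&=+$,\\p{Alnum}\\[\\]\\\\])+)");

    // Returns the URL portion (group 2) of the first match, or null if none.
    static String firstUrl(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstUrl(
            "Route: http://www.google.com/maps?f=q&hl=en&q=panera+bread,+santa+fe done"));
        System.out.println(firstUrl("http://www.example%.com"));
    }
}
```

Group 1 captures the character before the URL (or the start of input), which is why the URL itself is read from group 2.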


Matt Ryall [Atlassian] added a comment - 18/Oct/07 07:48 PM - edited
Fixed for Confluence 2.6.1.

Igor Minar added a comment - 16/May/08 04:02 PM
Hi Matt,

We are still seeing this problem, and as our content grows the error occurs more and more frequently. Is it possible to reopen the issue and have the solution reevaluated?

cheers,
Igor


Don Willis [Atlassian] added a comment - 05/Jun/08 01:37 AM
Hi Igor,

Reopening is certainly a possibility, although opening a new bug with an extra "really" in the description might be better, since I assume Matt's fix has decreased the number of URLs that cause this problem. That way the trail of which versions the improvements happened in is clearer.

Have you tried increasing the -Xss parameter to the JVM? What's it currently set to?
Do you have a new example URL we could use as a pathological test case?

Cheers,
Don


Igor Minar added a comment - 05/Jun/08 10:22 AM
Hi Don,

Actually you don't need to worry about this unless you want to provide a workaround for other users.

One of the many URLs that were giving us a hard time is this one:

http://wikis.sun.com/display/CommSuite/Communications+Suite+6+Component+Products+Release+Notes#CommunicationsSuite6ComponentProductsReleaseNotes-RequirementsforC6

I created a patch that implements a workaround by catching the StackOverflowError. More info on my blog.

We've got a nice collection of patches already and will soon be evaluating which ones are suitable for contribution to Atlassian. I expect the patch for this issue to be among them.
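The patch itself is not shown on this page. As a rough illustration of the idea only, a wrapper along these lines would let a page render without its links instead of failing outright; the class name, method signature and the choice to return an empty list are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual patch.
final class SafeLinkExtractor {
    // Collects group 1 of every match. If the regex engine overflows the
    // stack, gives up and returns an empty list.
    static List<String> extract(Pattern pattern, String content) {
        List<String> links = new ArrayList<>();
        try {
            Matcher m = pattern.matcher(content);
            while (m.find()) {
                links.add(m.group(1));
            }
        } catch (StackOverflowError e) {
            // Assumption: losing the links is preferable to failing the whole page.
            return Collections.emptyList();
        }
        return links;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(https?://\\S+)");
        System.out.println(extract(p, "a http://x.org b https://y.org"));
    }
}
```

Catching an Error is normally discouraged, so this is a stopgap rather than a fix for the underlying recursion in the regex engine.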

cheers,
Igor


Don Willis [Atlassian] added a comment - 05/Jun/08 09:15 PM
Hi Igor,

I'll be sure to check it out.
We are always thrilled to receive patches on issues.

Cheers,
Don