[CONFSERVER-31283] In Confluence + Java7, page save times out when a large number of users are watching a page

Type: Bug
Resolution: Fixed
Priority: High
Fix Version/s: 5.5-OD-24, 5.6
Affects Version/s: 5.0, 5.5
Component/s: None
Labels:

Editing and saving this page on sdog.jira.com (make sure "notify watchers" is selected) causes Confluence to time out, and an Electric Charlie is displayed.

A large number of users are watching this particular page, which might be the reason for this bug.

Attachments:
Bildschirmfoto 2014-07-23 um 00.31.13.png (227 kB, 22/Jul/2014 10:32 PM)
MultipleInvocationBotocss.parse.png (227 kB, 22/Jul/2014 10:35 PM)

is duplicated by

CONFSERVER-32897 Pages with hundreds of watchers can cause performance churn

Closed

CONFSERVER-21846 saving a page takes too long

Closed

CONFSERVER-34360 Sending Email Notifications After Commenting on a Page Slow Down Confluence

Closed

relates to

CONFSERVER-21846 saving a page takes too long

Closed

CONFSERVER-48578 Event creation is very slow when calendar watched when a large number of users are watching a page

Closed

Chii (Inactive) added a comment - 11/Dec/2014 5:13 AM

slancashire: I believe 5.6.0 is the first time we committed to a BTF version of the notifications stuff (that's undarkened; the code's been around since early 5.4 ~ 5.5, but just disabled).

Steve Lancashire (Inactive) added a comment - 10/Dec/2014 6:42 AM

jxie, when did the notifications stuff move rendering and sending of emails off the request thread? I'd like to put a fix version on this.

Plugin event listeners doing things synchronously are unlikely to be causing differences in page save times between java 6 and 7, so we should be able to close this out.

Chii (Inactive) added a comment - 26/Nov/2014 12:49 AM

The notifications stuff did move the rendering and sending of emails off the request thread. However, the problem may still exist if there are plugins that do work synchronously while listening to the PageCreated (or edited) events. I recall the jira issue macro used to perform the remote linking of jira issues synchronously (see com.atlassian.confluence.plugins.jira.ConfluenceEventListener#createJiraRemoteLinks(BlogPostCreateEvent)), but that might've been fixed up, I'm not 100% sure.
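To illustrate the difference between a synchronous listener and one that hands its work off, here is a minimal sketch. `PageSavedEvent` and `AsyncPageListener` are hypothetical names made up for this example, not Confluence or plugin classes, and a real plugin would use whatever executor and event API the host provides.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical event type for this sketch; not a Confluence class. */
final class PageSavedEvent {
    final long pageId;

    PageSavedEvent(long pageId) {
        this.pageId = pageId;
    }
}

/** Listener that returns immediately and runs its slow work on a worker thread. */
final class AsyncPageListener {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<PageSavedEvent> slowWork;

    AsyncPageListener(Consumer<PageSavedEvent> slowWork) {
        this.slowWork = slowWork;
    }

    /** Called on the request thread; only enqueues the work, so the save is not blocked. */
    void onPageSaved(PageSavedEvent event) {
        worker.execute(() -> slowWork.accept(event));
    }

    /** Stops accepting work and waits for queued work to finish; false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A synchronous listener would call `slowWork.accept(event)` directly inside `onPageSaved`, so the page save would wait for it.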

kgbvax added a comment - 22/Jul/2014 10:31 PM - edited

While investigating CONF-34360 with a profiler, I can confirm that, at least for that issue, a vast amount of time is spent in com.atlassian.botocss.Botocss.inject. I see 3000 msec per invocation.
The culprit, it seems to me, is that the CSS is parsed over and over again for each request.

See

The solution seems to be right there, from botocss docs:

If you are processing a large number of HTML documents with the same stylesheet, Botocss provides a method to parse your stylesheet just once and reuse it:

    // example of reusing stylesheets:
    // parse the stylesheets once, then reuse the parsed result for every document
    BotocssStyles styles = Botocss.parse(css1, css2);
    for (String input : inputs) {
        output.add(Botocss.inject(input, styles));
    }

If you accept this, then botocss is not to blame but the usage of it is.
Suggestion: Revisit SoyTofuFunctionAdapter and have it, e.g., pass in the parsed BotocssStyles.
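One way the suggestion could look is a small cache that parses each distinct stylesheet text once and hands back the parsed result on later calls. This is only a sketch under assumptions: `ParsedStyles` and `StylesheetCache` are hypothetical stand-ins, the `parse` and `inject` bodies are placeholders rather than Botocss calls, and nothing here reflects how SoyTofuFunctionAdapter was actually changed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a parsed stylesheet; not the Botocss type. */
final class ParsedStyles {
    final String source;

    ParsedStyles(String source) {
        this.source = source;
    }
}

/** Parses each distinct CSS text once and reuses the result for later documents. */
final class StylesheetCache {
    private final Map<String, ParsedStyles> cache = new ConcurrentHashMap<>();

    /** Counts how often the expensive parse step actually ran. */
    final AtomicInteger parseCount = new AtomicInteger();

    // Placeholder for the expensive parse step.
    private ParsedStyles parse(String css) {
        parseCount.incrementAndGet();
        return new ParsedStyles(css);
    }

    /** Returns the cached parse result, parsing only on the first request for this CSS. */
    ParsedStyles get(String css) {
        return cache.computeIfAbsent(css, this::parse);
    }

    // Placeholder for the per-document step; it only prefixes a marker comment.
    String inject(String html, String css) {
        ParsedStyles styles = get(css);
        return "<!-- styled with " + styles.source.length() + " chars of CSS -->" + html;
    }
}
```

Keying on the CSS text keeps the sketch simple; an unbounded map is only reasonable if the set of stylesheets is small and fixed.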

Kordinator added a comment - 23/Feb/2014 10:45 PM

slancashire, thanks a lot for pointing this out. Looks like a very likely culprit.

Steve Lancashire (Inactive) added a comment - 20/Feb/2014 10:56 AM

akord, is this possibly contributing to the daily EAC performance problems, where we see a mail queue flush time spike and a spike in garbage collection? It seems we've removed the static "cache" from botocss, and we now see massive garbage collection and slowness when emails spike. Might be worth rolling back https://bitbucket.org/mryall/botocss/commits/465f60a20194ab7506fbafa12ff949f32e80444d to see if performance improves?

Matt Ryall added a comment - 27/Sep/2013 12:17 AM

In terms of "fixing" the underlying memory usage of the inject-CSS-in-mail process, we could do a few things:

- the lexer used by the CSS parser has a 9 MB state cache (not a leak) which could probably be reduced
- we could truncate large email content before injecting the styles, limiting the memory and processing overhead associated with sending notifications for large pages
- we could simplify or reduce the number of styles we apply to emails (particularly those that apply to common elements like TD, LI and P)
- if we can find or write a library which can apply CSS styles to an HTML document without having that document in memory (i.e. in a streaming fashion), that would reduce memory usage significantly. I'm not sure if this is actually possible, however.

Some of these might be easy, but most of them will be tricky. So I think we'd be better to look into removing the email queue first, and just sending emails as they're generated. Confluence doesn't have a proper persistent disk-backed queue and the MTAs that are receiving the emails do. This will solve a number of email-related problems with a single improvement.

Angel Eduardo Garcia Hernandez (Inactive) added a comment - 26/Sep/2013 10:20 PM

https://jira.atlassian.com/browse/CONFDEV-15413 was a different issue. As you said, I fixed speed, but not memory consumption.

Can't we also add "fix botocss" to the list of things to do? Last time I checked, there was some "leak" in the lexer/parser that can maybe be fixed by updating to ANTLR 4, although I haven't checked in full.

Matt Ryall added a comment - 26/Sep/2013 7:01 AM

Yeah, I had a feeling this problem was the same problem that used to cause occasional OOMEs on Pug: it's the botocss inline CSS transformation bloating the HTML in our emails, then queuing up a large number of emails (in memory) for sending.

I actually raised CONFDEV-15413 to get it fixed before, but egarcia fixed a different performance problem which made it faster without addressing the memory consumption issues.

If this is actually the problem, we discussed some possible fixes in the past:

- don't queue emails (as Charles proposed)
- queue emails on disk instead of in memory (should be configurable with ehcache/Coherence)
- queue lightweight notification objects (content ID + version + user ID) instead of complete emails.
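The third option could be sketched roughly as follows: the queue holds only small references, and each email body is rendered at send time and then discarded. `NotificationRef`, `NotificationQueue`, and the renderer and sender callbacks are hypothetical names for this example, not existing Confluence types.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Function;

/** Small reference to a pending notification: IDs only, no rendered body. */
final class NotificationRef {
    final long contentId;
    final int version;
    final String userId;

    NotificationRef(long contentId, int version, String userId) {
        this.contentId = contentId;
        this.version = version;
        this.userId = userId;
    }
}

/** In-memory queue of lightweight references instead of complete emails. */
final class NotificationQueue {
    private final Queue<NotificationRef> pending = new ArrayDeque<>();

    synchronized void enqueue(NotificationRef ref) {
        pending.add(ref);
    }

    synchronized int size() {
        return pending.size();
    }

    private synchronized NotificationRef poll() {
        return pending.poll();
    }

    /**
     * Drains the queue. Each email is rendered only when it is about to be
     * sent, so at most one rendered body is held by this method at a time.
     * Returns the number of notifications processed.
     */
    int flush(Function<NotificationRef, String> renderer, Consumer<String> sender) {
        int sent = 0;
        NotificationRef ref;
        while ((ref = poll()) != null) {
            sender.accept(renderer.apply(ref));
            sent++;
        }
        return sent;
    }
}
```

The trade-off is that rendering moves to flush time, so the content version recorded in the reference must still be retrievable when the queue is drained.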

CharlesA added a comment - 26/Sep/2013 6:18 AM - edited

Bollocks.

I have an Emergency Windows Machine at home. I guess I can find a 32 bit VM for it.

Assignee: Chii (Inactive)
Reporter: Colin Chauvet (Inactive)
Affected customers: 13
Watchers: 29
Created: 09/Aug/2013 12:41 AM
Updated: 20/Apr/2022 8:11 AM
Resolved: 11/Dec/2014 6:36 AM
