In Confluence + Java7, page save times out when a large number of users are watching a page

      Editing and saving this page on sdog.jira.com (with "notify watchers" selected) causes Confluence to time out, and an Electric Charlie is displayed.

      A large number of users are watching this particular page, which might be the reason for this bug.

            [CONFSERVER-31283] In Confluence + Java7, page save times out when a large number of users are watching a page

            Chii (Inactive) added a comment -

            slancashire: I believe 5.6.0 is the first time we committed to a BTF version of the notifications stuff (that's undarkened; the code has been around since early 5.4 ~ 5.5, but was just disabled).

            Steve Lancashire (Inactive) added a comment -

            jxie, when did the notifications stuff move rendering and sending of emails off the request thread? I'd like to put a fix version on this.

            Plugin event listeners doing things synchronously are unlikely to be causing differences in page save times between Java 6 and 7, so we should be able to close this out.

            Chii (Inactive) added a comment -

            The notifications stuff did move the rendering and sending of emails off the request thread. However, the problem may still exist if there are plugins that do work synchronously while listening to the PageCreated (or edited) events. I recall the jira issue macro used to perform the remote linking of jira issues synchronously (see com.atlassian.confluence.plugins.jira.ConfluenceEventListener#createJiraRemoteLinks(BlogPostCreateEvent)), but that might have been fixed since; I'm not 100% sure.

            kgbvax added a comment - - edited

            While investigating CONF-34360 with a profiler, I can confirm that, at least for that issue, a vast amount of time is spent in com.atlassian.botocss.Botocss.inject. I see 3000 msec per invocation.
            The culprit seems to be that the CSS is parsed again for every request.

            See

            The solution seems to be right there, from botocss docs:

            If you are processing a large number of HTML documents with the same stylesheet, Botocss provides a method to parse your stylesheet just once and reuse it:

                // example of reusing stylesheets
                BotocssStyles styles = Botocss.parse(css1, css2);
                for (String input : inputs) {
                    // inject the pre-parsed styles into each document in turn
                    output.add(Botocss.inject(input, styles));
                }
            

            If you accept this, then botocss itself is not to blame; the way it is used is.
            Suggestion: revisit SoyTofuFunctionAdapter and have it, for example, pass in the already-parsed BotocssStyles.
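
            A minimal sketch of the parse-once pattern suggested above. Only the names Botocss.parse and Botocss.inject come from the quoted docs; ParsedStyles, parse, inject and ParseOnceDemo below are hypothetical stand-ins so the example runs without the library, and the "parsing" is a placeholder rather than real CSS processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a parsed stylesheet (plays the role of BotocssStyles).
final class ParsedStyles {
    final String css;

    ParsedStyles(String css) {
        this.css = css;
    }
}

class ParseOnceDemo {
    // Counts how often the (expensive) parse step runs.
    static final AtomicInteger parseCount = new AtomicInteger();

    // Stand-in for the expensive parse step (plays the role of Botocss.parse).
    static ParsedStyles parse(String css) {
        parseCount.incrementAndGet();
        return new ParsedStyles(css.trim());
    }

    // Stand-in for style injection (plays the role of Botocss.inject(html, styles)).
    static String inject(String html, ParsedStyles styles) {
        return "<div style=\"" + styles.css + "\">" + html + "</div>";
    }

    // Parse the stylesheet once, then reuse the parsed result for every document.
    static List<String> renderAll(String css, List<String> htmlDocs) {
        ParsedStyles styles = parse(css);
        List<String> output = new ArrayList<>();
        for (String html : htmlDocs) {
            output.add(inject(html, styles));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> out = renderAll("color: red", Arrays.asList("<p>a</p>", "<p>b</p>", "<p>c</p>"));
        System.out.println(out.size() + " documents, " + parseCount.get() + " parse call(s)");
    }
}
```

            The point of the sketch is only that the parse call sits outside the per-document loop; where the parsed styles would actually be held in Confluence (for example in SoyTofuFunctionAdapter) is not shown here.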


            Kordinator added a comment -

            slancashire, thanks a lot for pointing this out. Looks like a very likely culprit.

            Steve Lancashire (Inactive) added a comment -

            akord, is this possibly contributing to the daily EAC performance problems, where we see a spike in mail queue flush time and a spike in garbage collection? It seems we've removed the static "cache" from botocss, and we now see massive garbage collection and slowness when emails spike. Might be worth rolling back https://bitbucket.org/mryall/botocss/commits/465f60a20194ab7506fbafa12ff949f32e80444d to see if performance improves?

            Matt Ryall added a comment -

            In terms of "fixing" the underlying memory usage of the inject-CSS-in-mail process, we could do a few things:

            • the lexer used by the CSS parser has a 9 MB state cache (not a leak), which could probably be reduced
            • we could truncate large email content before injecting the styles, limiting the memory and processing overhead associated with sending notifications for large pages
            • we could simplify or reduce the number of styles we apply to emails (particularly those that apply to common elements like TD, LI and P)
            • if we can find or write a library which can apply CSS styles to an HTML document without having that document in memory (i.e. in a streaming fashion), that would reduce memory usage significantly. I'm not sure if this is actually possible, however.

            Some of these might be easy, but most of them will be tricky. So I think we'd be better off looking into removing the email queue first and just sending emails as they're generated. Confluence doesn't have a proper persistent disk-backed queue, whereas the MTAs that receive the emails do. This would solve a number of email-related problems with a single improvement.

            Angel Eduardo Garcia Hernandez (Inactive) added a comment -

            https://jira.atlassian.com/browse/CONFDEV-15413 was a different issue. As you said, I fixed speed, but not memory consumption.

            Can't we also add "fix botocss" to the list of things to do? Last time I checked, there was some "leak" in the lexer/parser that might be fixable by updating to ANTLR 4, although I haven't checked fully.

            Matt Ryall added a comment -

            Yeah, I had a feeling this was the same problem that used to cause occasional OOMEs on Pug: the botocss inline CSS transformation bloats the HTML in our emails, and then a large number of those emails are queued up (in memory) for sending.

            I actually raised CONFDEV-15413 to get it fixed before, but egarcia fixed a different performance problem which made it faster without addressing the memory consumption issues.

            If this is actually the problem, we discussed some possible fixes in the past:

            • don't queue emails (as Charles proposed)
            • queue emails on disk instead of in memory (should be configurable with ehcache/Coherence)
            • queue lightweight notification objects (content ID + version + user ID) instead of complete emails.
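
            As a rough illustration of the last option, here is a sketch of a queue that holds only identifiers and renders each email at send time. All names in it (PendingNotification, LightweightQueueDemo, the renderer and sender callbacks) are hypothetical and not taken from Confluence's code.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical lightweight queue entry: identifiers only, no rendered HTML.
final class PendingNotification {
    final long contentId;
    final int version;
    final String userId;

    PendingNotification(long contentId, int version, String userId) {
        this.contentId = contentId;
        this.version = version;
        this.userId = userId;
    }
}

class LightweightQueueDemo {
    private final Queue<PendingNotification> queue = new ArrayDeque<>();

    void enqueue(long contentId, int version, String userId) {
        queue.add(new PendingNotification(contentId, version, userId));
    }

    int size() {
        return queue.size();
    }

    // Render each email only when it is about to be sent, so this loop holds
    // at most one rendered body at a time. Returns the number of emails sent.
    int flush(Function<PendingNotification, String> renderer, Consumer<String> sender) {
        int sent = 0;
        PendingNotification next;
        while ((next = queue.poll()) != null) {
            sender.accept(renderer.apply(next));
            sent++;
        }
        return sent;
    }
}
```

            The trade-off in this sketch is that rendering cost moves to flush time, so a slow renderer would delay the queue flush rather than the page save.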


            CharlesA added a comment - - edited

            Bollocks.

            I have an Emergency Windows Machine at home. I guess I can find a 32 bit VM for it.


              jxie Chii (Inactive)
              cchauvet Colin Chauvet (Inactive)
              Affected customers: 13
              Watchers: 29
