
Key: CONF-14989
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Blocker
Assignee: Andrew Lynch [Atlassian]
Reporter: Igor Minar
Votes: 1
Watchers: 8
Confluence

Possible net.sf.hibernate.impl.SessionImpl Memory Leaks

Created: 24/Mar/09 01:42 AM   Updated: 10/Jan/10 06:26 PM   Resolved: 10/Jan/10 06:26 PM
Component/s: Database / Hibernate, Engine Room / Architecture, Maintainability & Stability, Production Servers
Affects Version/s: 2.10.2
Fix Version/s: 3.1.1

Time Tracking:
Not Specified

File Attachments: None
Image Attachments:

1. SessionImplIncommingReferences.png
(133 kB)

2. SessionImplInstanceListing.png
(192 kB)

3. SessionImplSuspectSummary.png
(81 kB)
Environment:

jdk6u12 (32bit and 64bit), mysql5, war, cluster - but there is only one node in the cluster

Issue Links:
Reference
 

Participants: Anatoli Kazatchkov [Atlassian], Andrew Lynch [Atlassian], Charles Miller [Atlassian], Igor Minar and Matt Ryall [Atlassian]
Since last comment: 9 weeks, 5 days ago
Internal Complexity: 5
Internal Value: 6
Labels: SunWikis qa-triage
Reviewers: Chris Kiehl [Atlassian]


Description

In the last several weeks we've been seeing a lot of Confluence instability at wikis.sun.com, all of it related to running out of heap space. Several rounds of increasing Xmx didn't help (we started at 3GB and are now at 5GB on a 64-bit JVM).

I took several memory dumps during outages and analyzed them with Eclipse Memory Analyzer, which repeatedly found two issues:

  • SAXParser memory leaks
  • hundreds of instances of net.sf.hibernate.impl.SessionImpl retaining 750MB+ heap memory

At the same time, I see that the dump contains a total of 163 threads. Given that Confluence uses the OpenSessionInView pattern, I'd expect to see fewer than 163 live session instances (since not all of the threads are J2EE service threads).

I'm attaching some annotated screenshots from Eclipse Memory Analyzer.



Igor Minar made changes - 24/Mar/09 01:52 AM
Field Original Value New Value
Attachment SessionImplIncommingReferences.png [ 30959 ]
Attachment SessionImplInstanceListing.png [ 30958 ]
Igor Minar made changes - 24/Mar/09 01:52 AM
Attachment SessionImplSuspectSummary.png [ 30960 ]
Anatoli Kazatchkov [Atlassian] added a comment - 24/Mar/09 05:43 PM

Igor,

You mentioned that you only started noticing the problem in the last several weeks. Has anything changed on the instance that might have triggered the problem? Any Confluence upgrades, plugin upgrades, or newly installed plugins? The additional info might help us narrow down the search for the bug.

Anatoli.


Igor Minar added a comment - 24/Mar/09 05:59 PM - edited

Hi Anatoli,

It's hard to tell why this started happening only recently. That's because our traffic grows quite rapidly (min 10% each month, a lot more during spikes).

There were two recent changes:

  • upgrade to Confluence 2.10.2
  • expansion of the coherence caches

I know, I know, the cache must be the culprit! That's what I thought for a few weeks, and I kept on increasing the heap size. Then, when I started analyzing the heap dumps, I found out that the cache consumes less than 1GB of heap and a lot more space is consumed by the SAXParser and SessionImpl objects.


Igor Minar added a comment - 24/Mar/09 06:08 PM

The first thing that I started to suspect when looking at the dumps was that a reference to the hibernate session instance was stored in the http session. That would explain why there were more of these instances in the heap than the actual number of concurrent connections that were being processed.

Our session expiration policy is set to 8 hours, which means that it can take a while for abandoned sessions to be gc-ed. This is not a problem if the session doesn't contain a lot of data, but could lead to problems similar to what we've been experiencing if there are bulky objects stored in the http session.

So I think I'd start by checking what's being stored in the http session.
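A first pass at that check could simply list each session attribute together with the class of the object stored under it. The following is a minimal sketch, not Confluence code: a plain Map stands in for the HttpSession attributes, and the class and method names are hypothetical. In a servlet, the same loop would be driven by HttpSession.getAttributeNames() and getAttribute().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SessionAttributeAudit {

    /**
     * Returns attribute name -> class name of the stored value, sorted by name.
     * The input map is a stand-in for the attributes of an HTTP session.
     */
    static Map<String, String> describeAttributes(Map<String, Object> attributes) {
        Map<String, String> report = new TreeMap<>();
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            Object value = entry.getValue();
            report.put(entry.getKey(), value == null ? "null" : value.getClass().getName());
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up attributes, only to show the shape of the report.
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("locale", "en_US");
        attributes.put("visits", Integer.valueOf(3));
        System.out.println(describeAttributes(attributes));
    }
}
```

Any attribute whose class turns out to be a persistent entity would then be a candidate for a closer look in the heap dump.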


Andrew Lynch [Atlassian] added a comment - 24/Mar/09 07:17 PM - edited

Hi Igor,

As far as I can tell from your heap dumps, the SessionImpls cannot be collected because you have references to User objects (DefaultHibernateUser) stored within your HttpSession.
Seraph is placing these DefaultHibernateUser objects within the HttpSession, and unfortunately they contain references to SessionImpls. A SessionImpl can grow quite large if a significant amount of work is performed on it, and unfortunately its contents are not cleared entirely when the Session is closed.

Regards,
Andrew Lynch


Andrew Lynch [Atlassian] made changes - 24/Mar/09 07:25 PM
Assignee Matt Ryall [Atlassian] [ matt@atlassian.com ]
Andrew Lynch [Atlassian] added a comment - 24/Mar/09 07:32 PM

The code in question (in DefaultAuthenticator) :

final Principal user = getUser(username);
...
request.getSession().setAttribute(LOGGED_IN_KEY, user);

We could fix this by:
1) Properly nulling out the maps in the SessionImpl when it is closed (this would also fix the same symptoms in CONF-10575 and USER-228).
2) Not storing this in the HttpSession (not sure if this is a real option).
3) Only storing detached objects in the HttpSession.
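To illustrate the idea behind option 1, here is a minimal sketch under stated assumptions. TrackingSession is a hypothetical stand-in, not Hibernate's SessionImpl, and its single map stands in for whatever internal maps the real class keeps. The point is only that a closed session which clears its maps retains almost nothing, even if something long-lived still references it.

```java
import java.util.HashMap;
import java.util.Map;

public class ClearOnCloseSketch {

    /** Hypothetical stand-in for a persistence session; not Hibernate's SessionImpl. */
    static final class TrackingSession implements AutoCloseable {
        private final Map<String, Object> entitiesByKey = new HashMap<>();
        private boolean closed;

        /** Registers an entity with this session, as loading or saving it would. */
        void track(String key, Object entity) {
            if (closed) {
                throw new IllegalStateException("session is closed");
            }
            entitiesByKey.put(key, entity);
        }

        int trackedCount() {
            return entitiesByKey.size();
        }

        @Override
        public void close() {
            // Drop every entity reference, so that anything still pointing at
            // this closed session keeps only an empty shell reachable.
            entitiesByKey.clear();
            closed = true;
        }
    }

    public static void main(String[] args) {
        TrackingSession session = new TrackingSession();
        session.track("user:1", new byte[1024]);
        session.track("page:7", new byte[1024]);
        System.out.println("before close: " + session.trackedCount());
        session.close();
        System.out.println("after close: " + session.trackedCount());
    }
}
```

This does not remove the stray reference itself; it only limits how much memory that reference can pin.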


Igor Minar added a comment - 24/Mar/09 09:25 PM

1) or 3) sound good.

With 3) there is still a risk that the detached object will hold on to a large object tree though.


Matt Ryall [Atlassian] added a comment - 21/Apr/09 01:16 AM

I'm in favour of both option #1 and option #3. That is, we should patch Hibernate to clean up its session when it is closed and we should not store persistent objects in the app server session.

To implement the latter, Seraph should allow the application to provide the object which is to be stored in the session. Confluence can provide a Principal which is a DefaultUser or something else not attached to the database session.
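A rough sketch of what such a hook might look like follows. It is an assumption-laden illustration, not Seraph's or Confluence's actual API: the Authenticator class, the conversion function, PersistentUser, DetachedUser and the value of LOGGED_IN_KEY are all hypothetical, and a plain Map stands in for the HttpSession.

```java
import java.security.Principal;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SessionPrincipalSketch {

    // Hypothetical value; the real constant's value is not shown in this issue.
    static final String LOGGED_IN_KEY = "logged_in_user";

    /** Stand-in for a persistent user that keeps a back-reference to its database session. */
    static final class PersistentUser implements Principal {
        private final String name;
        final Object persistenceSession; // the reference that would pin the session

        PersistentUser(String name, Object persistenceSession) {
            this.name = name;
            this.persistenceSession = persistenceSession;
        }

        @Override
        public String getName() {
            return name;
        }
    }

    /** Lightweight copy that holds only the user name. */
    static final class DetachedUser implements Principal {
        private final String name;

        DetachedUser(String name) {
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }
    }

    /** Hypothetical authenticator that lets the application choose what is stored. */
    static final class Authenticator {
        private final Function<Principal, Principal> toSessionPrincipal;

        Authenticator(Function<Principal, Principal> toSessionPrincipal) {
            this.toSessionPrincipal = toSessionPrincipal;
        }

        void login(Map<String, Object> httpSession, Principal user) {
            // Store the converted principal, never the persistent object itself.
            httpSession.put(LOGGED_IN_KEY, toSessionPrincipal.apply(user));
        }
    }

    public static void main(String[] args) {
        Authenticator authenticator = new Authenticator(u -> new DetachedUser(u.getName()));
        Map<String, Object> httpSession = new HashMap<>();
        authenticator.login(httpSession, new PersistentUser("igor", new Object()));
        Principal stored = (Principal) httpSession.get(LOGGED_IN_KEY);
        System.out.println(stored.getName() + " stored as " + stored.getClass().getSimpleName());
    }
}
```

With a hook of this kind, the authentication library stays unaware of the persistence layer, and the application decides how much state the stored principal carries.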


Matt Ryall [Atlassian] made changes - 13/May/09 01:17 AM
Assignee Matt Ryall [Atlassian] [ matt@atlassian.com ]
Igor Minar added a comment - 28/Sep/09 07:01 PM - edited

Hi guys,

Any plans to get this fixed soon? We ran out of memory twice in the last week and we already use 5.5GB heap and Conf 3.0.1. I'm just examining the heap dumps and was able to confirm that the problems were caused by this particular bug.

thanks


Igor Minar added a comment - 30/Sep/09 12:36 PM

Some more info from the heap dump:

Each user object (com.atlassian.user.impl.hibernate.DefaultHibernateUser) was present in the dump five times. Thanks to CONF-12319 and our 120+k users, that results in 600+k user objects on the heap.

All these user objects were retained on the heap due to references from hibernate collections.

We have the user cache set to 130k, so due to CONF-12319, I'd expect all of the user objects to be on the heap, but only once. It seems that there are some big inefficiencies going on here.


Igor Minar added a comment - 30/Sep/09 05:21 PM

can you please add CONF-12319 as a related issue? thnx


Partha Kamal [Atlassian] made changes - 30/Sep/09 07:04 PM
Link This issue relates to CONF-12319 [ CONF-12319 ]
Andrew Lynch [Atlassian] added a comment - 01/Oct/09 02:36 AM

Hi Igor,

I will be investigating the feasibility of a fix for this issue in 3.1.
I don't believe CONF-12319 is really a related issue, although it may exacerbate this issue.

Regards,
Andrew Lynch


Igor Minar added a comment - 01/Oct/09 10:44 AM

Thanks, Andrew. We'd really appreciate having this resolved soon so that we can stabilize our site.


Anatoli Kazatchkov [Atlassian] made changes - 15/Oct/09 12:42 AM
Internal Value 6
Internal Complexity 5
Anatoli Kazatchkov [Atlassian] made changes - 27/Oct/09 07:46 PM
Assignee Andrew Lynch [Atlassian] [ alynch ]
Igor Minar added a comment - 03/Dec/09 11:33 AM

Andrew, is the patch being included in 3.1?


Andrew Lynch [Atlassian] added a comment - 03/Dec/09 05:26 PM

Hi Igor,

It will be in 3.1.1.

Regards,
Andrew Lynch


Andrew Lynch [Atlassian] made changes - 03/Dec/09 05:26 PM
Fix Version/s 3.1.1 [ 14973 ]
Igor Minar added a comment - 03/Dec/09 06:20 PM

thanks


Andrew Lynch [Atlassian] made changes - 03/Jan/10 05:27 PM
Status Open [ 1 ] Technical Review [ 10028 ]
Chris Kiehl [Atlassian] made changes - 10/Jan/10 05:27 PM
Reviewers [ckiehl]
Status Technical Review [ 10028 ] Quality Review [ 10029 ]
Mark Hrynczak [Atlassian] made changes - 10/Jan/10 06:26 PM
Labels: SunWikis -> SunWikis qa-triage
Mark Hrynczak [Atlassian] made changes - 10/Jan/10 06:26 PM
Resolution Fixed [ 1 ]
Status Quality Review [ 10029 ] Resolved [ 5 ]