Details

    Description

      Over the last several weeks we have seen repeated Confluence instability at wikis.sun.com, in every case caused by running out of heap space. Several rounds of increasing -Xmx did not help (we started at 3GB and are now at 5GB on a 64-bit JVM).

      I took several memory dumps during outages and analyzed them with Eclipse Memory Analyzer, which repeatedly found two issues:

      • Something is storing Xerces SAXParser objects in ThreadLocal variables. This results in up to 90MB being retained per thread, and several instances of this size are held in memory at once, so a total of 800-1200MB is retained.
      • Hundreds of instances of net.sf.hibernate.impl.SessionImpl retain an additional ~780MB of memory. I'll document this as a separate issue.
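
      To illustrate the first finding, here is a minimal sketch of how a value stored in a ThreadLocal can stay reachable from an idle pooled thread. It is not Confluence or Xerces code: the class name, the PARSER_BUFFER field and the byte[] standing in for a parser are all hypothetical, and the single-thread executor only mimics a container worker thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalRetentionSketch {
    // Hypothetical stand-in for a per-thread parser cache (assumption, not real Confluence code).
    static final ThreadLocal<byte[]> PARSER_BUFFER = new ThreadLocal<>();

    /** Simulates a request that caches a large buffer on the worker thread. */
    static void handleRequest(int bufferBytes, boolean cleanUp) {
        PARSER_BUFFER.set(new byte[bufferBytes]);
        try {
            // ... the request would use the cached parser here ...
        } finally {
            if (cleanUp) {
                PARSER_BUFFER.remove(); // drops this thread's reference to the buffer
            }
        }
    }

    /**
     * Runs one request on a single pooled thread, then asks that same (now idle)
     * thread how many bytes it still holds through the ThreadLocal.
     */
    static int retainedAfterRequest(int bufferBytes, boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(() -> handleRequest(bufferBytes, cleanUp)).get();
            // With a pool of size 1 the second task runs on the same worker thread.
            Future<Integer> retained = pool.submit(() -> {
                byte[] buffer = PARSER_BUFFER.get();
                return buffer == null ? 0 : buffer.length;
            });
            return retained.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without remove(): " + retainedAfterRequest(1 << 20, false));
        System.out.println("with remove():    " + retainedAfterRequest(1 << 20, true));
    }
}
```

      In this sketch the buffer survives the request unless remove() is called, because the worker thread itself keeps the reference for as long as it lives in the pool. Whether the actual retention here follows the same pattern would need to be confirmed against the code that owns the ThreadLocal.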

      Just before taking the heap dump, I also took a thread dump. By comparing the two, I found that the threads holding on to the huge thread-local variables were sitting in the container's thread pool and not processing any requests, so they should have had minimal memory requirements.
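
      The comparison described above can be partly automated. The following is a rough sketch, assuming a HotSpot-style thread dump in which each thread header line starts with the quoted thread name and is followed by a "java.lang.Thread.State:" line. The class name, method names and sample thread names are hypothetical, and the set of suspect thread names would have to be copied by hand from the heap analysis.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdleSuspectSketch {
    // Header line of a thread entry: starts with the thread name in double quotes.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\".*");
    // State line that follows the header in HotSpot-style dumps (assumed format).
    private static final Pattern STATE =
            Pattern.compile("^\\s*java\\.lang\\.Thread\\.State: (\\w+).*");

    /** Maps each thread name in the dump to its reported Thread.State. */
    static Map<String, String> threadStates(String dump) {
        Map<String, String> states = new LinkedHashMap<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = header.group(1);
                continue;
            }
            Matcher state = STATE.matcher(line);
            if (current != null && state.matches()) {
                states.put(current, state.group(1));
                current = null;
            }
        }
        return states;
    }

    /** Returns the suspect threads that the dump reports as WAITING or TIMED_WAITING. */
    static Set<String> idleSuspects(String dump, Set<String> suspects) {
        Set<String> idle = new TreeSet<>();
        for (Map.Entry<String, String> entry : threadStates(dump).entrySet()) {
            String state = entry.getValue();
            boolean parked = state.equals("WAITING") || state.equals("TIMED_WAITING");
            if (parked && suspects.contains(entry.getKey())) {
                idle.add(entry.getKey());
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // Made-up two-thread dump, for illustration only.
        String dump =
                "\"worker-1\" daemon prio=10 tid=0x01 nid=0x1 in Object.wait() [0x0]\n"
              + "   java.lang.Thread.State: WAITING (on object monitor)\n"
              + "\tat java.lang.Object.wait(Native Method)\n"
              + "\n"
              + "\"worker-2\" daemon prio=10 tid=0x02 nid=0x2 runnable [0x0]\n"
              + "   java.lang.Thread.State: RUNNABLE\n";
        Set<String> suspects = new HashSet<>(Arrays.asList("worker-1", "worker-2"));
        System.out.println(idleSuspects(dump, suspects));
    }
}
```

      A WAITING state alone does not prove that a thread is idle in the pool, since a thread can also wait on a lock in the middle of a request. The stack trace of each reported thread still needs to be read to confirm it.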

      I'm attaching some annotated screenshots from Eclipse Memory Analyzer and a thread dump that proves that the misbehaving threads were idle.

      Attachments

        1. SAXParserInstanceListing.jpg
          151 kB
          Igor Minar
        2. ThreadInstanceDrilldown.png
          266 kB
          Igor Minar
        3. ThreadInstanceListing.jpg
          166 kB
          Igor Minar
        4. ThreadSuspectSummary.png
          121 kB
          Igor Minar
        5. wikis-threaddump-090320_1106.txt
          274 kB
          Igor Minar
        6. XMLReaderManager.class
          3 kB
          Andrew Lynch

        Activity

          People

            alynch Andrew Lynch (Inactive)
            15d9a6950818 Igor Minar