• Type: Support Request
    • Resolution: Support Request
    • Priority: High
    • Affects Version/s: 5.2.4

      We have a customer with a large JIRA instance (1M issues), and every two weeks they have to restart JIRA because it runs out of file descriptors. We increased the system limit from 30K to 300K descriptors, but it still happens eventually. We're using SSDs for the JIRA home directory.

      I've done some investigation with lsof on the production JIRA and found that the file descriptor counts on the Lucene issue index files increase every 5 minutes or so. On staging it's only once or twice per day. The rate is lower at weekends, so it is driven by human activity. What happens is that the same 60 index files have new file descriptors added, but the old file descriptors are not closed. There is also an error that appears at the same time in the JIRA log file:

      "Tried to reopen the IndexReader, but it threw AlreadyClosedException. Opening a fresh IndexReader."

      This comes from DefaultIndexEngine and is where a new Lucene IndexReader is created if the previous one is believed to be closed. I suspect that the previous IndexReader may not actually have been closed.

      There is a comment right in that place in the source code. It's the "don't worry" that makes me think there is still a problem around here!

      // JRADEV-7825: Really this shouldn't happen unless someone closes the reader from outside all
      // the inscrutable code in this class (and its friends) but
      // don't worry, we will just open a new one in that case.
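
      For context, here is a minimal sketch of the reopen-or-recreate pattern that comment sits in. It is reconstructed from the log message and the Lucene 3.x API, not copied from the JIRA source, so treat the class and method names as assumptions:

      import java.io.IOException;
      import org.apache.lucene.index.IndexReader;
      import org.apache.lucene.store.AlreadyClosedException;
      import org.apache.lucene.store.Directory;

      // Hedged reconstruction, not the actual DefaultIndexEngine code. It only shows how an
      // AlreadyClosedException from reopen() leads to a brand-new IndexReader: the old reader
      // is simply abandoned, so any file descriptors it still holds are never released.
      class ReaderHolder {
          IndexReader refresh(IndexReader oldReader, Directory directory) throws IOException {
              try {
                  IndexReader newReader = oldReader.reopen();   // throws AlreadyClosedException if refCount <= 0
                  if (newReader != oldReader) {
                      oldReader.close();                        // the superseded reader releases its files
                  }
                  return newReader;
              } catch (AlreadyClosedException e) {
                  // "Tried to reopen the IndexReader, but it threw AlreadyClosedException.
                  //  Opening a fresh IndexReader."
                  return IndexReader.open(directory, true);     // the old reader is dropped here
              }
          }
      }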

      The symptoms are similar to, but the cause apparently different from, those described in https://confluence.atlassian.com/display/JIRAKB/Loss+of+Functionality+due+to+Too+Many+Open+Files+Error

            [JRASERVER-38039] JIRA IndexReader is leaking file descriptors

            crf added a comment -

            andrew.morin1:

            Leaking Lucene indexes can be caused by any number of things. While there have been some versions of JIRA with bugs that triggered this, it is almost always caused by a plugin that is doing something wrong, such as running searches from its own thread pool without using JiraThreadLocalUtil to take care of the cleanup.

            Given that there are a wide variety of potential culprits and that it is usually related to a plugin that we do not ourselves test, there is no simple resolution for this issue. Customers who encounter it should open a support request so that we can work with them to track down what is causing it on their particular instance.

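            As a hedged sketch of the cleanup crf is referring to: a plugin task running on its own thread pool and wrapping its work with JiraThreadLocalUtil. The preCall()/postCall(Logger) calls reflect my understanding of that interface; treat the exact signatures, the ComponentAccessor lookup and the runSearches() method as illustrative assumptions.

            import com.atlassian.jira.component.ComponentAccessor;
            import com.atlassian.jira.util.thread.JiraThreadLocalUtil;
            import org.apache.log4j.Logger;

            public class PluginSearchTask implements Runnable {
                private static final Logger log = Logger.getLogger(PluginSearchTask.class);

                public void run() {
                    // Assumption: resolving JiraThreadLocalUtil via ComponentAccessor; a real
                    // plugin would normally have it injected instead.
                    JiraThreadLocalUtil threadLocalUtil = ComponentAccessor.getComponent(JiraThreadLocalUtil.class);
                    threadLocalUtil.preCall();      // set up JIRA's thread-local state for this thread
                    try {
                        runSearches();              // hypothetical method doing the add-on's searching
                    } finally {
                        // Tear down thread-locals; anything left open is reported to the supplied
                        // logger, which is exactly the kind of leak discussed in this issue.
                        threadLocalUtil.postCall(log);
                    }
                }

                private void runSearches() {
                    // add-on-specific search work would go here
                }
            }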

            Andrew Morin added a comment -

            What was the resolution for this?

            Oswaldo Hernandez (Inactive) added a comment -

            Hi jasbris,

            Could you please file a support case at https://support.atlassian.com? That way we can take a look and troubleshoot what is happening in your instance.

            Regards,

            Oswaldo Hernández.
            JIRA Bugmaster.
            [Atlassian].

            Jason Brison added a comment -

            clepetit / matt.doar,

            Were you able to resolve this as a support case? We see this both on JIRA 5.2.8 with ~600,000+ issues, and on 6.2.5 with ~500,000+ issues. We often see in the logs:

            Tried to reopen the IndexReader, but it threw AlreadyClosedException. Opening a fresh IndexReader.

            We also have to reboot our 5.2.8 instance every 3 weeks. We use Fusion I/O SSD cards for our storage on the 5.2.8 instance. This is the instance we really see the open files issue with, but both servers see the error message.

            MattS added a comment -

            If you run lsof and see your total number of file descriptors increasing day over day, then check the names of the files from lsof. If they're all in the caches/issues directory and you see AlreadyClosedException in your log files, then this would seem to match.

            Michael Villis added a comment -

            I can't be sure we're seeing the same thing but would appreciate an update on how the investigation is progressing.

            Cyril Egan added a comment -

            When a solution is found for this, can you please post it here as well as in the support ticket, as we are experiencing this problem also.

            MattS added a comment -

            Chris, that's fine, thanks. I'd like to have this issue remain public (but Closed) so that others can find it for future reference if possible.

            ChrisA added a comment -

            matt.doar, I'd like to move this over to Support if possible. Right now it looks like something the developers haven't been able to reproduce with the time and resources available to them (and since, technically, we don't offer SLAs etc. on jira.atlassian.com (JAC) while we do on support.atlassian.com (SAC), moving it will help us get to the bottom of this). Our Support team, on the other hand, have the time to set up an environment, replicate the problem, and identify where the issue is before we finalise a JAC issue, and we're better set up there to ensure it gets resolved rather than left to languish.

            Would it be okay if we moved the investigation into this bug into a Support ticket?

            Regards,

            Chris Le Petit
            Service Enablement

            MattS added a comment -

            I'm still checking, but this is the total list:

            These plugins may be configured, enabled, disabled or uninstalled.

            Copy Date *
            CustomWare JIRA Utilities
            Enterprise Mail Handler for JIRA (JEMH) (disabled)
            JIRA Agile
            JIRA Auditor (disabled)
            JIRA Calendar Plugin
            JIRA Charting Plugin
            JIRA Misc Workflow Extensions
            JIRA Project Role Tab
            JIRA Subversion plugin
            JIRA Suite Utilities
            JIRA Toolkit Plugin
            Local Helpdesk *
            Script Runner
            Show Saved Filter With Columns
            Template Helper *
            valiantys-base-plugin
            VertygoSLA - Service Level Agreement

            The ones with an asterisk are locally developed. I didn't find any reference to IndexSearcher or IndexManager in them. One of those add-ons defines a new searcher for finding issues where the last commenter is not the reporter.

            crf added a comment -

            The JIRA Subversion plugin is something that I've identified as suspicious before because it is messing with Lucene directly. I would point you at a comment to that effect, except that those comments are in support cases and in restricted comments on public issues discussing those cases, so you wouldn't be able to see them.

            In any case, while I am pretty sure that the subversion plugin can leak indexes, I think the ones it would leak are its own, not the issue index, and it is usually the issue index that we see reported when this comes up.

            Can you identify any long-running threads that are owned by add-ons and might be grabbing searchers?

            MattS added a comment -

            The exception still occurs after disabling the JIRA Subversion add-on, at about the same frequency.

            MattS added a comment -

            Chris,

            Thanks for the info. There's no background index happening, but the JIRA Subversion add-on (0.10.11.1) is present and does do index-related things (and takes 2 hours to run). The AlreadyClosedException happens every few minutes in production, usually when someone runs a JQL search, which makes sense since that would try to get the IndexReader and reopen it, causing the given exception. However, it doesn't happen at the weekend, so I believe that whatever add-on is doing this is used by people, if only indirectly.

            The file src/main/java/com/atlassian/jira/plugin/ext/subversion/revisions/RevisionIndexer.java has a method hasDocument() where it creates a new IndexSearcher, then has a try/catch block and a finally to call close(). I guess it's possible that if the first statement threw an exception then close() wouldn't get called.

            The deleted file handles check doesn't work because close() isn't getting called to delete them. It's not easy to debug this one, but I think the next thing I might try is disabling the JIRA Subversion add-on for 24 hours to see if the rate of AlreadyClosedExceptions slows down.

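            For illustration, the shape being described looks roughly like the following. This is a hedged reconstruction, not the actual RevisionIndexer source:

            import java.io.IOException;
            import org.apache.lucene.index.Term;
            import org.apache.lucene.search.IndexSearcher;
            import org.apache.lucene.search.TermQuery;
            import org.apache.lucene.store.Directory;

            // Hedged reconstruction of the method shape described above, not the plugin's real code.
            // The searcher is created before the try block, so close() only runs once control has
            // entered the try/finally; anything thrown between creating the searcher and entering
            // the try leaks the searcher and the file descriptors it holds.
            class RevisionIndexerSketch {
                boolean hasDocument(Directory revisionIndexDirectory, Term term) throws IOException {
                    IndexSearcher searcher = new IndexSearcher(revisionIndexDirectory, true);  // opens index files
                    try {
                        return searcher.search(new TermQuery(term), 1).totalHits > 0;
                    } finally {
                        searcher.close();   // reached only if we made it inside the try
                    }
                }
            }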

            crf added a comment -

            matt.doar:

            We're fairly certain at this point that normal JIRA operation does not leak the index file descriptors. The only suspect I can identify in JIRA itself at this point is a background reindex, and tcampbell is investigating that possibility.

            If that doesn't turn out to be the culprit, then it probably means that an add-on is responsible, which will unfortunately be difficult to track down. Usually these leaks arise from an add-on that has created its own thread pool and runs searches directly, using IndexManager.getIndexSearcher() without calling close() on it. It may be possible to identify suspects by looking at a thread dump, picking out the threads whose stacks originate in an add-on's code, and whittling that down to the add-ons that are known to perform searches. A better way is to use lsof +L1 to get a list of deleted file handles, take a heap dump from JIRA, correlate the deleted file handles from lsof with the FileDescriptor objects in the heap dump, and trace them back to their GC roots.

            Can you confirm whether or not background reindexes are being performed on a regular basis and whether or not this is likely to be the cause here? If we can rule that out, then the lsof + heapdump route is the best advice I can offer.

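            For clarity, the correct form of the pattern crf describes looks roughly like this. The indexManager.getIndexSearcher() call is named as in the comment above, and the exact manager interface varies by JIRA version, so treat this as a hedged sketch rather than the real API:

            // Hedged sketch only: "indexManager", getIndexSearcher() and "query" stand in for whatever
            // the add-on actually uses (IndexSearcher and TopDocs are org.apache.lucene.search classes).
            // The point is the finally block: without close(), the reader behind the searcher never has
            // its reference released, so its file descriptors stay open.
            IndexSearcher searcher = indexManager.getIndexSearcher();
            try {
                TopDocs hits = searcher.search(query, 50);
                // ... process hits ...
            } finally {
                searcher.close();
            }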

            MattS added a comment -

            I've updated the priority to Critical because this is taking down an Enterprise-scale JIRA every two weeks and has been doing so for a while now.

            MattS added a comment -

            Looking at the Lucene source for the abstract class IndexReader, the reason the JIRA oldReader.reopen() call throws AlreadyClosedException is that the refCount is <= 0. So now I need to find out what calls the decRef() method on the IndexReader.

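            To illustrate the reference counting involved, here is a hedged sketch against the Lucene 3.x IndexReader API ("directory" is assumed to be an already-open org.apache.lucene.store.Directory):

            // Each decRef() (and close(), which performs one) drops getRefCount() by one; once it
            // reaches zero the reader releases its files, and a later reopen() fails with exactly
            // the AlreadyClosedException seen in the JIRA logs.
            IndexReader reader = IndexReader.open(directory, true);
            System.out.println(reader.getRefCount());   // 1

            reader.incRef();                            // a second "owner" takes a reference
            System.out.println(reader.getRefCount());   // 2

            reader.decRef();                            // first owner done
            reader.decRef();                            // second owner done -> refCount == 0, files released

            reader.reopen();                            // now throws AlreadyClosedException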

            MattS added a comment - edited

            I notice in the javadoc for http://lucene.apache.org/core/3_2_0/api/all/org/apache/lucene/index/IndexReader.html#reopen(org.apache.lucene.index.IndexWriter, boolean) that "Once the writer is closed, any outstanding readers may continue to be used. However, if you attempt to reopen any of those readers, you'll hit an AlreadyClosedException."

            Sounds like it could be that an IndexWriter is believed to be closed, and then we try to use some IndexReader, find it closed, and so create a whole new IndexReader.

            The real question is: if the IndexWriter was closed, why didn't it close the associated file descriptors? And if it isn't really closed, why did Lucene think it was?

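            A hedged sketch of the behaviour that javadoc describes, using the Lucene 3.2-era near-real-time API (the RAMDirectory and analyzer setup here is illustrative only, not how JIRA opens its index):

            // Uses org.apache.lucene.index.{IndexReader, IndexWriter, IndexWriterConfig},
            // org.apache.lucene.store.RAMDirectory, org.apache.lucene.analysis.standard.StandardAnalyzer
            // and org.apache.lucene.util.Version.
            Directory directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_32, new StandardAnalyzer(Version.LUCENE_32)));
            IndexReader nrtReader = IndexReader.open(writer, true);   // near-real-time reader tied to the writer

            writer.close();       // the outstanding reader can still be used for reads after this...

            nrtReader.reopen(writer, true);   // ...but reopening against the closed writer throws
                                              // AlreadyClosedException, matching the log message above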

            MattS added a comment - edited

            I recompiled DefaultIndexEngine on staging to add a stack trace to the logging for this and have attached it to this issue. It shows that a JQL search via the REST API went to reopen the IndexReader and found that it was already closed. That's when this exception was thrown.

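            For the record, one way to capture such a stack trace at that log point is to pass a throwaway Throwable to log4j so the caller's stack appears alongside the existing message. This is a sketch of the kind of change, not the exact code that was recompiled in:

            // Logging a Throwable makes log4j print the stack trace of whoever triggered the reopen.
            log.warn("Tried to reopen the IndexReader, but it threw AlreadyClosedException."
                    + " Opening a fresh IndexReader.", new Throwable("stack trace of the caller"));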

              Assignee: Unassigned
              Reporter: MattS
              Votes: 1
              Watchers: 15