• Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Fix Version/s: 5.1.6, 5.2
    • Affects Version/s: 5.0.7, 5.1, 5.1.2
    • Component/s: None
    • Environment: WAR installations of 5.0.7 and 5.1.2 running on a Linux server.

      When reindexing, Jira does not close the handles on the old index files properly.
      Using lsof +L1 we could identify 6.89 GiB of old index files still held open by Jira but already unlinked on the hard drive.

      We run reindexing every night at 3 am, triggered by a service task. Right now this means we have to restart Jira every week or two so that we don't need a bigger data partition and so that Jira's performance doesn't degrade.

      How to reproduce?
      I just hit "Reindex" inside Jira, and after it finished there were 17 new lines in the output of lsof +L1.
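
      For reference, the following is a minimal Java sketch that approximates what lsof +L1 reports: it walks /proc/<pid>/fd on Linux and flags descriptors whose target file has already been deleted. The class name and output format are only an illustration; lsof itself remains the simpler tool.

          // Linux-only sketch: list a process's open-but-deleted files and sum their sizes.
          // Run it as the same user as the JIRA process (or as root) and pass the JIRA pid,
          // e.g. "java DeletedFdScan 12345"; without an argument it inspects itself.
          import java.nio.file.DirectoryStream;
          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.nio.file.Paths;

          public class DeletedFdScan {
              public static void main(String[] args) throws Exception {
                  String pid = args.length > 0 ? args[0] : "self";
                  long total = 0;
                  try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/" + pid + "/fd"))) {
                      for (Path fd : fds) {
                          try {
                              String target = Files.readSymbolicLink(fd).toString();
                              if (target.endsWith(" (deleted)")) {   // unlinked but still open
                                  long size = Files.size(fd);        // stat through the fd link still works
                                  total += size;
                                  System.out.printf("%s -> %s (%d bytes)%n", fd, target, size);
                              }
                          } catch (Exception ignored) {
                              // the descriptor may have been closed between listing and stat'ing it
                          }
                      }
                  }
                  System.out.printf("Space held by deleted-but-open files: %d MiB%n", total >> 20);
              }
          }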

      Workaround
      1. Disable the index optimization job as per our How to Disable the Index Optimization Job in JIRA KB article.
        This job is not necessary from JIRA 5.0 onwards and was deprecated as of JIRA 5.1.6 (JRA-29487).
      2. Whenever a full reindex is performed on JIRA, restart it afterwards.

            [JRASERVER-29587] Jira does not close handles of old index properly

            Eric Dalgliesh added a comment -

            Yes! It's fixed in 5.1.6, which went out yesterday!

            MattS added a comment -

            Was there an actual code fix for this in the end, or are we left with the workaround?

            crf added a comment -

            As to whether they could do multiple reindexes before actually doing a restart, it depends entirely on how critically they are impacted by the leaked file descriptors. If it is running them out of disk space, then I would think they would want to restart every time they reindex until this is fixed. If they have plenty of breathing room on the disk that holds the indexes, then they may not want to do so. But keep in mind that every reindex will leak an amount of space from the filesystem equivalent to the entire index, so the space can disappear very quickly.

            It's very little consolation, but the leak is predictable and linear (1G → 2G → 3G → 4G → 5G), not unpredictable or exponential (1G → 2G → 4G → 8G → 16G), so anyone with access to the filesystem should be able to estimate the risk and act accordingly.
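
            If it helps with that estimate, here is a back-of-the-envelope sketch; the index path is just a placeholder and the calculation simply assumes the linear growth described above.

                // Back-of-the-envelope sketch: how many full reindexes fit into the free space
                // on the index partition, assuming each reindex leaks roughly one full copy of
                // the index until JIRA is restarted. The default path below is a placeholder.
                import java.io.File;

                public class LeakHeadroom {
                    public static void main(String[] args) {
                        File indexDir = new File(args.length > 0 ? args[0] : "/var/atlassian/jira/caches/indexes");
                        long indexBytes = sizeOf(indexDir);
                        long freeBytes = indexDir.getUsableSpace();
                        long reindexesLeft = indexBytes > 0 ? freeBytes / indexBytes : Long.MAX_VALUE;
                        System.out.printf("index: %d MiB, free: %d MiB, roughly %d reindexes before the disk fills%n",
                                indexBytes >> 20, freeBytes >> 20, reindexesLeft);
                    }

                    private static long sizeOf(File dir) {
                        long total = 0;
                        File[] children = dir.listFiles();
                        if (children == null) return dir.length();
                        for (File child : children) {
                            total += child.isDirectory() ? sizeOf(child) : child.length();
                        }
                        return total;
                    }
                }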

            crf added a comment -

            edalgliesh: This is unlikely to have a direct impact on performance. It is a resource leak (in this case, the filesystem), so the performance problems would come from free block searching at the filesystem level when it comes close to running out of space. More immediate concerns would be running out of file descriptors (a resource that the OS tends to limit) or running out of disk space entirely, either of which would be catastrophic (in that it would very likely corrupt the index).

            I suggested restarting after a reindex only as an emergency workaround and under the assumption that a full reindex is not something that is done very frequently. That assumption may or may not be correct, but given that the reindex itself causes an outage and cycling JIRA will take a short time by comparison on large instances, I would think adding a restart would be lost in the noise for customers that are seeing this and need a workaround ASAP.

            Really, disabling the OptimizeIndexJob in the scheduler-config.xml is more pressing, because that is done implicitly at midnight every single day, so it's easier for it to sneak up on you. With the reindex, at least you know it is happening.
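
            For anyone who wants to watch the descriptor side of this in the meantime, here is a minimal sketch using the standard UnixOperatingSystemMXBean; the 80% threshold is only an example, and since the bean reports on the JVM it runs in, it would have to run inside the JIRA JVM (for example from a scripting plugin) rather than as a separate process.

                // Sketch: report this JVM's open file descriptor usage via the standard
                // UnixOperatingSystemMXBean (available on Linux JVMs). Useful for spotting
                // a descriptor leak before the OS limit is reached.
                import java.lang.management.ManagementFactory;
                import java.lang.management.OperatingSystemMXBean;
                import com.sun.management.UnixOperatingSystemMXBean;

                public class FdWatch {
                    public static void main(String[] args) {
                        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
                        if (os instanceof UnixOperatingSystemMXBean) {
                            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                            long open = unix.getOpenFileDescriptorCount();
                            long max = unix.getMaxFileDescriptorCount();
                            System.out.printf("open file descriptors: %d of %d%n", open, max);
                            if (open > max * 0.8) {   // example threshold, tune to taste
                                System.out.println("WARNING: nearing the descriptor limit; plan a restart");
                            }
                        }
                    }
                }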

            Eric Dalgliesh added a comment -

            cfuller: Does JIRA need to be restarted every time a reindex is called, or would it be OK to do it when JIRA starts slowing down? If that's the case, could an instance monitor how many reindexes it takes before it breaks and plan a restart at some number less than that?

            MattS added a comment -

            Confirmed, I see this. At enterprise clients, I have to schedule a reindex of JIRA with the team, usually days in advance, because indexing makes JIRA unavailable. I think that requiring a restart after a reindex would be greeted with surprise by most of these clients, though in some views it is just an extended outage.

            crf added a comment -

            Workarounds (both are required to completely eliminate the long-term effects of this bug):

            1. Disable the index optimization job that is currently hardcoded in the WEB-INF/classes/scheduler-config.xml file. This job is not necessary as of JIRA 5.0, and there are plans to remove it in JIRA 5.2.
            2. Whenever JIRA is reindexed, restart it afterwards.

            crf added a comment - - edited

            At support's request, I'm adding a bit more public information about what we have discovered so far and what the plan is to address it.

            We have currently found two separate potential sources of leaked index files.

            1. During startup, JIRA performs several consistency checks. While it is doing this, it calculates the number of comments in the Lucene index so it can report this information in the startup logs. This comment index searcher is never closed, and the result is that as time progresses and comment index segments are either merged or collected during normal alteration of the comment index, these original files are held open and their space cannot be reclaimed.
            2. During the "optimize" operation, a new DefaultIndexEngine is created. Within it is a reference to an index reader that is kept around so that we can call IndexReader.reopen on it, potentially improving performance when a write has been performed but most of the index segments are unchanged. If the DefaultIndexEngine instance itself goes out of scope, it fails to close this reader, which prevents any disk space belonging to it from being reclaimed. This optimize operation is a normal part of the processing both during an explicit rebuilding of the indexes (such as when the Administrator requests that all issues be reindexed) and during the implicit optimize index operation that happens on every JIRA instance at midnight by default. (A simplified sketch of this reader-handling pattern follows below.)
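
            To make point 2 concrete, here is a simplified sketch of that reader-holding pattern written against the plain Lucene 3.x API; it is not JIRA's actual DefaultIndexEngine code, only an illustration of where the missing close matters.

                // Simplified sketch of the reader-holding pattern described in point 2, using the
                // plain Lucene 3.x API rather than JIRA's actual DefaultIndexEngine. The bug class:
                // a long-lived reader is kept so reopen() stays cheap, but nothing closes it when
                // the owning engine is discarded, so every index segment it still references
                // (including deleted ones) keeps its disk space allocated.
                import java.io.File;
                import java.io.IOException;
                import org.apache.lucene.index.IndexReader;
                import org.apache.lucene.store.Directory;
                import org.apache.lucene.store.FSDirectory;

                class ReopeningReaderHolder {
                    private IndexReader reader;

                    ReopeningReaderHolder(File indexPath) throws IOException {
                        Directory dir = FSDirectory.open(indexPath);
                        this.reader = IndexReader.open(dir, true);  // read-only reader kept for later reopens
                    }

                    IndexReader current() throws IOException {
                        IndexReader reopened = reader.reopen();     // cheap refresh after writes
                        if (reopened != reader) {
                            reader.close();                         // close the superseded reader,
                            reader = reopened;                      // otherwise its old segments stay pinned
                        }
                        return reader;
                    }

                    // If nothing ever calls this when the holder is thrown away, the last reader
                    // (and all the deleted files it still references) is leaked, which is the
                    // behaviour described above.
                    void close() throws IOException {
                        reader.close();
                    }
                }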

            On most installations, neither of these would be of much consequence. However, on very large installations, retaining disk space allocated to deleted files like this can have a significant impact by unnecessarily inflating the disk space that is allocated to Lucene indexes well beyond the amount that is actually being used for index storage. The fix plan is as follows:

            Short term solutions for these problems:

            1. Explicitly close the comment searcher acquired during the startup checks (sketched after this list).
            2. Make sure the reader is properly closed whenever DefaultIndexEngine is closed.
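
            As a rough illustration of the first item, this is the usual acquire/count/close discipline sketched with plain Lucene 3.x classes; the class and method names are invented for the example, and the real change lives in JIRA's startup-check code.

                // Sketch of the first short-term fix: count the comments for the startup log and
                // always release the searcher and reader afterwards.
                import java.io.File;
                import java.io.IOException;
                import org.apache.lucene.index.IndexReader;
                import org.apache.lucene.search.IndexSearcher;
                import org.apache.lucene.store.FSDirectory;

                final class CommentCountCheck {
                    static int countComments(File commentIndexPath) throws IOException {
                        IndexReader reader = IndexReader.open(FSDirectory.open(commentIndexPath), true);
                        IndexSearcher searcher = new IndexSearcher(reader);
                        try {
                            return searcher.getIndexReader().numDocs();  // the number reported at startup
                        } finally {
                            searcher.close();  // previously left open, pinning deleted comment index segments
                            reader.close();    // the searcher does not close a reader it was handed
                        }
                    }
                }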

            Long term:

            1. There is currently nothing guarding against similar inattention to searcher closing during startup checks in the future. This is relatively easy to catch near the end of the JIRA bootstrap, and we should do so.
            2. The second problem is due to one-off code in the index optimization code that would not normally affect a background service. It has explicitly allocated a new reader and failed to release it. Other than making sure DefaultIndexEngine will not do this, there isn't a good generic solution to this class of problem.

            The fix for these problems is understood, but there is insufficient time in the test schedule to get it into 5.1.5. Customers should expect the fix to be included in 5.1.6 (and 5.2).

            buch.de internetstores AG added a comment -

            I could imagine that com.atlassian.jira.index.DelayCloseable.Helper#checkClosed in the jira-core module could be related to this bug.

              Assignee: Eric Dalgliesh (edalgliesh)
              Reporter: buch.de internetstores AG
              Affected customers: 2
              Watchers: 16
