- Type: Bug
- Resolution: Unresolved
- Priority: Low
- Affects Version/s: 9.2.15
- Component/s: Server - Jobs
- Severity 2 - Major
Issue Summary
When a page restriction (or other page-permission change) is added to a page that has a large number of descendant pages, Confluence runs a background job to update the search index for every descendant. On Confluence Data Center running on Microsoft SQL Server, this job can hold a long-running database transaction that escalates to a table-level lock on the CONTENT table. While the lock is held, most page reads and writes across the whole instance are blocked, which can lead to a site-wide outage until the job finishes or is manually cleared.
In environments where the descendant tree is large enough, the job may also fail and be automatically retried, causing the outage to recur on each retry.
Steps to Reproduce
- Use Confluence DC with Microsoft SQL Server.
- Choose a page that has a large descendant tree (in the reported incident this was roughly 370 or more descendants; instances with smaller trees but a large number of attachments per page may also trigger the issue).
- Add a page restriction to that parent page.
- Monitor the CONTENT table on the database. Within a few minutes, the table is held under an exclusive lock by a long-running transaction originating from the Confluence background-job thread.
- Concurrent end-user activity that touches Confluence content (page views, edits, etc.) is blocked until the lock is released.
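The monitoring step above could be scripted roughly as follows. This is a minimal sketch, not a supported diagnostic: it assumes a DB-API cursor connected to the Confluence database through a driver that uses `?` placeholders (pyodbc, for example), and it assumes the table resolves as `CONTENT` in the connection's default schema. The function names `find_table_locks` and `exclusive_holders` are hypothetical. The query reads the SQL Server `sys.dm_tran_locks` view, where an `X` lock granted at `OBJECT` level indicates a table-level exclusive lock.

```python
from typing import Any, Dict, List

# Object-level locks on one table in the current database.
# Assumption: the driver uses qmark ("?") parameter style, as pyodbc does.
TABLE_LOCK_QUERY = """
SELECT l.request_session_id,
       l.request_mode,
       l.request_status
FROM sys.dm_tran_locks AS l
WHERE l.resource_type = 'OBJECT'
  AND l.resource_database_id = DB_ID()
  AND l.resource_associated_entity_id = OBJECT_ID(?)
"""


def find_table_locks(cursor: Any, table_name: str = "CONTENT") -> List[Dict[str, Any]]:
    """Return one dict per object-level lock currently requested on the table."""
    cursor.execute(TABLE_LOCK_QUERY, (table_name,))
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def exclusive_holders(locks: List[Dict[str, Any]]) -> List[int]:
    """Session IDs that hold a granted exclusive (X) lock on the whole table."""
    return sorted(
        {
            lock["request_session_id"]
            for lock in locks
            if lock["request_mode"] == "X" and lock["request_status"] == "GRANT"
        }
    )
```

Intent locks (`IS`, `IX`) at object level are normal during regular activity; only a granted `X` entry suggests the table-wide lock described in this ticket.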
Expected Results
Background job execution does not cause table-level locking on the CONTENT table.
Actual Results
The CONTENT table remains locked until the problem query completes or is killed, so other parts of the system that rely on this table may be blocked. The application server's request thread pool may fill up, and the node may eventually need to be restarted to recover.
Workaround
Manually kill the blocking query (the long-running transaction that holds the lock on CONTENT) and remove its corresponding entry in the BACKGROUND_JOB table.
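As an illustration of the first half of this workaround, the sketch below picks out the session at the head of a blocking chain and builds the T-SQL `KILL` statement for it. It is a hedged example only: the helper names `head_blockers` and `kill_statement` are hypothetical, and the input rows are assumed to come from running `BLOCKED_REQUESTS_QUERY` against the SQL Server `sys.dm_exec_requests` view. Removing the BACKGROUND_JOB row is not shown because that table's columns are not described in this ticket.

```python
from typing import Iterable, List, Tuple

# Lists every request that is currently waiting on another session.
BLOCKED_REQUESTS_QUERY = """
SELECT r.session_id, r.blocking_session_id
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0
"""


def head_blockers(rows: Iterable[Tuple[int, int]]) -> List[int]:
    """Sessions that block others but are not themselves blocked.

    rows are (session_id, blocking_session_id) pairs. Non-positive blocker
    IDs are skipped because SQL Server uses them for special cases that do
    not map to a session that can be killed.
    """
    pairs = list(rows)
    blocked = {session for session, _ in pairs}
    blockers = {blocker for _, blocker in pairs if blocker > 0}
    return sorted(blockers - blocked)


def kill_statement(session_id: int) -> str:
    """Build a KILL statement; the ID is validated because KILL cannot take a bound parameter."""
    if isinstance(session_id, bool) or not isinstance(session_id, int) or session_id <= 0:
        raise ValueError(f"not a valid session id: {session_id!r}")
    return f"KILL {session_id}"


# Example: sessions 80 and 81 wait on 71, and 82 waits on 80.
example_rows = [(80, 71), (81, 71), (82, 80)]
statements = [kill_statement(s) for s in head_blockers(example_rows)]
```

Confirm that the head blocker really is the Confluence background-job transaction before executing the statement, since killing a session rolls back its open transaction.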