[CONFSERVER-90918] The retention job fails to delete all items in one batch if one of the items in the batch cannot be deleted due to some error

Type: Bug
Resolution: Fixed
Priority: Highest
Fix Version/s: 9.1.0
Affects Version/s: 7.19.5, 8.4.0, 7.19.6, 8.5.9, 8.5.19
Component/s: Server - Administration
Labels:
- whl-friction
- whl-fy24q3

Support reference count:
35
Symptom Severity:
Severity 2 - Major
UIS:
230
Bug Fix Policy:
View Atlassian Server bug fix policy

When retention rules are applied and a page has a data integrity issue (for example, a page that cannot be purged individually from the Confluence UI because of a database constraint), the retention job fails to delete that page. Worse, the whole batch that contains the page (default: 100 items per batch) is not deleted because of this one record.

If the problematic page is fetched into the batch every time the offset moves forward, the retention job gets stuck and cannot go on to delete further pages.
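
The batch behaviour described above can be illustrated with a small model. This is only a sketch: `purge_in_batches`, its parameters, and the mapping of the pages asd-2500, asd-6500 and asd-18500 to the numeric IDs 2500, 6500 and 18500 are assumptions made for illustration and are not Confluence code. It compares an all-or-nothing batch with a hypothetical variant that leaves behind only the failing items.

```python
from typing import List, Set


def purge_in_batches(item_ids: List[int], bad_ids: Set[int],
                     batch_size: int = 100,
                     isolate_failures: bool = False) -> List[int]:
    """Return the IDs still in the trash after one pass of a toy purge job.

    Hypothetical model: a batch is all-or-nothing unless isolate_failures
    is set, in which case only the undeletable items are left behind.
    """
    remaining: List[int] = []
    for start in range(0, len(item_ids), batch_size):
        batch = item_ids[start:start + batch_size]
        failed = [item for item in batch if item in bad_ids]
        if not failed:
            continue  # every item in the batch was deleted
        if isolate_failures:
            remaining.extend(failed)  # only the broken items survive
        else:
            remaining.extend(batch)   # one failure keeps the whole batch
    return remaining


if __name__ == "__main__":
    ids = list(range(1, 20001))   # 20k trashed items (assumed IDs)
    bad = {2500, 6500, 18500}     # assumed IDs of the three broken pages
    print(len(purge_in_batches(ids, bad)))
    print(len(purge_in_batches(ids, bad, isolate_failures=True)))
```

In this model the all-or-nothing variant leaves every item of each affected batch in the trash, while the isolating variant leaves only the broken pages. Whether the actual fix works this way is not stated in this report.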

Issue Summary

This is reproducible on Data Center: yes

Steps to Reproduce

Spin up a Confluence instance
Disable the 'Trash Removal (Soft)' job under General Configuration > Scheduled Jobs
Import this Space backup: trashdata-Confluence-space-export-003654-2.xml.zip
This backup contains 3 pages with data integrity issues: asd-2500, asd-6500, and asd-18500. The issues were introduced deliberately by manipulating data directly in the database.
Add the class below at DEBUG level under General Configuration > Logging and profiling:
```
com.atlassian.confluence.impl.retention
```
Navigate to the 'Trash' space (imported earlier) and from there go to Space tools > Content Tools > Retention rules
- Click 'Edit'
- Select 'Use retention rules defined in this space' from the top dropdown
- Select 'Keep by deleted date' from the dropdown below the Trash header
- Enter '5' in the textbox and select 'Days' from the dropdown
- Save the retention rules
Go back to General Configuration > Scheduled Jobs and trigger 'Trash Removal (Hard)' by clicking 'Run'
Observe the application logs and deleted item count with the below SQL query:
```
SELECT COUNT(CONTENTID)
FROM CONTENT c
WHERE c.CONTENT_STATUS = 'deleted'
```
There are 20k deleted items, and by the end of the run the rule should have deleted all of them.

Expected Results

All 20k items in the trash should be deleted without issues, or at least all pages (19,997) other than the 3 with data integrity issues should be deleted.

In that case the job finishes with only 3 records left, which cannot be deleted due to constraint issues.

Actual Results

The job finishes with 300 records left to be deleted (3 failed batches of 100 items each).

The below exception is thrown in the atlassian-confluence.log file:

2023-08-22 00:56:05,536 ERROR [Caesium-1-2] [engine.jdbc.spi.SqlExceptionHelper] logExceptions ERROR: update or delete on table "content" violates foreign key constraint "fk_notifications_content" on table "notifications"
  Detail: Key (contentid)=(111106) is still referenced from table "notifications".
 -- url: /c7195/setup/setupdata.action | traceId: 72e45ffdddf5ffd8 | userName: anonymous | action: setupdata
2023-08-22 00:56:05,536 ERROR [Caesium-1-2] [core.persistence.hibernate.HibernateObjectDao] unIndex Unable to index object: page: asd-6500 v.1 (111106) -- could not execute statement; SQL [n/a]; constraint [fk_notifications_content]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute statement
 -- url: /c7195/setup/setupdata.action | traceId: 72e45ffdddf5ffd8 | userName: anonymous | action: setupdata
org.springframework.dao.DataIntegrityViolationException: could not execute statement; SQL [n/a]; constraint [fk_notifications_content]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute statement
	[...]
	at com.atlassian.confluence.pages.AbstractPage.remove(AbstractPage.java:70)
	at com.atlassian.confluence.pages.Page.remove(Page.java:231)
	at com.atlassian.confluence.pages.DefaultTrashManager.deleteContentEntity(DefaultTrashManager.java:236)
	at com.atlassian.confluence.pages.DefaultTrashManager.lambda$purge$0(DefaultTrashManager.java:188)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
	at com.atlassian.confluence.pages.DefaultTrashManager.purge(DefaultTrashManager.java:188)
	[...]
	at com.atlassian.confluence.impl.retention.manager.DefaultTrashRemovalManager.deleteForRule(DefaultTrashRemovalManager.java:146)
	at com.atlassian.confluence.impl.retention.manager.DefaultTrashRemovalManager.lambda$cleanupTrashedEntities$5(DefaultTrashRemovalManager.java:164)
	at com.atlassian.confluence.impl.retention.analytics.TrashRemovalStatisticThreadLocal.withStatistic(TrashRemovalStatisticThreadLocal.java:23)
	at com.atlassian.confluence.impl.retention.manager.DefaultTrashRemovalManager.cleanupTrashedEntities(DefaultTrashRemovalManager.java:164)
	at com.atlassian.confluence.impl.retention.manager.DefaultTrashRemovalManager.lambda$hardRemove$1(DefaultTrashRemovalManager.java:107)
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
	at com.atlassian.confluence.impl.retention.manager.DefaultTrashRemovalManager.hardRemove(DefaultTrashRemovalManager.java:105)
	at com.atlassian.confluence.impl.retention.schedule.TrashHardRemovalScheduledJob.runJob(TrashHardRemovalScheduledJob.java:39)
	at com.atlassian.confluence.impl.schedule.caesium.JobRunnerWrapper.doRunJob(JobRunnerWrapper.java:117)
	[...]
Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "content" violates foreign key constraint "fk_notifications_content" on table "notifications"
  Detail: Key (contentid)=(111106) is still referenced from table "notifications".
[...]
2023-08-22 00:56:05,539 ERROR [Caesium-1-2] [engine.jdbc.spi.SqlExceptionHelper] logExceptions ERROR: current transaction is aborted, commands ignored until end of transaction block
 -- url: /c7195/setup/setupdata.action | traceId: 72e45ffdddf5ffd8 | userName: anonymous | action: setupdata
2023-08-22 00:56:05,541 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=111070, limit=100
 -- url: /c7195/setup/setupdata.action | traceId: 72e45ffdddf5ffd8 | userName: anonymous | action: setupdata
	[...]
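
The foreign key failure in the log above can be reproduced in miniature with an in-memory SQLite database. This sketch is not Confluence code: the two-table schema is a simplified assumption modelled on the table names in the log, and `make_db`, `purge_batch` and `remaining` are hypothetical helpers. PostgreSQL marks the transaction as aborted after the error, whereas SQLite leaves it open until the explicit ROLLBACK; the outcome for the batch is the same.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    # isolation_level=None lets the sketch issue BEGIN/COMMIT/ROLLBACK itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE content (contentid INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE notifications ("
        " id INTEGER PRIMARY KEY,"
        " contentid INTEGER REFERENCES content (contentid))"
    )
    conn.executemany("INSERT INTO content (contentid) VALUES (?)",
                     [(i,) for i in range(1, 6)])
    # Row 3 stands in for the page that is still referenced from notifications.
    conn.execute("INSERT INTO notifications (contentid) VALUES (3)")
    return conn


def purge_batch(conn: sqlite3.Connection, content_ids: List[int]) -> bool:
    """Delete all IDs in one transaction; roll everything back on an FK error."""
    conn.execute("BEGIN")
    try:
        for cid in content_ids:
            conn.execute("DELETE FROM content WHERE contentid = ?", (cid,))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # the deletes made before the failure are undone too
        return False
    conn.execute("COMMIT")
    return True


def remaining(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(contentid) FROM content").fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    print(purge_batch(db, [1, 2, 3, 4, 5]), remaining(db))
```

Rows 1 and 2 are deleted before row 3 fails, but the rollback restores them, so the whole batch stays in the table.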

> grep "Error purging trash for batch offset" atlassian-confluence.log
2023-08-22 00:55:11,780 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=108770, limit=100
2023-08-22 00:56:05,541 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=111070, limit=100
2023-08-22 00:57:55,989 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=115570, limit=100
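
The same check can be scripted to estimate how many items were left behind. This is a minimal sketch: the regular expression is based only on the three WARN lines shown above, and `failed_batches` is a hypothetical helper name. The sum of the limits is an upper bound, since a batch may hold fewer items than its limit and a log covering several runs may report the same batch more than once.

```python
import re
from typing import Iterable, List, Tuple

# Pattern taken from the WARN lines in this report; other versions may log differently.
BATCH_ERROR = re.compile(r"Error purging trash for batch offset=(\d+), limit=(\d+)")


def failed_batches(log_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (offset, limit) for every failed batch reported in the log lines."""
    found: List[Tuple[int, int]] = []
    for line in log_lines:
        match = BATCH_ERROR.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


SAMPLE = [
    "2023-08-22 00:55:11,780 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=108770, limit=100",
    "2023-08-22 00:56:05,541 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=111070, limit=100",
    "2023-08-22 00:57:55,989 WARN [Caesium-1-2] [impl.retention.manager.DefaultTrashRemovalManager] hardRemove Error purging trash for batch offset=115570, limit=100",
]

if __name__ == "__main__":
    batches = failed_batches(SAMPLE)
    print(batches)
    print(sum(limit for _, limit in batches))  # upper bound on items left behind
```

To run it against a real log, pass an open file handle for atlassian-confluence.log instead of `SAMPLE`.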

Workaround

Delete the page manually from the database using the queries described in the knowledge base article 'How to Remove a Page Manually in the Database Using SQL Commands', then trigger the retention job again.

Attachments

- sqloutput.jpg (46 kB, 21/Aug/2023 10:06 PM)
- trashdata-Confluence-space-export-003654-2.xml.zip (2.47 MB, 21/Aug/2023 10:08 PM)

is related to

CONFSERVER-87298 Soft and Hard Retention cleanup jobs take long time to run

Closed

CONFSERVER-93299 Retention rule jobs spread to every node in the multi-node DataCenter cluster

Closed

Jeffery Xie added a comment - 18/Dec/2024 10:32 PM

Hi vmiloch@atlassian.com ,

We do plan to backport this fix to the previous LTS. This fix will be included in the changes for the new retention rule (~~CONFSERVER-87298~~) and is still soaking for 9.1. We will create backport tickets after that. cc ephillips@atlassian.com

Alexander added a comment - 03/Oct/2024 5:06 PM

Hopefully this fix will also be released for LTS. 🙈

Jordan Anslow added a comment - 03/Oct/2024 7:36 AM

A fix for this issue is available in Confluence Data Center 9.1.0.
Upgrade now or check out the Release Notes to see what other issues are resolved.

Mohamed Shariffdeen added a comment - 22/Aug/2023 1:30 PM

We are affected too, please fix it as soon as possible.
