Confluence Data Center — CONFSERVER-62835

Attachment Moves Are Non Atomic Resulting In Missing Attachments

      We don't plan to backport the fix for this bug to earlier Long Term Support versions

      The fix for this bug isn't suitable for backporting to a bug fix release for any previous LTS versions. This is often because the fix is considered too high risk to implement in an older version.

      The fix for this issue will be included in future Long Term Support versions.

      Issue Summary

      This ticket tracks a class of bugs wherein Confluence misplaces attachments as part of a page move. The attachments remain on disk, but due to being in the wrong part of the file tree, appear to be missing.

      This issue occurs during page or pagetree moves that are unexpectedly interrupted.

      Note: this is not related to CONFSERVER-55928 (Attachments become 'Unknown Attachment' in the page editor with Collaborative Editing turned on) or its related bugs.

      Steps to Reproduce

      There are currently multiple causes with differing reproduction steps.

      The fundamental cause is halting a pagetree copy whilst attachments are being moved.

      Expected Results

      The files should be moved successfully. In the case of a failed move, the file copies should be rolled back successfully.

      Actual Results

      The files are not moved correctly or, alternatively, the files are not rolled back to their original location when a page move fails.
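The rollback behaviour described above can be sketched as a copy-then-commit sequence. This is a minimal illustration in Python, not Confluence's actual implementation; `move_attachments` and its arguments are hypothetical:

```python
import shutil
from pathlib import Path

def move_attachments(files, src_dir: Path, dst_dir: Path):
    """Move a batch of attachment files from src_dir to dst_dir.

    Copies everything first and deletes the originals only after every
    copy has succeeded; on any failure the copies are rolled back, so an
    interruption never leaves files in the wrong place.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    try:
        for name in files:
            shutil.copy2(src_dir / name, dst_dir / name)
            copied.append(dst_dir / name)
    except Exception:
        # Roll back: remove the partial copies, leaving originals intact.
        for path in copied:
            path.unlink(missing_ok=True)
        raise
    # Commit: all copies succeeded, so the originals can be removed.
    for name in files:
        (src_dir / name).unlink()
```

The key property is that the originals are never touched until every copy exists at the destination, so an interrupted run can only leave harmless extra copies, never missing files.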

      Workaround

      We currently have a script that searches for attachments that have been misplaced and moves them back to the correct location. The script can be found at https://confluence.atlassian.com/confkb/how-to-resolve-missing-attachments-in-confluence-201761.html
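The linked script is the supported fix. As a rough illustration of the general approach only (a hypothetical Python sketch, not the Atlassian script; in practice the mapping from attachment to expected directory comes from the Confluence database):

```python
import shutil
from pathlib import Path

def restore_misplaced(attachment_dirs: dict, root: Path):
    """Walk the attachment tree under `root` and move any file found
    outside its expected directory back where it belongs.

    `attachment_dirs` maps an attachment file name to the directory it
    should live in (supplied by the caller in this sketch).
    """
    moved = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        expected = attachment_dirs.get(path.name)
        if expected is not None and path.parent != expected:
            expected.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(expected / path.name))
            moved.append(path.name)
    return moved
```

Files already in their expected directory are left alone, so the sketch is safe to re-run.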


            agawron added a comment -

            a7e9ce396f68 thank you for your feedback. The decision was based on a few factors.

            First, the performance of the migration process. Most operations go through NFS, which is a bottleneck. Using only file system move operations made it possible to complete the migration much faster. Calculating a binary diff could significantly slow down the migration, and it could also use more memory, risking a crash. Imagine calculating a diff of a 4 GB video file!

            Second, development time. We were not sure how many duplicates customers might have, or how many of them would be exactly the same.

            Third, we didn't want to risk losing any files, so we limited ourselves to move operations only, avoiding any deletes.

            Based on these factors, we decided that duplicates can be safely handled by admins: either left as they are, or reviewed in their own time, without slowing down the migration.


            Michael Mohr added a comment -

            Why on earth are you storing binary-identical files as duplicates? That makes no sense and confuses us as customers. It would make sense to store only duplicates that are binary-different. That way we as customers would know immediately for which files we have to check which of the duplicates we want to keep and which we want to delete.

            agawron added a comment - edited

            39c389fcbf4a these .duplicate.X files are not referenced anywhere in the database. We keep the duplicate files just in case any of them is actually the "real" attachment that should be used instead of the one that has been linked. If you are sure all these duplicate attachment files are exact copies, then you can safely delete them. If you find a duplicate that differs from the original file, you might want to double-check that the right file was linked during the migration process.
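Following the advice above, checking which `.duplicate.X` files are exact copies can be automated. A minimal sketch, assuming the `.duplicate.N` naming seen in this thread (the helper name is hypothetical):

```python
import filecmp
from pathlib import Path

def identical_duplicates(root: Path):
    """Yield .duplicate.N files that are byte-for-byte identical to the
    original attachment file they shadow, and are therefore safe to
    delete per the comment above."""
    for dup in root.rglob("*.duplicate.*"):
        # "12345.duplicate.1" shadows the original file "12345"
        # in the same directory.
        original = dup.parent / dup.name.split(".duplicate.")[0]
        if original.exists() and filecmp.cmp(dup, original, shallow=False):
            yield dup
```

Duplicates that this does not report differ from the linked original and should be reviewed manually, as agawron suggests, before anything is deleted.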


            Dan Schwartz added a comment -

            So, now that I've upgraded to Confluence Server 8.1.1, I have a bunch of duplicate.1 files in the attachments/v4 directories. Can I just delete them without messing anything up, or are there pointers to them in the Confluence DB?

            Madhubabu Kethineni (Inactive) added a comment -

            A fix for this issue is available in Confluence Server and Data Center 8.1.0.
            Upgrade now or check out the Release Notes to see what other issues are resolved.

            Marco Birrer added a comment -

            Hi
            Do you already know when the fix will come?

            ACP added a comment -

            Hi
            Do you already know when the fix will come?

            We had to fix the problem twice with the script, which meant significant downtime.

            Yan Zhou added a comment -

            v7.13.7 is also affected.

              Assignee: agawron
              Reporter: James Ponting (jponting)
              Affected customers: 15
              Watchers: 40