[CONFSERVER-33898] Confluence index floods the temp folder with tmp files from pptx files

      A Confluence reindex produces one com.atlassian.confluence.extra.officeconnector.index.powerpoint.PowerPointXMLTextExtractorxxxxxxxxxxxxxxxxxxx.tmp file for each pptx file attached to the instance.

      Each reindex creates new tmp files and leaves the older ones in place (conf_install/temp) without cleaning them up. This can leave a large amount of garbage in the Confluence temp folder, as we can see in the example below.

      The file used for this test was an empty pptx file; it is attached to this issue for testing purposes.
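To gauge how much space the leftover files occupy, a short script can count them and add up their sizes. This is only a minimal sketch: the file name prefix is taken from the name quoted above, while the function name `summarize_pptx_temp_files` and the way the temp directory is passed in are assumptions made for illustration.

```python
from pathlib import Path

# Prefix taken from the temp file name quoted in this issue.
PPTX_TMP_PREFIX = (
    "com.atlassian.confluence.extra.officeconnector.index."
    "powerpoint.PowerPointXMLTextExtractor"
)


def summarize_pptx_temp_files(temp_dir):
    """Return (file_count, total_bytes) for leftover extractor temp files.

    temp_dir is the Confluence temp folder, for example
    <confluence_installation_directory>/temp (hypothetical helper, not
    part of Confluence).
    """
    count = 0
    total_bytes = 0
    for path in Path(temp_dir).glob(PPTX_TMP_PREFIX + "*.tmp"):
        if path.is_file():
            count += 1
            total_bytes += path.stat().st_size
    return count, total_bytes
```

Calling `summarize_pptx_temp_files("<confluence_installation_directory>/temp")` before and after a reindex should show whether the count grows by one per attached pptx file, as described above.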

      Replication steps

      1. Install Confluence 5.7 standalone
      2. Login to Confluence 5.7
      3. Download the empty.pptx and save it on my desktop
      4. Create a new Space named test
      5. Click on the edit button on the space home page to enter edit mode
      6. Drag the empty.pptx from my desktop to the space home page and then click on the "Save" button
      7. Navigate to the folder <confluence_installation_directory>/temp
      8. A new entry has been created in the Confluence installation temp folder

      Workaround

      Disable indexing of PPT and PPTX files for now, as stated here.

        1. empty.pptx
          31 kB
          Rodrigo Girardi Adami
        2. Screen Shot 2016-02-17 at 15.49.38.png
          572 kB
          Anton Shaleev
        3. Screen Shot 2016-02-17 at 15.54.30.png
          181 kB
          Anton Shaleev
        4. temp_files.png
          115 kB
          Rodrigo Girardi Adami


            Martin Aksel Jensen added a comment -

            @Zhenhuan Zhou if you need help or knowledge sharing about the problem, feel free to contact me and I can fill you in.

            Martin Aksel Jensen added a comment -

            Not really. Disabling the Office Connector also disables Office-related features and metadata extraction for Office files. I would personally rather recommend reading my earlier comment about requesting a patched Office Connector from us at Translucent ApS, an Atlassian Platinum Expert partner.

            We have provided a patched version to a couple of Atlassian customers, with good feedback. We hope Atlassian will fix the Office Connector themselves, but this is an offer from us for customers for whom disabling the Office Connector is not a feasible solution.

            Malcolm Deau added a comment -

            Thanks, Martin Aksel Jensen, for your quick answer.

            So, I deleted the old temp files first (for safety) and my disk space is fine for now.
            But do you advise me to apply this procedure: "https://confluence.atlassian.com/display/CONFKB/How+to+Disable+Indexing+of+Attachments" to avoid another flood in the temp folder?

            Regards,

            Malcolm

            Martin Aksel Jensen added a comment -

            You can delete them safely, but you might only be able to do so after the Confluence instance has been restarted or stopped, so that it releases its handles to the files. Keep in mind that as soon as you reindex the pptx files, the same temp files will be created again and will once again flood the drive.
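The cleanup described in this comment can be sketched as a small script. It is a sketch under stated assumptions, not an official procedure: the prefix comes from the file name in the issue description, while the function name, the one-hour age threshold and the dry-run default are hypothetical choices. Files that cannot be removed, for instance because a running Confluence still holds a handle to them, are skipped instead of aborting the run.

```python
import time
from pathlib import Path

# Prefix taken from the temp file name quoted in the issue description.
PPTX_TMP_PREFIX = (
    "com.atlassian.confluence.extra.officeconnector.index."
    "powerpoint.PowerPointXMLTextExtractor"
)


def delete_old_pptx_temp_files(temp_dir, min_age_seconds=3600, dry_run=True):
    """Remove extractor temp files older than min_age_seconds.

    Returns (removed, skipped). With dry_run=True nothing is deleted and
    'removed' lists the files that would have been deleted.
    """
    cutoff = time.time() - min_age_seconds
    removed, skipped = [], []
    for path in Path(temp_dir).glob(PPTX_TMP_PREFIX + "*.tmp"):
        try:
            if path.stat().st_mtime > cutoff:
                continue  # recently written, may still be in use
            if not dry_run:
                path.unlink()
            removed.append(path)
        except OSError:
            # For example a file locked by a running Confluence on Windows.
            skipped.append(path)
    return removed, skipped
```

Running it first with `dry_run=True` shows what would be removed. As noted above, the files will reappear on the next reindex of the pptx attachments, so this only frees space temporarily.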

            Malcolm Deau added a comment -

            Hi guys,

            So if I understand correctly, the temp folder is filling my disk to 100%.
            On the other hand, my "Search Index" in the "General Configuration/Content Indexing" menu is at 100% too.

            So, my question is: can I delete all the pptx reindex files in /temp?
            Or is that too dangerous for file integrity?

            Thanks, Regards,

            Malcolm

            Martin Aksel Jensen added a comment -

            I have examined the issue and found a couple of problems in the bundled Office Connector add-on that cause it when running Confluence on the Windows platform. We believe it is still an issue on at least all recent versions of Confluence.

            We have successfully created a patched version to mitigate the issue, and have confirmed that it does indeed stop the temporary files from flooding the hard drive.

            If you would like, we can provide you with the patch for free (you assume full responsibility for this). We can also assist you with patching Confluence.

            You can sign up here, and read some more in-depth details as well:

            http://products.translucent.dk/infinidex/atlassian-office-connector-patch/

            Best regards
            Translucent ApS - Creators of Infinidex for Confluence

            April added a comment -

            Hi there,

            I have 5.8.10 in Prod, and 5.9.5 in staging.

            In Prod, the folder \confluence-5.8.10\temp holds 15,360 files and is 11.3 GB.

            The oldest files are from September of last year (the last upgrade date), and the references are ancient.

            In staging, I tried clearing this folder manually, and got an error like:

            2016-02-25 13:51:30,603 WARN [Indexer: 4] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: RT Structured Products and Custom Basketrs EU v1.1.DOC v.1 (455) bdaleiden)
            – referer: https://stgconfluence.ezesoft.net/admin/search-indexes.action | url: /admin/reindex.action | userName: adaly | action: reindex
            com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word binary document: Can't open the specified file: 'D:\confluence\confluence-5.9.5\temp\poifiles\poi-ooxml--1422838394.tmp'

            This file was attached to a Confluence page in August of 2007!

            I have to ask... how temporary is this "temp" folder, and why would the indexer even be looking here, instead of in \data\attachments\ver003?

            Thanks,
            April
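To see which temp files the indexer still expected after a manual cleanup, the paths can be pulled out of the log. The following is a minimal sketch that assumes the message format shown in the log excerpt above ("Can't open the specified file: '...'"); the function name is hypothetical.

```python
import re

# Matches the quoted path in messages like the ExtractorException above.
MISSING_FILE_RE = re.compile(r"Can't open the specified file: '([^']+)'")


def missing_temp_files(log_text):
    """Return the file paths that the indexer reported it could not open."""
    return MISSING_FILE_RE.findall(log_text)
```

Feeding the contents of the Confluence log to `missing_temp_files` returns one path per matching message, which makes it easier to tell whether the errors all point into the temp folder.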


            Der Lun added a comment -

            Hi dunterwurzacher,

            I need to reopen this bug report: the underlying issue does not seem to be fixed, as I am able to replicate it by attaching the .pptx that is provided in this bug report.

            These are my replication steps :

            1. Install Confluence 5.7 standalone
            2. Login to Confluence 5.7
            3. Download the empty.pptx and save it on my desktop
            4. Create a new Space named test
            5. Click on the edit button on the home space to enter edit mode
            6. Drag the empty.pptx from my desktop to the space home page and then click on "Save" button
            7. Navigate to the folder <confluence_installation_directory>/temp
            8. A new entry is created in the Confluence temp folder

            Please note that I can replicate this issue on Confluence 5.7.1, 5.7.5 and 5.8.15 with the exact same replication steps.

            Please let me know if you need additional information in regards to this bug.

            Regards,
            Der Lun


            Denise Unterwurzacher [Atlassian] (Inactive) added a comment -

            Hi Ryan. I was not able to reproduce this issue on 5.7.5, and we changed the way indexing works under the hood, so it should be resolved. If you delete the temporary files and new ones get created, would you be able to raise a Support Request, and get the team to have a look through it with you? We can certainly re-open this bug if we're able to reproduce the issue on 5.7 or later. Thanks for letting us know!

            Deleted Account (Inactive) added a comment -

            I am actually on version 5.7.1 and still have this issue with PowerPoint files in my environment. Are we quite certain this issue is resolved, or is it still necessary to disable the indexing for PowerPoint attachments?

              zzhou Zhenhuan Zhou (Inactive)
              rgadami Rodrigo Girardi Adami
              Affected customers: 31
              Watchers: 52
