In cluster, allow attachments to be stored on file system in network-shared directory

      If the confluence-home directory is set to a network shared volume, would it be possible to configure the attachments to be stored there, for a clustered configuration? (The customer requesting this feature is unhappy to have to store all attachments in the database.)

            [CONFSERVER-9335] In cluster, allow attachments to be stored on file system in network-shared directory

            Paul Curren added a comment -

            Hi Stephen,

            We are currently working on a completely new version of our clustered offering that provides greater scalability as well as high availability.

            If you are interested in joining our closed beta program for the new offering, please send an email to kelvin@atlassian.com with the subject "Confluence Clustered" and Kelvin will get back to you ASAP.

            Thanks.

            Stephen Gramm added a comment -

            Keeping the attachments in the file system would be awesome. This has been a problem for us for the last 6 years and has prevented us from using a clustered version of Confluence.

            When will this version be available, and where can we find out more about Confluence clustering for version 5.6?

            烨 王 added a comment -

            Is that true? Such great news! @Paul Curren [Atlassian]

            Paul Curren added a comment -

            The clustered version of Confluence 5.6 will include the concept of a shared home which will, amongst other things, house the attachments.

            There will be more details included in the 5.6 release notes.

            烨 王 added a comment -

            What a bad situation!

            David Foust added a comment -

            This issue is old enough to go to kindergarten (4 1/2 years). I'm sure lots of other 4-year-old+ issues get better jokes than that; I'll work on a better one for the 5-year anniversary.

            In all seriousness, we still want this issue to be addressed. Atlassian, can you give us any insight into what has prevented this request from being worked on?

            Roberto De Giuseppe added a comment -

            Hi team,
            We have the same problem in cluster mode.

            Best Regards,
            Roberto

            Thomas Krug added a comment -

            Just added my vote. We are running a 24x7 installation with four nodes!

            Having the attachments in the DB is a real pain.

            Stephen Gramm added a comment -

            You would think that this request could at least be assigned after more than two years. Please assign the task or tell us whether or not it is going to be worked on at all.

            Bruno Domenici added a comment -

            Does anybody know why it is possible to use single-node Confluence with NFS file system storage but not in a clustered environment?

            We already use NFS (78 GB) in a single-node installation. Why isn't it possible to mount the same mount point on all cluster nodes? Why the database?

            With 2000+ users on a single node, we are VERY concerned about that!

            We have other products running in this same kind of environment (WebLogic Cluster + NAS) without problems.

            Ed Gibson added a comment -

            Kevin

            I'm not sure the issue you're describing directly relates to this one. My perception from the text was that you're dealing with an NFS issue with attachments?

            The focus of this issue is large environments that want additional resiliency by implementing the cluster product. Unfortunately, the cluster solution only supports attachments housed within the database rather than attachments stored outside the database. Storing all attachments within the database causes significant database bloat, which I feel poses scaling challenges.

            My assumption is that you are running the non-clustered version and struggling with NFS compatibility.

            Interestingly enough, we are looking at NetApp as a NAS solution. I'd be interested in communicating off channel about your experience with that product.

            Ed <egibson@uwo.ca>

            Kevin J. Conway added a comment -

            Hi:

            We found this out the hard way as well - we wanted to NFS mount our attachments directory in JIRA from a NetApp. It works fine from an NFS point of view, but JIRA fails to manage the attachment. It generates a nasty and not very useful "permission denied" message.

            On a whim, we decided to do a CIFS mount of the volume. This worked perfectly.

            JIRA runs on our Linux box. This now allows us to use the NetApp, with its better storage management facilities, to manage attachments. It also means that we do not have to email attachments to the system that manages our deployments - it will simply be sent an email with the UNC pathname. The deployment team can access the files to deploy from the CIFS share on the NetApp.

            This may be a solution for anyone else who has a NetApp or any other mass storage device that can talk CIFS.

            kjc

            AudraA added a comment -

            Hi all,
            A Confluence developer recently worked on a prototype for this issue, but we have not selected a release time frame for it. We are working on hiring more developers to focus on enterprise features, so we should be addressing these types of features this year. We did not get a chance to look at this feature last year, as we focused on improving clustered performance in 3.0 and, for the smaller 3.1 release, on editing experience improvements like drag and drop, the image browser, and Office 2007 support. See the release notes: http://confluence.atlassian.com/display/DOC/Confluence+3.1+Release+Notes.

            Right now, we are focusing on delivering a revamped rich text editor (RTE) by June, and afterwards we'll consider enterprise features like this for future releases. We understand the pain you are feeling, and agree that this is a high priority enterprise feature.

            Best regards,
            Audra Eng

            Stephen Gramm added a comment -

            What does it take to get the Program Manager to post an update on this ticket? Please escalate to the Program Manager, as this should really be a simple task.

            David Foust added a comment -

            Hello, just wondering if Atlassian has any updates on this issue?

            Igor Minar added a comment -

            Would the folks running large Confluence instances be interested in joining a mailing list for discussing Confluence issues? I think it would be beneficial for all of us to exchange information about our problems and solutions. I created a mailing list here: http://groups.google.com/group/enterprise-confluence I'm looking mainly for technical discussion: no marketing, no end user help desk, ...

            I strongly believe that, even though Atlassian doesn't recognize large customers as interesting, when we pull our forces together we can come up with solutions faster and maybe even have a leg up when dealing with Atlassian.

            Any takers?

            And just to be clear, by large Confluence instance I mean hundreds of spaces, thousands of pages, tens or hundreds of thousands of page revisions and tens of thousands of users.

            Patrick Laverty added a comment -

            Adding to Ed Gibson's list:

            Brown University
            University of Western Ontario
            MIT
            Bob Jones University
            Internet Broadcasting Systems
            NASA
            AUDI AG

            We haven't even widely advertised our wiki service on campus yet, and I have over 280 spaces, almost 10,000 pages and over 20,000 users. Our database is under 1 GB, but our attachments are almost 9 GB. I can't even imagine what will happen when we start advertising the service and suggesting that people use it. We will definitely need high availability, but there's no way we can be sticking that much attachment data into our database.

            And to echo David's comment: "BIG companies need BIG wikis and have BIG wallets. I never thought I would be complaining of low prices as a customer. But if it made a difference, we would pay for it."

            We saw the same thing, especially with regard to support. We asked for 24x7 support, and offered to pay more, but were told they're not ready for that.

            Igor Minar added a comment -

            AFAICT Peter's storage solution is a complete replacement of the attachment functionality. That works great for them, but it's not what most of us are looking for. We really want the attachment functionality in Confluence to be preserved, but the storage strategy to be different.

            Alice Gifford added a comment -

            Peter Reiser of Sun Microsystems developed a custom solution to do clustering with external attachment storage. I've heard of another Confluence user who has done that too.

            However, Sun said that they would not make their solution publicly available.

            Igor Minar added a comment -

            How about storing attachments in S3-like external storage? That would be a good alternative solution and a lot more efficient than shared file system storage. Just a thought.
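The idea raised in the comments above, keeping the attachment functionality while swapping the storage strategy, can be sketched as a small storage interface. This is only an illustration under stated assumptions and not Confluence's API: `AttachmentStore`, `SharedDirectoryStore`, `InMemoryObjectStore` and `round_trip` are hypothetical names, and the in-memory class merely stands in for an S3-like object store.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class AttachmentStore(ABC):
    """Hypothetical storage strategy: callers never see where the bytes live."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class SharedDirectoryStore(AttachmentStore):
    """Stores attachments as files under a directory that every node would mount."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name, then rename, so readers do not see a
        # half-written file. Whether the rename is atomic depends on the
        # underlying (network) file system; that is an assumption here.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)

    def get(self, key):
        return (self.root / key).read_bytes()


class InMemoryObjectStore(AttachmentStore):
    """Stand-in for an S3-like object store: a flat key -> bytes mapping."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def round_trip(store, key="space1/page42/diagram.png", data=b"example bytes"):
    """Store an attachment and read it back through the common interface."""
    store.put(key, data)
    return store.get(key)


if __name__ == "__main__":
    print(round_trip(InMemoryObjectStore()))
    with tempfile.TemporaryDirectory() as shared:
        print(round_trip(SharedDirectoryStore(shared)))
```

The point of the sketch is that the code calling `put` and `get` stays the same whichever backend is configured, which is the property the commenters are asking for.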

            AudraA added a comment -

            David and all, thank you for continuing to comment and push this issue. We are currently reviewing enterprise customer feature requests to assess which major features we will add to the product next. This may be a long-term roadmap item if we decide that other scalability, performance, or stability issues should be addressed first. I'll continue to watch this issue and provide updates based on our roadmap planning.

            If you're interested in how we decide which features to implement, please read this:
            http://confluence.atlassian.com/display/DEV/Implementation+of+New+Features+and+Improvements

            Peter Raymond added a comment -

            David - thanks for the quick feedback. I can confirm the issue exists in reverse. Once we got past that, though, it fails with different errors. We're really starting to wonder if it's ever worked...

            David Foust added a comment -

            Peter: I have not tried DB->FS, but I do know of an issue with FS->DB, which may be the same thing. When going FS->DB, the migration tries to move all files as a single transaction. If the temp space on your DB is smaller than the total size of your attachments, you are out of luck. See issue CONF-9888.
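The single-transaction problem described in the comment above can be illustrated with a small sketch that commits after each batch of files instead of once at the end, so that no single transaction has to hold every attachment. This is not Confluence's migration code: the `attachments` table, the function names and the use of SQLite are all assumptions chosen so the example runs on its own.

```python
import sqlite3
import tempfile
from pathlib import Path


def migrate_in_batches(conn, source_dir, batch_size=100):
    """Copy files from source_dir into a table, one transaction per batch.

    Returns (number of files migrated, number of commits).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attachments (name TEXT PRIMARY KEY, data BLOB)"
    )
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    commits = 0
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        # Using the connection as a context manager commits on success and
        # rolls back on error, so each batch is its own transaction.
        with conn:
            conn.executemany(
                "INSERT OR REPLACE INTO attachments (name, data) VALUES (?, ?)",
                [(p.name, p.read_bytes()) for p in batch],
            )
        commits += 1
    return len(files), commits


def demo():
    """Migrate seven small files in batches of three into an in-memory DB."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(7):
            (Path(tmp) / f"file{i}.bin").write_bytes(bytes([i]) * 10)
        conn = sqlite3.connect(":memory:")
        migrated, commits = migrate_in_batches(conn, tmp, batch_size=3)
        stored = conn.execute("SELECT COUNT(*) FROM attachments").fetchone()[0]
        conn.close()
        return migrated, commits, stored


if __name__ == "__main__":
    print(demo())
```

With batching, the space a transaction needs is bounded by the batch size rather than by the total attachment size. The trade-off is that a failure partway through leaves earlier batches committed, so a real migration would need to be resumable; `INSERT OR REPLACE` is used here so that rerunning the sketch is harmless.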

            Peter Raymond added a comment -

            Here's a heads up for everyone - we have our attachments in a DB due to the clustering requirement but have come to the realization that clustering is hurting us more than it's helping. However, every attempt to migrate attachments back out of the DB and into the file system has met with failure. We have an open case going but are curious if anyone here has successfully moved their attachments out of the DB before, even just as a test in your QA or DEV system.

            Sort of related...

            Thanks.

            Peter

            David Foust added a comment -

            Atlassian, it has been 6 months since the last "No you can't have it" update. Can we have our next "No you can't have it"?

            In all seriousness, this issue is 1 1/2 years old, and is still ZERO priority to Atlassian. Just guessing here, but I think we probably have at least 1,000,000 users affected across the 31 votes on this issue. The only alternative that has been offered is "do it yourself", which voids our support. Not an option.

            When do we finally reach critical mass and at least get put on the radar?

            Alex Fishlock added a comment -

            We at BT have 200 GB of attachments, a 4-node cluster and 45,000 users. We need SAN file clustering.

            Kristyn Souder added a comment -

            Here at The Children's Hospital of Philadelphia, we have many researchers who deal with extremely large attachments. They would like (and even say they need) to be able to access these attachments through our Confluence instance for it to be truly useful to them. Because of performance and database size concerns, however, we have had to restrict the size of the attachments that can be uploaded, and this has hindered adoption. Because of this, we were just starting to look into storing attachments outside of the database (preferably on our existing SAN), and I am very disappointed to see the state of things. We were about to migrate our Confluence instance to a cluster within the next month or so, but it looks like if we do this, we will not be able to use disk storage.

            We will need to take a serious look at how worthwhile it is to have a clustered environment vs. giving people the ability they need to access large attachments through the system.

            Like others, I ask Atlassian to reconsider the priority of this issue. I think offering this as a feature would greatly increase the interest of large institutions in Confluence, offering more revenue for Atlassian. The long delay in implementing this might already be leading such institutions to look at other products.

            Peter Raymond added a comment -

            I'll also toss in another 2¢ - some really big companies, whose names shall stay anonymous by request, running really big Confluence instances have independently come to the conclusion that a single node system is better than running a clustered version, both in terms of stability and performance. And by big instance I'm referring to current pages well over 100,000 and users well over 50,000...

            Igor Minar added a comment -

            Alice,

            I've been customizing Confluence for a year and a half now, and one thing I have learned from this experience is that you don't want to create a customization that changes a fundamental behavior of Confluence. If you do, you'll have problems upgrading and you might get stuck with an old version of Confluence that might contain bugs or security vulnerabilities.

            I'm afraid that no partner will help anyone to get this feature implemented properly without adding upgrade constraints and/or overhead.

            Also, I wouldn't be surprised if Atlassian had objections when it comes to supporting Confluence modified in this way.

            Alice Gifford added a comment -

            I think it means: If a company wants this feature, they will have to pay one of the partners to develop it for them. Atlassian won't do it for you.

            And if the companies are big enough, and very tied into Confluence, maybe someone will pay the partner.

            I know that for our biggest project (so far) that uses Confluence, there were 2K users who tried to register the first day it was available. And our servers got overwhelmed.

            Igor Minar added a comment -

            +1 vote from Sun representing 53+k registered users.

            It will soon be 6 months since Adnan's comment mentioning reconsideration of this issue's priority. I hope that the number of users impacted will carry more weight than the number of votes when this issue is considered.

            BTW, I don't really get the reference to Atlassian's partners. Will they magically add the lacking fundamental feature to Confluence? I spoke to several of these partners and all I heard was "do not use Confluence cluster". Correct me if I'm wrong, but that doesn't sound very reassuring when talking about an "enterprise" product.

            Alice Gifford added a comment - edited

            By not even speculating about when you might consider doing this, you are cutting yourself off from a lot of large, potentially lucrative clients. Places that don't already have a Confluence installation would choose something else if they needed a big wiki. I noticed that most of the well-known wikis seem to be running MediaWiki.

            While this issue doesn't have a lot of votes, maybe some of the voters would collectively be willing to put up some money towards the development of clustering. Have Atlassian negotiate contracts with one or more of the providers, and have the customers put up part of the money, and have Atlassian put up part of the money. Or even have the provider contribute some of the services for free and get a cut of sales of clustering products later.

            Then set something up for the pricing of shared directory clustering for companies that didn't contribute to the development.

            The first 6 months, the license is $100K per company plus the 4K or 8K per node. The second 6 months it is $50K per company. After the first year, it drops again...

            But don't announce in advance how much it will drop or when it will drop, so that companies who really need it will look at the price now and go for it. Or have sites that think they might need clustering at some point in the future go in for basic Confluence.

            Even if you have no immediate plans to implement this, you should take a look at what might be involved and try not to move in incompatible directions.

            I'm throwing out the suggestion, but I'd like to caveat that I'm about 5 layers below anyone who could make that decision, and I'm just a subcontractor.

            And they probably wouldn't be willing to make a decision and put through the purchasing until shortly before it became an emergency.

            However, the largest wiki we have planned doesn't require the features (LDAP integration) that drove us to choose Confluence. We haven't implemented anything yet either.

            What a gloomy update! I appreciate your candor on the issue, Adnan. However, I would plead for reconsideration on behalf of everyone here. The "Confluence Massive" clustering product is not part of the core installation of Confluence and is purchased separately. As such, why do enhancements to the clustering product have to be prioritized against the single-node product? Obviously you will always have 100 to 1 or more single-node installations versus multi-node.

            We at NASA are really in a conundrum here on this. We really need the redundancy of multiple nodes. However, we do not want to "waste" 100s of GB of tier-1 storage on file attachments. Also, there are performance concerns on database impact. Regardless, even if we were to decide to go forward, we can't due to issue CONF-9888 (attachment migrations from file system to the db import all files as a single transaction). This prevents us from migrating, as our temp space in our database fills up. This is really puzzling, as the whole purpose of the "Massive" cluster product is to better serve large installations. But in our case, we have become too large to move to it! Does this make sense?

            I'm not trying to poke at Atlassian with a stick, but with CONF-9335 and CONF-9888 not being scheduled for resolution, it appears (to me) as if Atlassian has abandoned the cluster product. If Atlassian feels the cluster product doesn't produce enough revenue to make its issues a priority, perhaps you should raise the price of the cluster product. Charging very little for it, then providing few bug fixes and enhancements as critical as this, is a poor path forward.

            BIG companies need BIG wikis and have BIG wallets. I never thought I would be complaining of low prices as a customer. But if it made a difference, we would pay for it.

            David Foust added a comment

            We are aware of the pain that many people are feeling because of this issue.

            Unfortunately, we have no current plans in the next year of development to look at solving this issue. The reasons for this are that:
            1. It is not a relatively high priority for us considering other, more pressing development needs across our customer base.
            2. The degree of complexity and resources required to implement the right solution here is high and makes it harder to fit into our roadmap quickly.

            This is not an easy decision to make. However, we have a wide range of customers and we must balance the needs of all customers when choosing how to use our always scarce development resources.

            This issue will be reconsidered in 6 months' time.

            I'd second contacting one of our excellent partners, as Todd mentions above, to look at your installations in their particular contexts with a view to improving scalability.

            Confluence Product Manager

            Adnan Chowdhury [Atlassian] added a comment

            Add another Fortune 100 company to the list. In our case, our DB has hit 40GB and it's too big for us to copy over into our QA environment so that we can do a trial upgrade to the next version! Ouch.

            Peter Raymond added a comment

            We understand the level of frustration on this issue. Unfortunately, Atlassian does not have an internal Professional Services team to address it. However, we do have an excellent team of partners that provide our customers with top-notch professional services. If you would like a list of these partners, please feel free to contact me directly at partners@atlassian.com.

            Todd Revolt [Atlassian] added a comment

            Ed Gibson added a comment -

            Perhaps to extend the concerns expressed by David Foust: can you provide some context as to how the prioritization of issues is determined?

            I noticed that this issue has been assigned 21 votes. Is the prioritization based linearly on the number of votes applied?

            When corporations like:
            University of Western Ontario
            MIT
            Bob Jones University
            Internet Broadcasting Systems
            NASA
            AUDI AG

            feel passionate enough about this issue not only to vote but also to take the time to comment on it, you would think we could catch some attention, particularly when we're indicating that we are having to express scalability concerns to our management: concerns that will lead to a migration to alternate products if not addressed. As it sits right now there isn't even a glimmer of light in the tunnel!

            We are approaching 1 year since this gap was first identified...

            Who do we need to communicate with to influence these priority decisions?

            Ed Gibson
            Technical Support Manager
            University of Western Ontario

            Again, we are frustrated by lack of movement on this issue. Is there a way we could contract Atlassian Professional Services to perhaps develop this?

            David Foust added a comment

            I needed to come back to this issue given a meeting today with a department looking to heavily utilize the wiki (we had to tell them that "no, you can't really use Confluence the way you want given scaling issues that only Atlassian can address").

            I can't really provide any new reasons beyond the ones mentioned above but would like to highlight a few....

            -prioritizing this issue based on number of votes will never show its importance as only large shops will need it
            -those large shops won't be able to purchase cluster licenses (decent revenue $$) or continue increasing Confluence usage until this issue is resolved (due to scaling AND reliability issues)
            -looking at other applications that do support file stores on high-performance reliable NFS shares over TCP (e.g. NetApp – not your everyday NFS but at the same time just a rock-solid, high-performance implementation of the NFS v3 spec), I have a hard time believing that this is an insurmountable or even highly time-consuming architecture issue (please feel free to enlighten me on that with explanation or test results).

            If nothing else, simply making this available as an optional feature could allow those who are interested to provide testing feedback (YMMV of course – no guarantees from Atlassian).

            Thanks for considering this.

            Andrew Miller added a comment

            I urge Atlassian to reconsider and give this higher priority. Storage within the database is hurting our performance and costing more for attachment storage and database backup. Many PDM and CM systems that use similar architectures allow (multiple in most cases) NFS file stores with clustered application servers.

            John Russell added a comment

            That's the correct information at this stage. There is no scope for looking at this issue in the near term due to the amount of resource we'd have to allocate against other higher priorities that we have for the product. Please keep on voting for this issue as it is an important one that we understand is causing customers pain.

            Adnan Chowdhury [Atlassian] added a comment

            Hopefully someone from Atlassian corrects me, but your answer is no. I extensively discussed this in a separate email chain with the Atlassian engineers, and they gave me this reply January 30th:

            "At this stage it is not on the roadmap, not to say the issue will not be addressed, simply that it won't be in the near term. It is a resource (priority) issue, as there is no technical limitation per se to implementing file system storage for Confluence Clustered. I'm sorry that this may hinder your roll out of a clustered environment."

            David Foust added a comment

            Hello,

            Aren't there any plans to implement this urgent feature in the medium term?
            For production use of Confluence we need to store the attachments on a clustered NFS share.

            Team Engineering Portal
            AUDI AG

            Support Audi Engineering Network added a comment

            Ed Gibson added a comment -

            In response to Matt's comments, I unfortunately don't have the personal knowledge to assist on the Java library issues but have some SMEs here that I can consult. I must admit, though, I'm amazed this issue has not popped up in other applications that I'm sure have Java underpinnings and use NFS to get to data.

            That being said I'd still like to encourage the allocation of resources to address this implementation gap.

            Scaling this product as time progresses is a critical component to its sustainability from my perspective.

            Data management will always be an issue in large implementations. Requiring the deposit of attachments within the database just presents too many challenges. Isn't the secret here to manage the linkages out of the Meta database? Keeping the bulky attachments in a feature filled filesystem like ZFS is the only way we can manage that bulk as time progresses.

            Backup for example has limits, unless we can get closer to efficiently managing the frequency at which we access data. Decreasing the number of "snaps" as it becomes static gains us back time. One resource we are always fighting for! Unchanging data (like bulky, aging attachments) should be data classified into a sorting structure to differentiate it from the dynamic content. Intelligent filesystems bring this capability into reach. As long as it's presented transparently (via features within nfs/zfs) to your app this data classification issue can be managed distinctly from your app.

            I guess all I'm trying to do is highlight what I feel is a foundational requirement if you want your product to really scale.

            The validation component for me has been the number of other university IT environments that have stepped up and also expressed their concern.

            Okay I'll get off the soap box now...

            Hope we can find a resolution to this one!

            Ed Gibson
            Team Leader - Network Operations/*nix Administration
            University of Western Ontario

            Dave Ross added a comment -

            This is an important feature for us as well - pretty much a blocker between us and going beyond a single Confluence node.

            In my experience, there aren't any issues with using standard Java I/O libraries to read and write files to a networked filesystem. It's the OS's responsibility to manage concurrent access. We run a clustered Java-based collaboration tool on Windows and it has no problem using an NTFS share for storage of binary content. There are many other large schools who run this system (uMich has something like 12 app server nodes) and all of them use networked storage for binary content (they use AFS but are moving to NetApp I believe - Indiana is on NetApp). File reading and writing is done using java.io streams. See the following documentation for the application I'm talking about...

            http://confluence.sakaiproject.org/confluence/display/DOC/Sakai+2.4+Admin+Guide+-+Load+Balancing+and+Scaling
            http://confluence.sakaiproject.org/confluence/display/ENC/Configuring+Content+Hosting

            Is this issue a priority for Atlassian? I figure there will not be a massive crowd of folks wanting this, so prioritizing based on votes will never let this issue see the light of day.

            As for us, I can't get approval to purchase cluster licenses until the attachments are back on the filesystem. There will just be too much data for the database to deal with, needlessly. Also, folks are hesitating about really adopting the wiki for their work, due to us keeping all our eggs in the one-node basket.

            So if it takes a monetary impact assessment, that is at least $8,000, plus $4,000 a year after that, that you are missing.

            Not trying to be a whiny customer, as I am very happy with Confluence, but this is something we really need.

            David Foust added a comment

            Not that this comment contributes to the discussion, but our attachments are about 20 times larger than our database. I don't understand why file-locking is an issue. It seems that you would handle this just like editing a document in the database. Each attachment should have a record that indicates version, activity, etc. that would handle it. I am going to have a tough time selling this when I say that our 10GB database is going to be 200GB.

            David Foust added a comment

            This is also a critical issue for us. Our preference would be to store attachments on an NFS share from a NetApp.

            Nate Carlson added a comment

            Given that both Java & NFS came out of Sun, I'm guessing it can't be too uncommon. A quick Google search did turn this up, which would have the benefit of being OS agnostic (rather old, but it made sense on a quick read-through):

            http://jdj.sys-con.com/read/36122.htm

            Andrew Miller added a comment

            Matt Ryall added a comment -

            As far as we know, Java itself doesn't support the type of tight filesystem integration that would be required for consistent data access by a Confluence cluster on NFS. We'd have to involve some kind of third-party library specifically for interfacing with the networked filesystem. Using NFS simply as a normal filesystem in Java (with java.io.File and friends) doesn't actually do the kind of locking required.
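
            To make the locking point concrete, here is a minimal Java sketch of taking an explicit advisory lock through java.nio.channels.FileChannel before writing, which plain java.io.File access does not do. The class name LockedAttachmentWrite and the method writeWithLock are hypothetical and are not Confluence code. Whether such a lock is honored across hosts depends on the operating system, the NFS version, and the server's lock manager, so this only illustrates the API and is not evidence that it would be sufficient for a cluster.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not part of Confluence.
final class LockedAttachmentWrite {

    /**
     * Overwrites the target file while holding an exclusive advisory lock.
     * Returns false if another process currently holds the lock.
     */
    public static boolean writeWithLock(Path target, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // tryLock() returns null when another program holds an overlapping lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                channel.truncate(0);              // discard any previous content
                ByteBuffer buffer = ByteBuffer.wrap(data);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                channel.force(true);              // ask the OS to flush data and metadata
                return true;
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("attachment", ".bin");
        boolean written = writeWithLock(file, "example".getBytes(StandardCharsets.UTF_8));
        System.out.println("lock acquired and file written: " + written);
    }
}
```

            Note that these locks are advisory: they only protect against other writers that also ask for the lock, which is part of why plain file access alone gives no cross-node guarantee.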

            We have also seen problems using NFS even for the attachment storage of a single Confluence node, without clustered access. Where there are consistency problems with attachments in support cases, we recommend avoiding NFS and other NAS solutions.

            In summary, using the database is the only cross-platform solution that we can see working for us in the short term. Development of NFS support would indeed be a useful feature, but it is a significant new development, not a simple enhancement or patch.

            Please continue to vote for this feature so we can prioritise it along with our other work, and post any relevant feedback.

            Ed Gibson added a comment -

            To echo Andrew's sentiment and further address Charles's comment:

            My response would be that NFS (Network File System) will provide the desired locking functionality if multiple nodes are attempting to alter the same file. The scenario you present is no different than two separate users, on two distinct Linux hosts, attempting to change a single file shared between those two hosts via NFS. An underpinning function of NFS is to provide the appropriate locking mechanics that avoid file access contention.

            Perhaps I'm missing something here?

            Would NFS v4 file locking help here?

            While I don't know that this helps with the Sun QFS/SAM-FS scenario, we'd be doing NFS off a high-performance NFS server (e.g. NetApp) where NFS v4 would be a given.

            Currently for us, our attachments directory is 6 times the size of our database so that would be a pretty large impact on our database dumps.

            Andrew Miller added a comment

            While a good idea, the implementation is slightly more complicated than just re-enabling the filesystem attachment storage code in a cluster.

            In a single-node environment, we entrust data consistency to the fact that only one process is modifying attachments on the filesystem. In a multi-node environment we can entrust consistency to the database (the C in ACID). If we got rid of the database, we'd have to start worrying about manually maintaining filesystem consistency between different nodes.
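
            One way to picture the consistency concern, together with the version-record idea raised elsewhere in this thread, is to keep the database as the source of truth and treat files as write-once: each attachment version gets its own path, is written to a temporary file first, and is then renamed into place. The Java sketch below is an assumption for illustration only. VersionedAttachmentStore, its methods, and the in-memory map standing in for a database table are all hypothetical and do not describe how Confluence works. Atomicity of the rename on a given network filesystem is also assumed here, not guaranteed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of Confluence.
final class VersionedAttachmentStore {
    private final Path root;
    // Stand-in for a database table mapping attachment id -> current version.
    // In a real cluster this record would live in the shared database.
    private final Map<Long, Integer> currentVersion = new HashMap<>();

    public VersionedAttachmentStore(Path root) {
        this.root = root;
    }

    /** Each version has its own path, so existing files are never modified in place. */
    public Path pathFor(long attachmentId, int version) {
        return root.resolve(Long.toString(attachmentId)).resolve("v" + version);
    }

    /** Writes a new version and returns its version number. */
    public synchronized int save(long attachmentId, byte[] data) throws IOException {
        int next = currentVersion.getOrDefault(attachmentId, 0) + 1;
        Path target = pathFor(attachmentId, next);
        Files.createDirectories(target.getParent());
        // Write to a temporary file in the same directory, then rename it,
        // so readers never observe a partially written attachment.
        Path temp = Files.createTempFile(target.getParent(), "upload", ".tmp");
        Files.write(temp, data);
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        // Publish the new version only after the file is fully in place.
        currentVersion.put(attachmentId, next);
        return next;
    }

    /** Reads the bytes of the version currently recorded for the attachment. */
    public synchronized byte[] loadCurrent(long attachmentId) throws IOException {
        Integer version = currentVersion.get(attachmentId);
        if (version == null) {
            throw new IllegalArgumentException("unknown attachment " + attachmentId);
        }
        return Files.readAllBytes(pathFor(attachmentId, version));
    }

    public static void main(String[] args) throws IOException {
        VersionedAttachmentStore store =
                new VersionedAttachmentStore(Files.createTempDirectory("attachments"));
        store.save(1L, "draft".getBytes(StandardCharsets.UTF_8));
        int latest = store.save(1L, "final".getBytes(StandardCharsets.UTF_8));
        System.out.println("latest version: " + latest);
    }
}
```

            In this arrangement the filesystem never has to arbitrate between writers. The work of choosing the next version number, done here by a synchronized method and a map, would fall to a database transaction, which is where the thread's consistency argument places it.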

            Charles Miller (Inactive) added a comment

            The option to keep attachments on the filesystem in a clustered environment is important to us at MIT as well.

            Carter Snowden added a comment

            Ed Gibson added a comment -

            The vision behind this request would be to utilize SUN QFS
            http://www.sun.com/storagetek/management_software/data_management/qfs/
            network drives with SUN SAM-FS
            http://www.sun.com/storagetek/management_software/data_management/sam/
            for attachments.
            This would provide an efficient, scalable, and redundant Confluence environment on both the frontend (cluster) and backend (disk). The clustering function provides the desired horizontal growth on the frontend. However, including attachments within the database presents scaling challenges. By separating the attachments from the database we can utilize shared network drives and the SAM-FS archiving functions to push low-activity attachments to cheaper tier 2 disk. This functionality appears to exist in the standalone version (i.e. separation of attachments from the database) but appears to disappear in a cluster environment.

            Ed Gibson
            University of Western Ontario
            egibson@uwo.ca

              onevalainen Olli Nevalainen
              jlargman Jeremy Largman
              Votes: 73
              Watchers: 60