|
An alternate and DBMS agnostic solution would be for confluence to store a derived column itself, index the field, and ensure all uses of LOWER use the derived fields.
I agree with Patrick's solution. If we need to look up the title in a case-insensitive fashion and Hibernate doesn't support a generic way of creating such an index on all our supported databases, we should do the to-lower-casing ourselves and store it in a separate column. We would have to watch out for Locales though. Whatever Locale we use to store the lowercase version needs to stay the same forever, or we need to update the whole table when it changes.
If this problem applies to DB2 it should also apply to PostgreSQL and Oracle. MySQL can be run with a case sensitive or insensitive collation. I'm hoping we can just not do a case-insensitive search when looking up page titles. Any search that generic should probably using Lucene index rather than the Database. This is causing major problems for us at wikis.sun.com where we use MySQL5.
There are other reports of this issue occurring with MySQL: MySQL RFE: http://bugs.mysql.com/bug.php?id=9639 The title of this JIRA should be changed so that it's clear that this is a generic issue. PostgresSQL can create indexes for functions, but since Confluence doesn't generate the index, users don't benefit from this unless a DBA creates the index manually. Drupal was facing the very same issues in the past (http://drupal.org/node/83738 Hi Ivan,
I'll attach an updated version of the patch. We just experienced major outage today that lasted the entire afternoon. According to the db investigation this was due to slow queries that were using the lower() function. After the patch was deployed, the outage cleared. I suggest that you consider this to be a blocker RFE/bug as it can take a site down easily. There is one more major bug that we found in atlassian-user code, I'll file a separate issue for that. cheers, And just to give you some numbers. I ran some simple benchmarks with and without the patch on our production db during our outage and this is the difference in time it takes to execute a query that retrieves a wiki page:
before: 10-15 seconds WHAT A DIFFERENCE! My only concern is if the patch is cluster safe. We were running two nodes side by side, but since I don't understand all your clustering code (yet), I took one node down just to be safe. Can someone from Atlassian have a look at it and let us know if it's ok to run this in clustered mode? cheers, Hi Igor,
I've discussed this issue with one of the developers, familiar with the code, and although he agrees that the performance improvement is enormous it is perhaps approached too specifically to the MySQL 5 database and not generic enough (not sure, perhaps there is even more effective way to implement it). We need to investigate and test the required fix on all supported databases against our functional and productions/test loading tests to be able to ascertain it's applicability and ability to be embedded into the main source code. As you would appreciate it takes a bit of time to investigate, and thus we shall provide an update on this ticket when necessary information is collated. Thanks for your effort mate ! Great work and perhaps I'd suggest to open a support ticket for the database problems so we can review the logs etc... to ensure there is nothing else going on additionally. s pozdravom ostava Ivan Hi Ivan,
I'm afraid that you are wrong. The patch is as database agnostic as possible and will work with any database out there. I avoided the use of triggers, stored procedures or any other database specific technologies that would by compatible with other relational database systems. I even went to the trouble of using some Atlassian's utility methods to make the patch compatible with DB2. Any database that supports indexes (that is any serious db out there) will enormously benefit from this patch. cheers, Hi Igor,
As I've expressed earlier in this comments, I think the approach taken in your patch is probably the right one. What Ivan was trying to express is that this patch is only required for some databases, whereas others will happily use their existing index despite the lower function. The patch is required for MySQL because it ignores indexes as soon as you introduce a function. The introduction of the column is definitely the most database-generic way of doing it. I have looked quickly over your patch and cannot imagine a reason it would not work in a cluster. How did you migrate the data? By SQL? I am not sure when we will have time to test your patch, but I certainly hope we can do it soon. MySQL will by default completely ignore case on data anyway. Certainly you could do it for this specific table, and in fact some of our customers run their whole mysql database with case insensitivity. We recommend against this because the majority of databases, and therefore of our testing, is done case sensitively. I suggest setting your content table's collation to be case insensitive, and simply taking out the call to the lower function in the queries that use it. cf http://dev.mysql.com/doc/refman/5.0/en/charset-table.html Of course, there are pitfalls to doing this, such as the chance of a unique key constraint collision where two keys differ only in case. Cheers, Hello,
I want to reiterate 1. DB2 9 for z/OS supports index on column expressions. 2. DB2 LUW supports generated columns with index on generated columns - where the database maintains the generated column and will automatically rewrite queries to use the generated column. In DB2 LUW 9.5, generated columns can be created as hidden columns (not returned for SELECT *). So this problem is solvable in current releases of DB2 - but it requires user action to solve it. This issue is a major problem for any site that has large tables. Basically, any customer that's going to have a large wiki installation - this issue is a show-stopper. The lower is used all over the place. Not just on content. The biggest issue is on the SPACES table - since it's used to look up content, blog posts, etc. So it's either a unique lookup on SPACES, or a table space scan to find 1 row. We also saw this issue on LINKS. BLOGPOST Title also has the issue. select blogpost0_.CONTENTID as CONTENTID, blogpost0_.SPACEID as SPACEID CONTENT_LABEL table. ATTACHMENTS table. Our wiki has a lot of content, a lot of attachements, a lot of spaces. Any big site - this issue is an absolute performance killer . The wiki quickly becomes useless. Don,
We took the risk and we deployed the patch to production yesterday. So far so good The data was manually migrated using SQL, the queries should be in the comment section of the patch. Thanks for the cluster update. I thought that just a simple serialization was used to push the objects over the wire to the second node, but I wasn't sure. I'd prefer the solution proposed in the patch over changing the hibernate queries and exploiting MySQL insensitivity because the solution is the smallest common denominator for all of your supported databases, which would make it easier for you to test and support it. Using this solution you could easily and safely provide this enhancements to all of your customers regardless of their database via an upgrade task (I can image that creating an upgrade task that would create custom indexes for all the supported databases would be much trickier). I completely agree that the use of lower in all these places is going to hurt performance for some databases and that Igor's approach is the most pragmatic one for us to take in fixing this. (This is despite the obvious ickiness of replicated data). I only suggested using the MySQL insensitivity because if we don't manage to prioritise this bug, then you're going to need to maintain whatever workaround you do, and I'd think that a simple workaround would be easier to maintain than one that is database agnostic.
Thanks Don. I'm not happy about the data redundancy, but if it means that you have only one db schema to support that is fast for everyone, then I think it's definitely worth it.
The benefit for us is that you'll do all your QA with the same schema as we use, minimizing the chance of something breaking for us in the future because of this change. /i |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
DB2 LUW does support indexes on column expressions through generated columns.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.luw.admin.dbobj.doc/doc/c0020109.html