Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-10030

db2: queries that use 'lower' do not use index because of case sensitivity

    • 5
    • We collect Confluence feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      Most of the queries against CONTENT table use 'lower' db function for a title field. The goes for (space) key field in SPACES table.
      We do it to make a case insensitive comparison, however it means that db2 does not use the existing indexes for those fields and performs a full table scan.
      This results in a poor performance.

      This can be applicable to other database too, but it is not verified. Also other databases may have global 'case sensitivity' flag.

        1. lower-func-performance.patch
          34 kB
        2. lower-func-performance.patch
          28 kB
        3. lower-func-performance.patch
          18 kB
        4. lower-func-performance.patch
          19 kB
        5. lower-func-performance.patch
          3 kB

            [CONFSERVER-10030] db2: queries that use 'lower' do not use index because of case sensitivity

            Matt Ryall added a comment -

            As per our announcement, the last major version of Confluence to support DB2 was Confluence 4.2. Confluence 4.3 and later will not support DB2.

            Therefore this issue will not be fixed. See my comment above for a workaround in existing instances.

            Matt Ryall added a comment - As per our announcement , the last major version of Confluence to support DB2 was Confluence 4.2. Confluence 4.3 and later will not support DB2. Therefore this issue will not be fixed. See my comment above for a workaround in existing instances.

            Adding a generated column and index manually can resolve this problem on DB2.

            See the instructions on this page, under "DB2 for Linux or Windows": Creating a Lowercase Page Title Index.

            Matt Ryall added a comment - Adding a generated column and index manually can resolve this problem on DB2. See the instructions on this page, under "DB2 for Linux or Windows": Creating a Lowercase Page Title Index .

            Igor Minar added a comment -

            Adding the latest version of the patch with removed lower() function calls. This one removes the calls also from all the Java classes (these were giving us a hard time in production as well).

            Igor Minar added a comment - Adding the latest version of the patch with removed lower() function calls. This one removes the calls also from all the Java classes (these were giving us a hard time in production as well).

            adding a patch against confluence 3.2 that removes all lower() calls and relies only on mysql case-insensitive collation

            Igor Minar added a comment - adding a patch against confluence 3.2 that removes all lower() calls and relies only on mysql case-insensitive collation

            Matt Ryall added a comment -

            Confluence 3.0 includes some improved caching that alleviates this problem a lot for page lookups. We've also documented the instructions for creating lowercase indexes manually in several databases that support it (including DB2, thanks for the help – Patrick):

            http://confluence.atlassian.com/display/DOC/Creating+a+Lowercase+Page+Title+Index

            This issue will remain open, but should be much less of a problem if you can follow the instructions on that page.

            Matt Ryall added a comment - Confluence 3.0 includes some improved caching that alleviates this problem a lot for page lookups. We've also documented the instructions for creating lowercase indexes manually in several databases that support it (including DB2, thanks for the help – Patrick): http://confluence.atlassian.com/display/DOC/Creating+a+Lowercase+Page+Title+Index This issue will remain open, but should be much less of a problem if you can follow the instructions on that page.

            Igor Minar added a comment -

            rebased version compatible with confluence 2.10.1

            Igor Minar added a comment - rebased version compatible with confluence 2.10.1

            Igor Minar added a comment -

            Thanks Don. I'm not happy about the data redundancy, but if it means that you have only one db schema to support that is fast for everyone, then I think it's definitely worth it.

            The benefit for us is that you'll do all your QA with the same schema as we use, minimizing the chance of something breaking for us in the future because of this change.

            /i

            Igor Minar added a comment - Thanks Don. I'm not happy about the data redundancy, but if it means that you have only one db schema to support that is fast for everyone, then I think it's definitely worth it. The benefit for us is that you'll do all your QA with the same schema as we use, minimizing the chance of something breaking for us in the future because of this change. /i

            Don Willis added a comment -

            I completely agree that the use of lower in all these places is going to hurt performance for some databases and that Igor's approach is the most pragmatic one for us to take in fixing this. (This is despite the obvious ickiness of replicated data). I only suggested using the MySQL insensitivity because if we don't manage to prioritise this bug, then you're going to need to maintain whatever workaround you do, and I'd think that a simple workaround would be easier to maintain than one that is database agnostic.

            Don Willis added a comment - I completely agree that the use of lower in all these places is going to hurt performance for some databases and that Igor's approach is the most pragmatic one for us to take in fixing this. (This is despite the obvious ickiness of replicated data). I only suggested using the MySQL insensitivity because if we don't manage to prioritise this bug, then you're going to need to maintain whatever workaround you do, and I'd think that a simple workaround would be easier to maintain than one that is database agnostic.

            Igor Minar added a comment -

            Don,

            We took the risk and we deployed the patch to production yesterday. So far so good

            The data was manually migrated using SQL, the queries should be in the comment section of the patch.

            Thanks for the cluster update. I thought that just a simple serialization was used to push the objects over the wire to the second node, but I wasn't sure.

            I'd prefer the solution proposed in the patch over changing the hibernate queries and exploiting MySQL insensitivity because the solution is the smallest common denominator for all of your supported databases, which would make it easier for you to test and support it.

            Using this solution you could easily and safely provide this enhancements to all of your customers regardless of their database via an upgrade task (I can image that creating an upgrade task that would create custom indexes for all the supported databases would be much trickier).

            Igor Minar added a comment - Don, We took the risk and we deployed the patch to production yesterday. So far so good The data was manually migrated using SQL, the queries should be in the comment section of the patch. Thanks for the cluster update. I thought that just a simple serialization was used to push the objects over the wire to the second node, but I wasn't sure. I'd prefer the solution proposed in the patch over changing the hibernate queries and exploiting MySQL insensitivity because the solution is the smallest common denominator for all of your supported databases, which would make it easier for you to test and support it. Using this solution you could easily and safely provide this enhancements to all of your customers regardless of their database via an upgrade task (I can image that creating an upgrade task that would create custom indexes for all the supported databases would be much trickier).

            Hello,
            I want to reiterate
            1. DB2 9 for z/OS supports index on column expressions.
            2. DB2 LUW supports generated columns with index on generated columns - where the database maintains the generated column and will automatically rewrite queries to use the generated column. In DB2 LUW 9.5, generated columns can be created as hidden columns (not returned for SELECT *).

            So this problem is solvable in current releases of DB2 - but it requires user action to solve it.

            This issue is a major problem for any site that has large tables. Basically, any customer that's going to have a large wiki installation - this issue is a show-stopper. The lower is used all over the place. Not just on content. The biggest issue is on the SPACES table - since it's used to look up content, blog posts, etc. So it's either a unique lookup on SPACES, or a table space scan to find 1 row.

            We also saw this issue on LINKS.
            select outgoingli0_.LINKID as LINKID, outgoingli0_.DESTPAGETITLE as DESTPAGE2_
            , outgoingli0_.DESTSPACEKEY as DESTSPAC3_, outgoingli0_.CONTENTID as CONTENTID
            , outgoingli0_.CREATOR as CREATOR, outgoingli0_.CREATIONDATE as CREATION6_
            , outgoingli0_.LASTMODIFIER as LASTMODI7_, outgoingli0_.LASTMODDATE as LASTMODD8_
            from LINKS outgoingli0_ CARD 1457006 NPAGES 57402
            where (lower(outgoingli0_.DESTSPACEKEY)=? ) COLCARD 11320
            and(lower(outgoingli0_.DESTPAGETITLE)=? ) COLCARD 356000

            BLOGPOST Title also has the issue.

            select blogpost0_.CONTENTID as CONTENTID, blogpost0_.SPACEID as SPACEID
            , blogpost0_.TITLE as TITLE, blogpost0_.VERSION as VERSION
            , blogpost0_.CREATOR as CREATOR, blogpost0_.CREATIONDATE as CREATION6_
            , blogpost0_.LASTMODIFIER as LASTMODI7_, blogpost0_.LASTMODDATE as LASTMODD8_
            , blogpost0_.VERSIONCOMMENT as VERSIONC9_, blogpost0_.PREVVER as PREVVER
            , blogpost0_.CONTENT_STATUS as CONTENT11_
            from CONTENT blogpost0_
            , SPACES space1_
            where blogpost0_.CONTENTTYPE='BLOGPOST'
            and ((lower(space1_.SPACEKEY)=?
            and blogpost0_.SPACEID=space1_.SPACEID)
            and(lower(blogpost0_.TITLE)=? )
            and(blogpost0_.CREATIONDATE>=? )
            and(blogpost0_.CREATIONDATE<? )
            and(blogpost0_.PREVVER is null )
            and(blogpost0_.CONTENT_STATUS='current' ))

            CONTENT_LABEL table.
            select distinct label0_.LABELID as LABELID
            , label0_.NAME as NAME, label0_.OWNER as OWNER
            , label0_.NAMESPACE as NAMESPACE, label0_.CREATIONDATE as CREATION5_
            , label0_.LASTMODDATE as LASTMODD6_
            from LABEL label0_ CARD 22253 NPAGES 424
            , CONTENT_LABEL labelling1_ CARD 73483 NPAGES 1711
            where (lower(labelling1_.SPACEKEY)=? ) COLCARD 7813
            AND(labelling1_.LABELID=label0_.LABELID ) COLCARD 22253/22253
            AND(label0_.NAMESPACE=? ) COLCARD 4
            order by label0_.NAME

            ATTACHMENTS table.
            SELECT attachmentid, title, contenttype, creator, creationdate
            , lastmodifier, lastmoddate, filesize, attachment_comment
            , attversion, prevver
            FROM WIKI_RW.attachments CARD 403449
            where pageid = ? COLCARD 94288
            and lower(title) = ? COLCARD 253952
            order by attversion desc with UR

            Our wiki has a lot of content, a lot of attachements, a lot of spaces. Any big site - this issue is an absolute performance killer . The wiki quickly becomes useless.

            Deleted Account (Inactive) added a comment - Hello, I want to reiterate 1. DB2 9 for z/OS supports index on column expressions. 2. DB2 LUW supports generated columns with index on generated columns - where the database maintains the generated column and will automatically rewrite queries to use the generated column. In DB2 LUW 9.5, generated columns can be created as hidden columns (not returned for SELECT *). So this problem is solvable in current releases of DB2 - but it requires user action to solve it. This issue is a major problem for any site that has large tables. Basically, any customer that's going to have a large wiki installation - this issue is a show-stopper. The lower is used all over the place. Not just on content. The biggest issue is on the SPACES table - since it's used to look up content, blog posts, etc. So it's either a unique lookup on SPACES, or a table space scan to find 1 row. We also saw this issue on LINKS. select outgoingli0_.LINKID as LINKID, outgoingli0_.DESTPAGETITLE as DESTPAGE2_ , outgoingli0_.DESTSPACEKEY as DESTSPAC3_, outgoingli0_.CONTENTID as CONTENTID , outgoingli0_.CREATOR as CREATOR, outgoingli0_.CREATIONDATE as CREATION6_ , outgoingli0_.LASTMODIFIER as LASTMODI7_, outgoingli0_.LASTMODDATE as LASTMODD8_ from LINKS outgoingli0_ CARD 1457006 NPAGES 57402 where (lower(outgoingli0_.DESTSPACEKEY)=? ) COLCARD 11320 and(lower(outgoingli0_.DESTPAGETITLE)=? ) COLCARD 356000 BLOGPOST Title also has the issue. select blogpost0_.CONTENTID as CONTENTID, blogpost0_.SPACEID as SPACEID , blogpost0_.TITLE as TITLE, blogpost0_.VERSION as VERSION , blogpost0_.CREATOR as CREATOR, blogpost0_.CREATIONDATE as CREATION6_ , blogpost0_.LASTMODIFIER as LASTMODI7_, blogpost0_.LASTMODDATE as LASTMODD8_ , blogpost0_.VERSIONCOMMENT as VERSIONC9_, blogpost0_.PREVVER as PREVVER , blogpost0_.CONTENT_STATUS as CONTENT11_ from CONTENT blogpost0_ , SPACES space1_ where blogpost0_.CONTENTTYPE='BLOGPOST' and ((lower(space1_.SPACEKEY)=? and blogpost0_.SPACEID=space1_.SPACEID) and(lower(blogpost0_.TITLE)=? ) and(blogpost0_.CREATIONDATE>=? ) and(blogpost0_.CREATIONDATE<? ) and(blogpost0_.PREVVER is null ) and(blogpost0_.CONTENT_STATUS='current' )) CONTENT_LABEL table. select distinct label0_.LABELID as LABELID , label0_.NAME as NAME, label0_.OWNER as OWNER , label0_.NAMESPACE as NAMESPACE, label0_.CREATIONDATE as CREATION5_ , label0_.LASTMODDATE as LASTMODD6_ from LABEL label0_ CARD 22253 NPAGES 424 , CONTENT_LABEL labelling1_ CARD 73483 NPAGES 1711 where (lower(labelling1_.SPACEKEY)=? ) COLCARD 7813 AND(labelling1_.LABELID=label0_.LABELID ) COLCARD 22253/22253 AND(label0_.NAMESPACE=? ) COLCARD 4 order by label0_.NAME ATTACHMENTS table. SELECT attachmentid, title, contenttype, creator, creationdate , lastmodifier, lastmoddate, filesize, attachment_comment , attversion, prevver FROM WIKI_RW.attachments CARD 403449 where pageid = ? COLCARD 94288 and lower(title) = ? COLCARD 253952 order by attversion desc with UR Our wiki has a lot of content, a lot of attachements, a lot of spaces. Any big site - this issue is an absolute performance killer . The wiki quickly becomes useless.

            Don Willis added a comment -

            Hi Igor,

            As I've expressed earlier in this comments, I think the approach taken in your patch is probably the right one.

            What Ivan was trying to express is that this patch is only required for some databases, whereas others will happily use their existing index despite the lower function. The patch is required for MySQL because it ignores indexes as soon as you introduce a function. The introduction of the column is definitely the most database-generic way of doing it.

            I have looked quickly over your patch and cannot imagine a reason it would not work in a cluster. How did you migrate the data? By SQL? I am not sure when we will have time to test your patch, but I certainly hope we can do it soon.
            However, in case we do not get to working your patch into the product soon, might I suggest a simpler patch for your particular instance?

            MySQL will by default completely ignore case on data anyway. Certainly you could do it for this specific table, and in fact some of our customers run their whole mysql database with case insensitivity. We recommend against this because the majority of databases, and therefore of our testing, is done case sensitively. I suggest setting your content table's collation to be case insensitive, and simply taking out the call to the lower function in the queries that use it. cf http://dev.mysql.com/doc/refman/5.0/en/charset-table.html. That change will be smaller, and easier to maintain than the database agnostic one.

            Of course, there are pitfalls to doing this, such as the chance of a unique key constraint collision where two keys differ only in case.

            Cheers,
            Don

            Don Willis added a comment - Hi Igor, As I've expressed earlier in this comments, I think the approach taken in your patch is probably the right one. What Ivan was trying to express is that this patch is only required for some databases, whereas others will happily use their existing index despite the lower function. The patch is required for MySQL because it ignores indexes as soon as you introduce a function. The introduction of the column is definitely the most database-generic way of doing it. I have looked quickly over your patch and cannot imagine a reason it would not work in a cluster. How did you migrate the data? By SQL? I am not sure when we will have time to test your patch, but I certainly hope we can do it soon. However, in case we do not get to working your patch into the product soon, might I suggest a simpler patch for your particular instance? MySQL will by default completely ignore case on data anyway. Certainly you could do it for this specific table, and in fact some of our customers run their whole mysql database with case insensitivity. We recommend against this because the majority of databases, and therefore of our testing, is done case sensitively. I suggest setting your content table's collation to be case insensitive, and simply taking out the call to the lower function in the queries that use it. cf http://dev.mysql.com/doc/refman/5.0/en/charset-table.html . That change will be smaller, and easier to maintain than the database agnostic one. Of course, there are pitfalls to doing this, such as the chance of a unique key constraint collision where two keys differ only in case. Cheers, Don

            Igor Minar added a comment -

            Hi Ivan,

            I'm afraid that you are wrong.

            The patch is as database agnostic as possible and will work with any database out there. I avoided the use of triggers, stored procedures or any other database specific technologies that would by compatible with other relational database systems. I even went to the trouble of using some Atlassian's utility methods to make the patch compatible with DB2.

            Any database that supports indexes (that is any serious db out there) will enormously benefit from this patch.

            cheers,
            Igor

            Igor Minar added a comment - Hi Ivan, I'm afraid that you are wrong. The patch is as database agnostic as possible and will work with any database out there. I avoided the use of triggers, stored procedures or any other database specific technologies that would by compatible with other relational database systems. I even went to the trouble of using some Atlassian's utility methods to make the patch compatible with DB2. Any database that supports indexes (that is any serious db out there) will enormously benefit from this patch. cheers, Igor

            Hi Igor,
            I've discussed this issue with one of the developers, familiar with the code, and although he agrees that the performance improvement is enormous it is perhaps approached too specifically to the MySQL 5 database and not generic enough (not sure, perhaps there is even more effective way to implement it). We need to investigate and test the required fix on all supported databases against our functional and productions/test loading tests to be able to ascertain it's applicability and ability to be embedded into the main source code. As you would appreciate it takes a bit of time to investigate, and thus we shall provide an update on this ticket when necessary information is collated. Thanks for your effort mate ! Great work and perhaps I'd suggest to open a support ticket for the database problems so we can review the logs etc... to ensure there is nothing else going on additionally.
            s pozdravom ostava
            Ivan

            Ivan Benko [Atlassian] added a comment - Hi Igor, I've discussed this issue with one of the developers, familiar with the code, and although he agrees that the performance improvement is enormous it is perhaps approached too specifically to the MySQL 5 database and not generic enough (not sure, perhaps there is even more effective way to implement it). We need to investigate and test the required fix on all supported databases against our functional and productions/test loading tests to be able to ascertain it's applicability and ability to be embedded into the main source code. As you would appreciate it takes a bit of time to investigate, and thus we shall provide an update on this ticket when necessary information is collated. Thanks for your effort mate ! Great work and perhaps I'd suggest to open a support ticket for the database problems so we can review the logs etc... to ensure there is nothing else going on additionally. s pozdravom ostava Ivan

            Igor Minar added a comment -

            And just to give you some numbers. I ran some simple benchmarks with and without the patch on our production db during our outage and this is the difference in time it takes to execute a query that retrieves a wiki page:

            before: 10-15 seconds
            after: 0.01-0.12 seconds

            WHAT A DIFFERENCE!

            My only concern is if the patch is cluster safe. We were running two nodes side by side, but since I don't understand all your clustering code (yet), I took one node down just to be safe. Can someone from Atlassian have a look at it and let us know if it's ok to run this in clustered mode?

            cheers,
            Igor

            Igor Minar added a comment - And just to give you some numbers. I ran some simple benchmarks with and without the patch on our production db during our outage and this is the difference in time it takes to execute a query that retrieves a wiki page: before: 10-15 seconds after: 0.01-0.12 seconds WHAT A DIFFERENCE! My only concern is if the patch is cluster safe. We were running two nodes side by side, but since I don't understand all your clustering code (yet), I took one node down just to be safe. Can someone from Atlassian have a look at it and let us know if it's ok to run this in clustered mode? cheers, Igor

            Igor Minar added a comment -

            updated version of the patch

            Igor Minar added a comment - updated version of the patch

            Igor Minar added a comment -

            Hi Ivan,

            I'll attach an updated version of the patch. We just experienced major outage today that lasted the entire afternoon. According to the db investigation this was due to slow queries that were using the lower() function. After the patch was deployed, the outage cleared.

            I suggest that you consider this to be a blocker RFE/bug as it can take a site down easily.

            There is one more major bug that we found in atlassian-user code, I'll file a separate issue for that.

            cheers,
            Igor

            Igor Minar added a comment - Hi Ivan, I'll attach an updated version of the patch. We just experienced major outage today that lasted the entire afternoon. According to the db investigation this was due to slow queries that were using the lower() function. After the patch was deployed, the outage cleared. I suggest that you consider this to be a blocker RFE/bug as it can take a site down easily. There is one more major bug that we found in atlassian-user code, I'll file a separate issue for that. cheers, Igor

            Igor Minar added a comment -

            This is causing major problems for us at wikis.sun.com where we use MySQL5.

            There are other reports of this issue occurring with MySQL:
            http://jira.atlassian.com/browse/CONF-9152 (check out comments)
            https://support.atlassian.com/browse/CSP-11031

            MySQL RFE: http://bugs.mysql.com/bug.php?id=9639

            The title of this JIRA should be changed so that it's clear that this is a generic issue.

            PostgresSQL can create indexes for functions, but since Confluence doesn't generate the index, users don't benefit from this unless a DBA creates the index manually.

            Drupal was facing the very same issues in the past (http://drupal.org/node/83738), their attempt to resolving it is similar to what Patrick mentioned. I'm all for this approach as well.

            Igor Minar added a comment - This is causing major problems for us at wikis.sun.com where we use MySQL5. There are other reports of this issue occurring with MySQL: http://jira.atlassian.com/browse/CONF-9152 (check out comments) https://support.atlassian.com/browse/CSP-11031 MySQL RFE: http://bugs.mysql.com/bug.php?id=9639 The title of this JIRA should be changed so that it's clear that this is a generic issue. PostgresSQL can create indexes for functions, but since Confluence doesn't generate the index, users don't benefit from this unless a DBA creates the index manually. Drupal was facing the very same issues in the past ( http://drupal.org/node/83738 ), their attempt to resolving it is similar to what Patrick mentioned. I'm all for this approach as well.

            I agree with Patrick's solution. If we need to look up the title in a case-insensitive fashion and Hibernate doesn't support a generic way of creating such an index on all our supported databases, we should do the to-lower-casing ourselves and store it in a separate column. We would have to watch out for Locales though. Whatever Locale we use to store the lowercase version needs to stay the same forever, or we need to update the whole table when it changes.

            If this problem applies to DB2 it should also apply to PostgreSQL and Oracle. MySQL can be run with a case sensitive or insensitive collation.

            I'm hoping we can just not do a case-insensitive search when looking up page titles. Any search that generic should probably using Lucene index rather than the Database.

            Don Willis added a comment - I agree with Patrick's solution. If we need to look up the title in a case-insensitive fashion and Hibernate doesn't support a generic way of creating such an index on all our supported databases, we should do the to-lower-casing ourselves and store it in a separate column. We would have to watch out for Locales though. Whatever Locale we use to store the lowercase version needs to stay the same forever, or we need to update the whole table when it changes. If this problem applies to DB2 it should also apply to PostgreSQL and Oracle. MySQL can be run with a case sensitive or insensitive collation. I'm hoping we can just not do a case-insensitive search when looking up page titles. Any search that generic should probably using Lucene index rather than the Database.

            An alternate and DBMS agnostic solution would be for confluence to store a derived column itself, index the field, and ensure all uses of LOWER use the derived fields.

            Deleted Account (Inactive) added a comment - An alternate and DBMS agnostic solution would be for confluence to store a derived column itself, index the field, and ensure all uses of LOWER use the derived fields.

            DB2 for z/OS version 9 supports column expression indexes - so you could create an index on LOWER(COL).

            DB2 LUW does support indexes on column expressions through generated columns.
            http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.luw.admin.dbobj.doc/doc/c0020109.html

            Deleted Account (Inactive) added a comment - DB2 for z/OS version 9 supports column expression indexes - so you could create an index on LOWER(COL). DB2 LUW does support indexes on column expressions through generated columns. http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.luw.admin.dbobj.doc/doc/c0020109.html

              matt@atlassian.com Matt Ryall
              akazatchkov Anatoli
              Votes:
              11 Vote for this issue
              Watchers:
              9 Start watching this issue

                Created:
                Updated:
                Resolved: