Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-10030

db2: queries that use 'lower' do not use index because of case sensitivity

    • 5
    • We collect Confluence feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      Most of the queries against CONTENT table use 'lower' db function for a title field. The goes for (space) key field in SPACES table.
      We do it to make a case insensitive comparison, however it means that db2 does not use the existing indexes for those fields and performs a full table scan.
      This results in a poor performance.

      This can be applicable to other database too, but it is not verified. Also other databases may have global 'case sensitivity' flag.

        1. lower-func-performance.patch
          3 kB
        2. lower-func-performance.patch
          19 kB
        3. lower-func-performance.patch
          18 kB
        4. lower-func-performance.patch
          28 kB
        5. lower-func-performance.patch
          34 kB

            [CONFSERVER-10030] db2: queries that use 'lower' do not use index because of case sensitivity

            Matt Ryall added a comment -

            As per our announcement, the last major version of Confluence to support DB2 was Confluence 4.2. Confluence 4.3 and later will not support DB2.

            Therefore this issue will not be fixed. See my comment above for a workaround in existing instances.

            Matt Ryall added a comment - As per our announcement , the last major version of Confluence to support DB2 was Confluence 4.2. Confluence 4.3 and later will not support DB2. Therefore this issue will not be fixed. See my comment above for a workaround in existing instances.

            Adding a generated column and index manually can resolve this problem on DB2.

            See the instructions on this page, under "DB2 for Linux or Windows": Creating a Lowercase Page Title Index.

            Matt Ryall added a comment - Adding a generated column and index manually can resolve this problem on DB2. See the instructions on this page, under "DB2 for Linux or Windows": Creating a Lowercase Page Title Index .

            Igor Minar added a comment -

            Adding the latest version of the patch with removed lower() function calls. This one removes the calls also from all the Java classes (these were giving us a hard time in production as well).

            Igor Minar added a comment - Adding the latest version of the patch with removed lower() function calls. This one removes the calls also from all the Java classes (these were giving us a hard time in production as well).

            adding a patch against confluence 3.2 that removes all lower() calls and relies only on mysql case-insensitive collation

            Igor Minar added a comment - adding a patch against confluence 3.2 that removes all lower() calls and relies only on mysql case-insensitive collation

            Matt Ryall added a comment -

            Confluence 3.0 includes some improved caching that alleviates this problem a lot for page lookups. We've also documented the instructions for creating lowercase indexes manually in several databases that support it (including DB2, thanks for the help – Patrick):

            http://confluence.atlassian.com/display/DOC/Creating+a+Lowercase+Page+Title+Index

            This issue will remain open, but should be much less of a problem if you can follow the instructions on that page.

            Matt Ryall added a comment - Confluence 3.0 includes some improved caching that alleviates this problem a lot for page lookups. We've also documented the instructions for creating lowercase indexes manually in several databases that support it (including DB2, thanks for the help – Patrick): http://confluence.atlassian.com/display/DOC/Creating+a+Lowercase+Page+Title+Index This issue will remain open, but should be much less of a problem if you can follow the instructions on that page.

            Igor Minar added a comment -

            rebased version compatible with confluence 2.10.1

            Igor Minar added a comment - rebased version compatible with confluence 2.10.1

            Igor Minar added a comment -

            Thanks Don. I'm not happy about the data redundancy, but if it means that you have only one db schema to support that is fast for everyone, then I think it's definitely worth it.

            The benefit for us is that you'll do all your QA with the same schema as we use, minimizing the chance of something breaking for us in the future because of this change.

            /i

            Igor Minar added a comment - Thanks Don. I'm not happy about the data redundancy, but if it means that you have only one db schema to support that is fast for everyone, then I think it's definitely worth it. The benefit for us is that you'll do all your QA with the same schema as we use, minimizing the chance of something breaking for us in the future because of this change. /i

            Don Willis added a comment -

            I completely agree that the use of lower in all these places is going to hurt performance for some databases and that Igor's approach is the most pragmatic one for us to take in fixing this. (This is despite the obvious ickiness of replicated data). I only suggested using the MySQL insensitivity because if we don't manage to prioritise this bug, then you're going to need to maintain whatever workaround you do, and I'd think that a simple workaround would be easier to maintain than one that is database agnostic.

            Don Willis added a comment - I completely agree that the use of lower in all these places is going to hurt performance for some databases and that Igor's approach is the most pragmatic one for us to take in fixing this. (This is despite the obvious ickiness of replicated data). I only suggested using the MySQL insensitivity because if we don't manage to prioritise this bug, then you're going to need to maintain whatever workaround you do, and I'd think that a simple workaround would be easier to maintain than one that is database agnostic.

            Igor Minar added a comment -

            Don,

            We took the risk and we deployed the patch to production yesterday. So far so good

            The data was manually migrated using SQL, the queries should be in the comment section of the patch.

            Thanks for the cluster update. I thought that just a simple serialization was used to push the objects over the wire to the second node, but I wasn't sure.

            I'd prefer the solution proposed in the patch over changing the hibernate queries and exploiting MySQL insensitivity because the solution is the smallest common denominator for all of your supported databases, which would make it easier for you to test and support it.

            Using this solution you could easily and safely provide this enhancements to all of your customers regardless of their database via an upgrade task (I can image that creating an upgrade task that would create custom indexes for all the supported databases would be much trickier).

            Igor Minar added a comment - Don, We took the risk and we deployed the patch to production yesterday. So far so good The data was manually migrated using SQL, the queries should be in the comment section of the patch. Thanks for the cluster update. I thought that just a simple serialization was used to push the objects over the wire to the second node, but I wasn't sure. I'd prefer the solution proposed in the patch over changing the hibernate queries and exploiting MySQL insensitivity because the solution is the smallest common denominator for all of your supported databases, which would make it easier for you to test and support it. Using this solution you could easily and safely provide this enhancements to all of your customers regardless of their database via an upgrade task (I can image that creating an upgrade task that would create custom indexes for all the supported databases would be much trickier).

            Hello,
            I want to reiterate
            1. DB2 9 for z/OS supports index on column expressions.
            2. DB2 LUW supports generated columns with index on generated columns - where the database maintains the generated column and will automatically rewrite queries to use the generated column. In DB2 LUW 9.5, generated columns can be created as hidden columns (not returned for SELECT *).

            So this problem is solvable in current releases of DB2 - but it requires user action to solve it.

            This issue is a major problem for any site that has large tables. Basically, any customer that's going to have a large wiki installation - this issue is a show-stopper. The lower is used all over the place. Not just on content. The biggest issue is on the SPACES table - since it's used to look up content, blog posts, etc. So it's either a unique lookup on SPACES, or a table space scan to find 1 row.

            We also saw this issue on LINKS.
            select outgoingli0_.LINKID as LINKID, outgoingli0_.DESTPAGETITLE as DESTPAGE2_
            , outgoingli0_.DESTSPACEKEY as DESTSPAC3_, outgoingli0_.CONTENTID as CONTENTID
            , outgoingli0_.CREATOR as CREATOR, outgoingli0_.CREATIONDATE as CREATION6_
            , outgoingli0_.LASTMODIFIER as LASTMODI7_, outgoingli0_.LASTMODDATE as LASTMODD8_
            from LINKS outgoingli0_ CARD 1457006 NPAGES 57402
            where (lower(outgoingli0_.DESTSPACEKEY)=? ) COLCARD 11320
            and(lower(outgoingli0_.DESTPAGETITLE)=? ) COLCARD 356000

            BLOGPOST Title also has the issue.

            select blogpost0_.CONTENTID as CONTENTID, blogpost0_.SPACEID as SPACEID
            , blogpost0_.TITLE as TITLE, blogpost0_.VERSION as VERSION
            , blogpost0_.CREATOR as CREATOR, blogpost0_.CREATIONDATE as CREATION6_
            , blogpost0_.LASTMODIFIER as LASTMODI7_, blogpost0_.LASTMODDATE as LASTMODD8_
            , blogpost0_.VERSIONCOMMENT as VERSIONC9_, blogpost0_.PREVVER as PREVVER
            , blogpost0_.CONTENT_STATUS as CONTENT11_
            from CONTENT blogpost0_
            , SPACES space1_
            where blogpost0_.CONTENTTYPE='BLOGPOST'
            and ((lower(space1_.SPACEKEY)=?
            and blogpost0_.SPACEID=space1_.SPACEID)
            and(lower(blogpost0_.TITLE)=? )
            and(blogpost0_.CREATIONDATE>=? )
            and(blogpost0_.CREATIONDATE<? )
            and(blogpost0_.PREVVER is null )
            and(blogpost0_.CONTENT_STATUS='current' ))

            CONTENT_LABEL table.
            select distinct label0_.LABELID as LABELID
            , label0_.NAME as NAME, label0_.OWNER as OWNER
            , label0_.NAMESPACE as NAMESPACE, label0_.CREATIONDATE as CREATION5_
            , label0_.LASTMODDATE as LASTMODD6_
            from LABEL label0_ CARD 22253 NPAGES 424
            , CONTENT_LABEL labelling1_ CARD 73483 NPAGES 1711
            where (lower(labelling1_.SPACEKEY)=? ) COLCARD 7813
            AND(labelling1_.LABELID=label0_.LABELID ) COLCARD 22253/22253
            AND(label0_.NAMESPACE=? ) COLCARD 4
            order by label0_.NAME

            ATTACHMENTS table.
            SELECT attachmentid, title, contenttype, creator, creationdate
            , lastmodifier, lastmoddate, filesize, attachment_comment
            , attversion, prevver
            FROM WIKI_RW.attachments CARD 403449
            where pageid = ? COLCARD 94288
            and lower(title) = ? COLCARD 253952
            order by attversion desc with UR

            Our wiki has a lot of content, a lot of attachements, a lot of spaces. Any big site - this issue is an absolute performance killer . The wiki quickly becomes useless.

            Deleted Account (Inactive) added a comment - Hello, I want to reiterate 1. DB2 9 for z/OS supports index on column expressions. 2. DB2 LUW supports generated columns with index on generated columns - where the database maintains the generated column and will automatically rewrite queries to use the generated column. In DB2 LUW 9.5, generated columns can be created as hidden columns (not returned for SELECT *). So this problem is solvable in current releases of DB2 - but it requires user action to solve it. This issue is a major problem for any site that has large tables. Basically, any customer that's going to have a large wiki installation - this issue is a show-stopper. The lower is used all over the place. Not just on content. The biggest issue is on the SPACES table - since it's used to look up content, blog posts, etc. So it's either a unique lookup on SPACES, or a table space scan to find 1 row. We also saw this issue on LINKS. select outgoingli0_.LINKID as LINKID, outgoingli0_.DESTPAGETITLE as DESTPAGE2_ , outgoingli0_.DESTSPACEKEY as DESTSPAC3_, outgoingli0_.CONTENTID as CONTENTID , outgoingli0_.CREATOR as CREATOR, outgoingli0_.CREATIONDATE as CREATION6_ , outgoingli0_.LASTMODIFIER as LASTMODI7_, outgoingli0_.LASTMODDATE as LASTMODD8_ from LINKS outgoingli0_ CARD 1457006 NPAGES 57402 where (lower(outgoingli0_.DESTSPACEKEY)=? ) COLCARD 11320 and(lower(outgoingli0_.DESTPAGETITLE)=? ) COLCARD 356000 BLOGPOST Title also has the issue. select blogpost0_.CONTENTID as CONTENTID, blogpost0_.SPACEID as SPACEID , blogpost0_.TITLE as TITLE, blogpost0_.VERSION as VERSION , blogpost0_.CREATOR as CREATOR, blogpost0_.CREATIONDATE as CREATION6_ , blogpost0_.LASTMODIFIER as LASTMODI7_, blogpost0_.LASTMODDATE as LASTMODD8_ , blogpost0_.VERSIONCOMMENT as VERSIONC9_, blogpost0_.PREVVER as PREVVER , blogpost0_.CONTENT_STATUS as CONTENT11_ from CONTENT blogpost0_ , SPACES space1_ where blogpost0_.CONTENTTYPE='BLOGPOST' and ((lower(space1_.SPACEKEY)=? and blogpost0_.SPACEID=space1_.SPACEID) and(lower(blogpost0_.TITLE)=? ) and(blogpost0_.CREATIONDATE>=? ) and(blogpost0_.CREATIONDATE<? ) and(blogpost0_.PREVVER is null ) and(blogpost0_.CONTENT_STATUS='current' )) CONTENT_LABEL table. select distinct label0_.LABELID as LABELID , label0_.NAME as NAME, label0_.OWNER as OWNER , label0_.NAMESPACE as NAMESPACE, label0_.CREATIONDATE as CREATION5_ , label0_.LASTMODDATE as LASTMODD6_ from LABEL label0_ CARD 22253 NPAGES 424 , CONTENT_LABEL labelling1_ CARD 73483 NPAGES 1711 where (lower(labelling1_.SPACEKEY)=? ) COLCARD 7813 AND(labelling1_.LABELID=label0_.LABELID ) COLCARD 22253/22253 AND(label0_.NAMESPACE=? ) COLCARD 4 order by label0_.NAME ATTACHMENTS table. SELECT attachmentid, title, contenttype, creator, creationdate , lastmodifier, lastmoddate, filesize, attachment_comment , attversion, prevver FROM WIKI_RW.attachments CARD 403449 where pageid = ? COLCARD 94288 and lower(title) = ? COLCARD 253952 order by attversion desc with UR Our wiki has a lot of content, a lot of attachements, a lot of spaces. Any big site - this issue is an absolute performance killer . The wiki quickly becomes useless.

              matt@atlassian.com Matt Ryall
              akazatchkov Anatoli
              Votes:
              11 Vote for this issue
              Watchers:
              9 Start watching this issue

                Created:
                Updated:
                Resolved: