CONFSERVER-36018

Duplicates in the People Directory due to duplicates in the user_mapping table

Details

    Description

       

      NOTE: This bug report is for Confluence Server. Using Confluence Cloud? See the corresponding bug report.

      This can occur when user_mapping.lower_username is null. You may have a single record for the affected user, or additional records that may or may not include a lower_username. In some scenarios there are more than two records, and some of the records may use different casing in the username column.

       

      • Scenario 1 - Single user record with no lower_username
        user_key                         username lower_username
        ff8080814a5a97df014a5a97fb240001 userA    NULL
      • Scenario 2 - Duplicated users, none with a lower_username, regardless of casing in the username column
        user_key                         username lower_username
        ff8080814a5a97df014a5a97fb240001 userA    NULL
        ff8080814a5a97df014a5a97fb240002 userA    NULL
        ff8080814a5a97df014a5a97fb240003 usera    NULL
      • Scenario 3 - Duplicated users, one with a lower_username
        user_key                         username lower_username
        ff8080814a5a97df014a5b16c37c0008 userA    usera
        ff8080814a5a97df014a5a97fb240001 userA    NULL
      • Scenario 4 - Duplicated users, one with a lower_username, regardless of casing in the username column
        user_key                         username lower_username
        ff8080814a5a97df014a5b16c37c0008 userA    usera
        ff8080814a5a97df014a5a97fb240001 UserA    NULL
        ff8080814a5a97df014a5a97fb240002 userA    NULL
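
      The four scenarios above can be told apart mechanically. The following is a minimal sketch, not part of the original report, that assumes the rows of user_mapping have already been fetched as (user_key, username, lower_username) tuples with None standing for NULL. The function name classify_scenarios is hypothetical.

```python
from collections import defaultdict


def classify_scenarios(rows):
    """Group user_mapping rows by lowercased username and label each group.

    rows: iterable of (user_key, username, lower_username) tuples, where
    lower_username is None when the column is NULL.
    Returns {lowercased username: label}; groups without NULLs are omitted.
    """
    groups = defaultdict(list)
    for user_key, username, lower_username in rows:
        groups[username.lower()].append((user_key, username, lower_username))

    result = {}
    for name, members in groups.items():
        null_count = sum(1 for _, _, lower in members if lower is None)
        if null_count == 0:
            continue  # every record has a lower_username: not affected
        if len(members) == 1:
            result[name] = "scenario 1"
        elif null_count == len(members):
            result[name] = "scenario 2"
        else:
            # Scenarios 3 and 4 differ only in the casing of username.
            casings = {username for _, username, _ in members}
            result[name] = "scenario 3" if len(casings) == 1 else "scenario 4"
    return result
```

      Feeding it the Scenario 4 sample rows, for example, yields a single entry for "usera" labelled "scenario 4".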

      UPDATE

      This bug no longer causes duplicates in the People Directory because we have a hack/patch that hides duplicates on that page, even if duplicates exist in the index. However, duplicates still appear in @ mentions and other places. When duplicates show in @ mentions, the duplicate appears as "Unknown user (<username>) (<username>)". Watch the following bug report to track issues with Unknown Users: https://jira.atlassian.com/browse/CONFSERVER-54971

      Diagnosis - is this affecting you?

      First, try a content reindex. If this doesn't help, run this query (on any version). If it returns any results, this bug is the cause of your duplicates.

      Diagnosis 1 - NULL users
      SELECT  * FROM user_mapping where lower_username is null;
      

      If this query returns nothing, have a look at CONF-30050 as well.

      Next, determine whether you have single records or duplicated records, and which scenario types affect you. The workaround you perform depends on this.

      Diagnosis 2 - Determine Single/Duplicated records
      select u.*
      from user_mapping u
      where lower(u.username) in
        (select lower(nullrecord.username)
         from user_mapping nullrecord
         where nullrecord.lower_username is null)
      order by u.username;
      

      Proceed to the appropriate workaround based on whether you see single or duplicated records. If you have both, apply the single records workaround first, then duplicated records.
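
      To see what Diagnosis 2 returns before touching a real instance, the query can be tried against a throwaway database. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in (Confluence itself does not run on SQLite), a three-column user_mapping table, and a hypothetical helper named run_diagnosis that reports, per lowercased username, whether the affected records are single or duplicated.

```python
import sqlite3

# Diagnosis 2 from the report, reformatted only.
DIAGNOSIS_2 = """
select u.* from user_mapping u
where lower(u.username) in
  (select lower(nullrecord.username) from user_mapping nullrecord
   where nullrecord.lower_username is null)
order by u.username
"""


def run_diagnosis(rows):
    """Load rows into an in-memory table and summarise the Diagnosis 2 output."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_mapping "
        "(user_key text primary key, username text, lower_username text)"
    )
    conn.executemany("insert into user_mapping values (?, ?, ?)", rows)
    found = conn.execute(DIAGNOSIS_2).fetchall()
    conn.close()

    # Count how many returned records share each lowercased username.
    counts = {}
    for _, username, _ in found:
        key = username.lower()
        counts[key] = counts.get(key, 0) + 1
    return {name: "single" if n == 1 else "duplicated" for name, n in counts.items()}
```

      Usernames that never appear in the result have a lower_username on every record and need no workaround.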

      Diagnosis - Why is this affecting you?

      This issue is observed when duplicated users originate from different user directories. For example, Confluence is connected to LDAP and also to a Crowd instance that is connected to the same LDAP with the same set of users.

      The issue may also arise when users are renamed. As a best practice, clean up the user directories before going through the cleanup below.

      SINGLE RECORDS Workaround

      Set the lower_username value for single records only. This will not affect nulls for duplicate records.

      Single Records Workaround - Scenario type 1 (tested on PSQL and MSSQL)
      update user_mapping set lower_username = lower(username) where lower_username is null and lower(username) in 
      (select LOWER(nullrecords.username) from user_mapping as nullrecords group by LOWER(nullrecords.username) having count(*) = 1);
      

      This query finds all single records and sets each record's lower_username to the lowercase form of its username.
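
      The effect of the single records update can be checked on sample data first. This is a sketch under the same assumptions as any dry run: sqlite3 is only a stand-in for the real database, the table holds just the three relevant columns, and apply_single_records_fix is a hypothetical helper name.

```python
import sqlite3

# The Scenario type 1 update from the report, reformatted only.
SINGLE_RECORDS_FIX = """
update user_mapping set lower_username = lower(username)
where lower_username is null and lower(username) in
  (select lower(nullrecords.username) from user_mapping as nullrecords
   group by lower(nullrecords.username) having count(*) = 1)
"""


def apply_single_records_fix(rows):
    """Run the update on an in-memory copy and return the resulting rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_mapping "
        "(user_key text primary key, username text, lower_username text)"
    )
    conn.executemany("insert into user_mapping values (?, ?, ?)", rows)
    conn.execute(SINGLE_RECORDS_FIX)
    result = conn.execute(
        "select user_key, username, lower_username "
        "from user_mapping order by user_key"
    ).fetchall()
    conn.close()
    return result
```

      A lone record such as userA receives lower_username usera, while a NULL record that shares its lowercased username with another record is left untouched for the duplicated records workaround.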

      Now run the diagnosis query again to ensure you are no longer affected by the single records issue (Scenario type 1).

      DUPLICATED RECORDS Workaround

      If the diagnosis query shows that you are affected by a Scenario type 2 issue, run the following SQL queries first. Otherwise, proceed with #1 - Cleanup null user records.

      Duplicated Records Workaround - Scenario type 2 (tested on PSQL)
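      -- temp1: username/lower_username combinations that occur more than once
      select username, lower_username, count(*) as count into temp1 from user_mapping group by username, lower_username having count(*) > 1;
      -- temp2: duplicated usernames for which no record has a lower_username yet
      select username into temp2 from temp1 where lower(username) not in (select lower(username) from user_mapping where lower_username is not null);
      -- uniqueone: the record to keep for each of those usernames (highest user_key)
      select username, max(user_key) as keeping into uniqueone FROM user_mapping where username in (select username FROM temp2) group by username;
      -- give only the kept record a lower_username, then drop the helper tables
      update user_mapping set lower_username = lower(username) where user_key in (select keeping from uniqueone);
      drop table temp1;
      drop table temp2;
      drop table uniqueone;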
      

        These queries find records that have a Scenario type 2 issue and give one record in each group a lower_username.
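
      The selection logic of the Scenario type 2 queries can be traced without a database. The sketch below mirrors the temp1, temp2 and uniqueone steps in plain Python; it is an illustration only, pick_keepers is a hypothetical name, and Python string comparison stands in for the database's max(user_key), which may order differently under some collations.

```python
from collections import Counter


def pick_keepers(rows):
    """Return {username: user_key that would receive a lower_username}.

    rows: (user_key, username, lower_username) tuples, None meaning NULL.
    """
    # temp1: usernames whose (username, lower_username) pair occurs more than once
    pair_counts = Counter((username, lower) for _, username, lower in rows)
    temp1 = {username for (username, _), n in pair_counts.items() if n > 1}

    # Lowercased usernames that already have a populated lower_username somewhere
    resolved = {username.lower() for _, username, lower in rows if lower is not None}

    # temp2: duplicated usernames with no populated record at all
    temp2 = {username for username in temp1 if username.lower() not in resolved}

    # uniqueone: highest user_key per username in temp2
    keepers = {}
    for user_key, username, _ in rows:
        if username in temp2:
            keepers[username] = max(keepers.get(username, user_key), user_key)
    return keepers
```

      With the Scenario 2 sample rows, only the record ending in 0002 is selected; the remaining NULL records are left for the cleanup steps that follow.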

      #1 - Cleanup null user records without content associated.

      First, try to remove the null values. If there is no content that has been created by the users, this set of queries should work. If you run into any Foreign Key constraint errors during this process, proceed to #2.
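
      Because a Foreign Key error can interrupt the cleanup partway through, it may help to run the statements in a single transaction so that a failure leaves the database unchanged. The following is a minimal sketch of that idea, not a procedure from the original report. It uses sqlite3 with a deliberately simplified, hypothetical schema (user_mapping, logininfo and a one-column content table); build_demo_db, CLEANUP and run_cleanup are illustrative names.

```python
import sqlite3


def build_demo_db(with_content):
    """Create a tiny in-memory stand-in for the schema (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("pragma foreign_keys = on")  # SQLite enforces FKs only when asked
    conn.execute(
        "create table user_mapping "
        "(user_key text primary key, username text, lower_username text)"
    )
    conn.execute(
        "create table logininfo (username text references user_mapping(user_key))"
    )
    conn.execute(
        "create table content (creator text references user_mapping(user_key))"
    )
    conn.execute("insert into user_mapping values ('k1', 'userA', null)")
    conn.execute("insert into logininfo values ('k1')")
    if with_content:
        conn.execute("insert into content values ('k1')")
    conn.commit()
    return conn


# Two of the cleanup statements from the report, enough to show the idea.
CLEANUP = [
    "delete from logininfo where username in "
    "(select user_key from user_mapping where lower_username is null)",
    "delete from user_mapping where lower_username is null",
]


def run_cleanup(conn, statements=CLEANUP):
    """Run all statements atomically; return False if a constraint blocks them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for statement in statements:
                conn.execute(statement)
    except sqlite3.IntegrityError:
        return False
    return True
```

      When run_cleanup returns False, nothing has been deleted, which corresponds to the case where you should proceed to #2.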

      Cleanup

      • For Confluence 5.7 and above
        DUPLICATED RECORDS cleanup - 5.7 and above
        delete from imagedetails where attachmentid in
         (select avatar.contentid from content avatar where avatar.pageid in
          (select userinfo.contentid from content userinfo 
           where userinfo.contenttype = 'USERINFO' 
           and userinfo.username in 
            (select user_key from user_mapping where lower_username is null)));
        
        delete from contentproperties where contentid in
         (select avatar.contentid from content avatar where avatar.pageid in
          (select userinfo.contentid from content userinfo 
           where userinfo.contenttype = 'USERINFO' 
           and userinfo.username in 
            (select user_key from user_mapping where lower_username is null)));
        
        delete from content avatar where avatar.pageid in
         (select userinfo.contentid from content userinfo 
          where userinfo.contenttype = 'USERINFO' 
          and userinfo.username in 
           (select user_key from user_mapping where lower_username is null));
        
        delete from content userinfo 
         where userinfo.contenttype = 'USERINFO' 
         and prevver is not null
         and userinfo.username in
          (select user_key from user_mapping where lower_username is null);
        
        delete from content userinfo 
         where userinfo.contenttype = 'USERINFO' 
         and prevver is null
         and userinfo.username in
          (select user_key from user_mapping where lower_username is null);
        
        delete from logininfo where username in 
          (select user_key from user_mapping where lower_username is null);
        
        delete from user_mapping where lower_username is null;
        

        SQL query for MySQL: DUP_RECORDS_W1-MYSQL.sql

      • For Confluence 5.6.x and below
        DUPLICATED RECORDS cleanup - 5.6 and below
        delete from IMAGEDETAILS 
        where ATTACHMENTID in (select ATTACHMENTID from ATTACHMENTS where PAGEID in (select CONTENTID from CONTENT  where CONTENTTYPE = 'USERINFO' and USERNAME in (select user_key from user_mapping where lower_username is null)));
         
        delete from ATTACHMENTS where PAGEID in 
        (select CONTENTID from CONTENT where CONTENTTYPE = 'USERINFO' and USERNAME in (select user_key from user_mapping where lower_username is null));
        
        delete from CONTENT where CONTENTTYPE = 'USERINFO' and PREVVER is not null and USERNAME in 
        (select user_key from user_mapping where lower_username is null);
        
        delete from CONTENT where CONTENTTYPE = 'USERINFO' and PREVVER is null and USERNAME in 
        (select user_key from user_mapping where lower_username is null);
        
        delete from user_mapping where lower_username is null;
        

      #2 - If the null user records have content associated with them.

      Step One: Find the affected users

      First, run the query below and save the results. You'll need the oldKey and newKey values from the results.

      KeyPair Generator (tested on PSQL, MSSQL, MYSQL and Oracle)
      SELECT 
          user_key AS oldKey,
          (SELECT 
                  user_key
              FROM
                  user_mapping u
              WHERE
                  lower(u.username) = u.lower_username
                      AND lower(u.username) = lower(um.username)) AS newKey,
          um.username,
          um.lower_username
      FROM
          user_mapping um
      WHERE
          user_key IN (SELECT 
                  user_key
              FROM
                  user_mapping
              WHERE
                  (lower(username) != lower_username))
              OR (lower_username is null)

      Step Two: Validate your Keys:

      Check whether the KeyPair Generator produced any NULL values in the newKey column. NULL newKey values can occur when a Scenario type 2 issue exists in the database. Go back to the beginning of this report to ensure that Scenarios 1 and 2 have been ruled out.

      Only proceed with Step Three when there are no NULL values in the newkey column. 
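
      The check in Step Two can be automated. The sketch below runs the KeyPair Generator against an in-memory sqlite3 table (a stand-in only) and separates usable pairs from old keys whose newKey is NULL. The helper name generate_key_pairs is hypothetical, and the query is the one from Step One with the outer condition written on fewer lines.

```python
import sqlite3

KEYPAIR_GENERATOR = """
SELECT user_key AS oldKey,
       (SELECT user_key FROM user_mapping u
        WHERE lower(u.username) = u.lower_username
          AND lower(u.username) = lower(um.username)) AS newKey
FROM user_mapping um
WHERE user_key IN (SELECT user_key FROM user_mapping
                   WHERE lower(username) != lower_username)
   OR lower_username IS NULL
"""


def generate_key_pairs(rows):
    """Return (pairs, missing): usable (oldKey, newKey) pairs and old keys
    for which the generator found no newKey."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_mapping "
        "(user_key text primary key, username text, lower_username text)"
    )
    conn.executemany("insert into user_mapping values (?, ?, ?)", rows)
    results = conn.execute(KEYPAIR_GENERATOR).fetchall()
    conn.close()

    pairs = [(old, new) for old, new in results if new is not None]
    missing = sorted(old for old, new in results if new is None)
    return pairs, missing
```

      Step Three should only start once the missing list is empty.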

      Step Three: Individual/Mass fixes

      3.1 - Individual cleanup

      Run the following SQL queries to map @oldKey to @newKey. This workaround is ideal when the KeyPair Generator returns only 1-5 users.

      If the KeyPair Generator returns a large number of users, you can use the Python script provided in the next step (3.2) to generate a set of SQL queries.

      Replace every occurrence of @oldKey and @newKey in the SQL query template below with a key pair obtained from the KeyPair Generator, using either of the following:

      • Declare method
      • Simple text editor

      To use the Declare method, change the Declare Variables below to match the key pair obtained from the KeyPair Generator, then continue by running the SQL queries in the template below.

      Declare Variables (MySQL)
      -- Declare Variables: 
      SET @oldKey := 'foo'; SET @newKey := 'bar';
      

      Alternatively, use a text editor to find and replace every occurrence of @oldKey and @newKey in the SQL query template below, once for each key pair obtained from the KeyPair Generator.

      DUPLICATED RECORDS - Individual Fixes
      -- TRACKBACKLINKS
      UPDATE TRACKBACKLINKS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE TRACKBACKLINKS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- SPACES
      UPDATE SPACES SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE SPACES SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- SPACE PERMISSIONS
      UPDATE SPACEPERMISSIONS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE SPACEPERMISSIONS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      UPDATE SPACEPERMISSIONS SET PERMUSERNAME = @newKey WHERE PERMUSERNAME = @oldKey;
      
      -- SPACEGROUPS ( Not required for version 6 and above)
      UPDATE SPACEGROUPS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE SPACEGROUPS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- SPACEGROUPPERMISSIONS ( Not required for version 6 and above)
      UPDATE SPACEGROUPPERMISSIONS SET PERMUSERNAME = @newKey WHERE PERMUSERNAME = @oldKey;
      
      -- PAGETEMPLATES
      UPDATE PAGETEMPLATES SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE PAGETEMPLATES SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- NOTIFICATIONS
      UPDATE NOTIFICATIONS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE NOTIFICATIONS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      UPDATE NOTIFICATIONS SET USERNAME = @newKey WHERE USERNAME = @oldKey;
      
      -- LINKS
      UPDATE LINKS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE LINKS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- LIKES
      UPDATE LIKES SET USERNAME = @newKey WHERE USERNAME = @oldKey;
      
      -- LABEL
      UPDATE LABEL SET OWNER = @newKey WHERE OWNER = @oldKey;
      
      -- FOLLOW_CONNECTIONS
      UPDATE FOLLOW_CONNECTIONS SET FOLLOWER = @newKey WHERE FOLLOWER = @oldKey;
      UPDATE FOLLOW_CONNECTIONS SET FOLLOWEE = @newKey WHERE FOLLOWEE = @oldKey;
      
      -- EXTRNLNKS
      UPDATE EXTRNLNKS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      UPDATE EXTRNLNKS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      
      -- CONTENT_PERM
      UPDATE CONTENT_PERM SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      UPDATE CONTENT_PERM SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE CONTENT_PERM SET USERNAME = @newKey WHERE USERNAME = @oldKey;
      
      -- CONTENT_LABEL
      UPDATE CONTENT_LABEL SET OWNER = @newKey WHERE OWNER = @oldKey;
      
      -- CONTENT
      UPDATE CONTENT SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE CONTENT SET USERNAME = @newKey WHERE USERNAME = @oldKey;
      UPDATE CONTENT SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- USERCONTENT_RELATION ( Only required for version 6 and above)
      UPDATE USERCONTENT_RELATION SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      UPDATE USERCONTENT_RELATION SET SOURCEUSER = @newKey WHERE SOURCEUSER = @oldKey;
      UPDATE USERCONTENT_RELATION SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      
      -- ATTACHMENTS ( Not required for version 5.7 and above)
      UPDATE ATTACHMENTS SET LASTMODIFIER = @newKey WHERE LASTMODIFIER = @oldKey;
      UPDATE ATTACHMENTS SET CREATOR = @newKey WHERE CREATOR = @oldKey;
      
      -- LOGIN INFO AND USER_MAPPING
      DELETE FROM logininfo WHERE USERNAME = @oldKey;
      DELETE FROM user_mapping where user_key = @oldKey;
      

      For MSSQL: DUPLICATED_RECORDS_Individual_Fixes-mssql.sql

      Note about the SQL query template

       The SQL queries provided apply to all Confluence versions 5.x - 6.x, so some of them may fail on your version. Errors of the form ERROR: "<table>" does not exist can be ignored. The tables that may be affected are:

      1. SPACEGROUPS
      2. SPACEGROUPPERMISSIONS
      3. USERCONTENT_RELATION
      4. ATTACHMENTS

       The template does not contain a fix for the bodycontent table. This means that on pages that contained user mentions of @oldKey, you may see a broken link in their place. You can fix these occurrences with the following search-and-replace query; however, it may run for a long time depending on the size of the bodycontent table.

      UPDATE BODYCONTENT SET BODY = REPLACE(BODY, @oldKey, @newKey);
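
      One possible way to limit the work done by the search and replace is to update only the rows that actually contain the old key. The sketch below is an assumption on top of the original query, not part of it: the WHERE ... LIKE filter is added here, and whether it shortens the run time depends on the database. It uses sqlite3 as a stand-in, and replace_mentions is a hypothetical helper name.

```python
import sqlite3


def replace_mentions(conn, old_key, new_key):
    """Rewrite old_key to new_key in BODYCONTENT.BODY and return the number
    of rows the UPDATE touched.

    The LIKE filter is an addition to the report's query. User keys in the
    examples are hex strings, so they contain no LIKE wildcard characters.
    """
    cursor = conn.execute(
        "UPDATE BODYCONTENT SET BODY = REPLACE(BODY, ?, ?) WHERE BODY LIKE ?",
        (old_key, new_key, "%" + old_key + "%"),
    )
    conn.commit()
    return cursor.rowcount
```

      The returned row count gives a quick indication of how many page bodies referenced the old key.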
      

      3.2 Mass Cleanup Using the mass duplicate fixer

      If you have a large number of duplicates, you may choose to use mass-dup-fixer.zip to generate the SQL queries. You'll need Python installed to run the script (tested on Python 2.7 and Python 3).

      1. Ensure that Python is installed.
      2. Run the KeyPair Generator.
      3. Save the output of that query in the format oldKey<tab>newKey.
      4. Name the file keypairs.txt.
      5. Extract mass-dup-fixer.zip to a directory.
      6. Place your keypairs.txt file into this directory.
      7. Run make-mass-dup-fixer.py from the directory you extracted everything to.
      8. It will generate a file called output.sql.
      9. Run the SQL queries inside output.sql against the Confluence database.
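
      For readers who want to see what such a generator does, here is a minimal sketch of the idea. It is not the contents of make-mass-dup-fixer.py, which is not reproduced in this report. It assumes keypairs.txt lines of the form oldKey<tab>newKey, assumes user keys are 32 lowercase hex characters as in the examples above, and uses a shortened template containing only four of the statements from the individual fixes. TEMPLATE, KEY_PATTERN and render_sql are hypothetical names.

```python
import re

# Shortened template (assumption): a real run needs every statement from the
# individual fixes template, not just these four.
TEMPLATE = """\
UPDATE CONTENT SET LASTMODIFIER = '{new}' WHERE LASTMODIFIER = '{old}';
UPDATE CONTENT SET CREATOR = '{new}' WHERE CREATOR = '{old}';
DELETE FROM logininfo WHERE USERNAME = '{old}';
DELETE FROM user_mapping where user_key = '{old}';
"""

# Assumed key format; rejecting anything else also keeps quotes out of the SQL.
KEY_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def render_sql(lines, template=TEMPLATE):
    """Turn oldKey<tab>newKey lines into one block of SQL per key pair."""
    chunks = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not all(KEY_PATTERN.match(p) for p in parts):
            raise ValueError("line %d: expected oldKey<tab>newKey" % number)
        old, new = parts
        chunks.append(template.format(old=old, new=new))
    return "\n".join(chunks)
```

      Passing an open keypairs.txt file object to render_sql and writing the returned string to output.sql reproduces the overall flow of the steps above. A line with a missing newKey raises an error instead of producing SQL.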
      Note about the SQL queries generated by the python script

       The SQL queries generated by the Python script apply to all Confluence versions 5.x - 6.x, so some of them may fail on your version. Errors of the form ERROR: "<table>" does not exist can be ignored. The tables that may be affected are:

      1. SPACEGROUPS
      2. SPACEGROUPPERMISSIONS
      3. USERCONTENT_RELATION
      4. ATTACHMENTS

       The generated queries do not contain a fix for the bodycontent table. This means that on pages that contained user mentions of @oldKey, you may see a broken link in their place.

      Step Four: Ensure that you are no longer affected by this issue

      Now, go back and run the queries in Diagnosis 1 and 2 again to ensure that all users have a valid lower_username.

      NB: If you are still experiencing problems and are on Confluence 5.9.7 or earlier, have a look at CONF-30050 as well.

       

      Attachments

        1. Screen Shot 2016-01-20 at 3.54.54 pm.png
          17 kB
        2. mass-dup-fixer-MSSQL.zip
          2 kB
        3. mass-dup-fixer.zip
          1 kB
        4. DUPLICATED_RECORDS_Individual_Fixes-mssql.sql
          3 kB
        5. DUP_RECORDS_W1-MYSQL.sql
          2 kB
        6. confserver36018_diagnosis.png
          10 kB

            People

              Unassigned Unassigned
              dunterwurzacher Denise Unterwurzacher [Atlassian] (Inactive)
              Votes: 16
              Watchers: 64
