Uploaded image for project: 'Confluence Data Center'
  1. Confluence Data Center
  2. CONFSERVER-38939

Rebuilding ancestor table is slow and inefficient on large instances

    XMLWordPrintable

Details

    Description

      Problem

      Rebuilding the ancestor table is required at certain circumstances (see the linked bug reports and feature requests). This procedure is very slow and inefficient on large instances. It can take several hours on medium-sized instances and over days on larger ones.

      Proposed solution

      Rewrite the code, possibly delegating the task to the database.

      Workaround for PostgreSQL

      Make sure to create an offline database backup before performing any of the below steps!

      The script will not alter the live data, it will create a table called confancestors_save that then can be copied instead of the original confancestors

      1. Create working table:
        create table confancestors_save as select * from confancestors where 1=2;
      2. Create the function to perform the data population afterwards:
        create function getgrandpa (originalchild bigint)
        returns void
        language plpgsql
        as $$
        DECLARE
                currentparent bigint;
        BEGIN
                currentparent:=originalchild;
                while currentparent is not null loop
                        currentparent := c.parentid from content c where c.contentid=currentparent;
                        if currentparent is not null then
                            #raise notice 'inserting row descendentid: % parentid: %', originalchild, currentparent;
                            insert into confancestors_save (ancestorid,descendentid) values (currentparent, originalchild);
                        end if;
                end loop;
        END $$;
      3. Populate the table with initial values (run it for all contentid from content where parentid is not null):
        select getgrandpa(contentid) from content where parentid is not null and content_status='current';
      4. Calculate depth and update accordingly:
        with recursive t(depth,ancestorid,descendentid) as (
                select 0,ancestorid,descendentid from confancestors_save
            union
                select
                    depth + 1,
                    confancestors_save.ancestorid,
                    confancestors_save.descendentid 
                from 
                    confancestors_save join t 
                        on confancestors_save.ancestorid = t.descendentid
        ) update confancestors_save set ancestorposition=a.depth from (select ancestorid,descendentid,max(depth) as depth from t group by ancestorid,descendentid) a where confancestors_save.ancestorid=a.ancestorid
        and confancestors_save.descendentid=a.descendentid;

      If the table rebuild from the UI finished, you can compare the two results before copying over the data by the following query (should return 0 rows):

      select * from confancestors_save cas full outer join confancestors ca using (ancestorid,descendentid,ancestorposition) where cas.ancestorid is null or ca.ancestorid is null;
      

      After that you can move over the data from this table to the live one by:

      1. Truncate the original table:
        truncate table confancestors;
      2. Populate the original table:
        insert into confancestors (select * from confancestors_save);
      3. Flush all caches
      4. Rebuild content indexes

      Attachments

        Issue Links

          Activity

            People

              dtaylor David Taylor (Inactive)
              pkoczan Peter Koczan (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: