  1. Bitbucket Data Center
  2. BSERV-20169

Search index is rebuilt when Bitbucket fails to obtain the index version


      Issue Summary

      During Bitbucket startup, StartupChecksJob checks the search index version. If the search server suffers a soft failure, Bitbucket cannot retrieve the index version, so it defaults the version to 1 and triggers the upgrade task to upgrade the index.

      Steps to Reproduce

      1. Spin up an ES Docker container:
        docker run -d \
          --name elasticsearch \
          -p 9200:9200 \
          -p 9300:9300 \
          -e "discovery.type=single-node" \
          docker.elastic.co/elasticsearch/elasticsearch:7.16.3
        
      2. Connect to Bitbucket and let the indexes be created.
      3. To simulate a NoShardAvailableActionException, rename the index directory for bitbucket-index-version on the ES filesystem.
        • Current indices:
          $ curl 'localhost:9200/_cat/indices'
          green  open .geoip_databases        E-kg5DqfSvGr4i99YYiFvQ 1 0 40 0 38.2mb 38.2mb
          yellow open bitbucket-index-version hyubcMBRRQ2WIyHjvSwYYg 1 1  1 0  3.4kb  3.4kb
          yellow open bitbucket-search        q-LGUpSITjeEKICWso0MBA 5 1  0 0  1.1kb  1.1kb
          yellow open bitbucket-index-state   ugK8g-9EQm-gUEyYJGQckg 5 1  0 0  1.1kb  1.1kb
          yellow open bitbucket-project       EMphNpyXR86JKUFWMTcG5Q 5 1  0 0  1.1kb  1.1kb
          yellow open bitbucket-repository    Q5oJ85P3Q-GqzqLGbY_VHw 5 1  0 0  1.1kb  1.1kb
          
        • "Corrupt" the data by renaming the directory in the filesystem:
          A restart is needed after this change.
          # ll
          total 40
          drwxrwxr-x 10 elasticsearch root 4096 Aug 27 15:11 ./
          drwxrwxr-x  5 elasticsearch root 4096 Aug 27 15:12 ../
          drwxrwxr-x  4 elasticsearch root 4096 Aug 27 15:12 E-kg5DqfSvGr4i99YYiFvQ/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 EMphNpyXR86JKUFWMTcG5Q/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 Q5oJ85P3Q-GqzqLGbY_VHw/
          drwxrwxr-x  4 elasticsearch root 4096 Aug 27 15:12 hyubcMBRRQ2WIyHjvSwYYg/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 q-LGUpSITjeEKICWso0MBA/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 ugK8g-9EQm-gUEyYJGQckg/
          # mv hyubcMBRRQ2WIyHjvSwYYg hyubcMBRRQ2WIyHjvSwYYg-bac
          # ll
          total 40
          drwxrwxr-x 10 elasticsearch root 4096 Aug 27 15:13 ./
          drwxrwxr-x  5 elasticsearch root 4096 Aug 27 15:12 ../
          drwxrwxr-x  4 elasticsearch root 4096 Aug 27 15:12 E-kg5DqfSvGr4i99YYiFvQ/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 EMphNpyXR86JKUFWMTcG5Q/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 Q5oJ85P3Q-GqzqLGbY_VHw/
          drwxrwxr-x  4 elasticsearch root 4096 Aug 27 15:12 hyubcMBRRQ2WIyHjvSwYYg-bac/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 q-LGUpSITjeEKICWso0MBA/
          drwxrwxr-x  8 elasticsearch root 4096 Aug 27 15:12 ugK8g-9EQm-gUEyYJGQckg/
          
        • Result:
          $ curl 'localhost:9200/_cat/indices'
          red    open bitbucket-index-version hyubcMBRRQ2WIyHjvSwYYg 1 1                   
          green  open .geoip_databases        E-kg5DqfSvGr4i99YYiFvQ 1 0 40 0 38.2mb 38.2mb
          yellow open bitbucket-search        q-LGUpSITjeEKICWso0MBA 5 1  0 0  1.1kb  1.1kb
          yellow open bitbucket-index-state   ugK8g-9EQm-gUEyYJGQckg 5 1  0 0  1.1kb  1.1kb
          yellow open bitbucket-project       EMphNpyXR86JKUFWMTcG5Q 5 1  0 0  1.1kb  1.1kb
          yellow open bitbucket-repository    Q5oJ85P3Q-GqzqLGbY_VHw 5 1  0 0  1.1kb  1.1kb
          
          $ curl 'localhost:9200/bitbucket-index-version/_doc/index-version'
          {"error":{"root_cause":[{"type":"no_shard_available_action_exception","reason":"No shard available for [get [bitbucket-index-version][_doc][index-version]: routing [null]]"}],"type":"no_shard_available_action_exception","reason":"No shard available for [get [bitbucket-index-version][_doc][index-version]: routing [null]]"},"status":503}
          
        • From the ES logs:
          {"type": "server", "timestamp": "2025-08-27T15:17:55,501Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "path: /bitbucket-index-version/_doc/index-version, params: {index=bitbucket-index-version, id=index-version}", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg" , 
          "stacktrace": ["org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [bitbucket-index-version][_doc][index-version]: routing [null]]",
          "at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:217) [elasticsearch-7.16.3.jar:7.16.3]",
          "at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:194) [elasticsearch-7.16.3.jar:7.16.3]",
          
      4. Now restart Bitbucket:
        • Bitbucket logs:
          2025-08-27 15:20:33,091 INFO  [main]  c.a.b.i.boot.log.BuildInfoLogger Starting Bitbucket 8.19.21 (a00dfd7 built on Tue Aug 12 03:01:56 UTC 2025)
          ...
          2025-08-27 15:22:12,674 INFO  [Caesium-1-1]  c.a.b.i.s.i.jobs.StartupChecksJob Running startup jobs for search
          2025-08-27 15:22:12,776 INFO  [Caesium-1-1]  c.a.b.i.s.i.u.DefaultUpgradeService Executing upgrade task:[Update path and filename fields for file search]
          2025-08-27 15:22:14,139 INFO  [Caesium-1-1]  c.a.b.i.s.i.u.DefaultUpgradeService Successfully completed upgrade task:[Update path and filename fields for file search]
          2025-08-27 15:22:19,597 ERROR [Caesium-1-1]  c.a.b.i.s.i.IndexingSynchronizationService An error was encountered while checking or creating the mapping in the search server
          com.atlassian.bitbucket.internal.search.indexing.exceptions.IndexException: get cluster-health failed
          ...
          2025-08-27 15:23:19,614 INFO  [Caesium-1-4]  c.a.b.i.s.i.jobs.StartupChecksJob Running startup jobs for search
          2025-08-27 15:23:19,629 INFO  [Caesium-1-4]  c.a.b.i.s.i.u.DefaultUpgradeService Executing upgrade task:[Update path and filename fields for file search]
          2025-08-27 15:23:20,874 INFO  [Caesium-1-4]  c.a.b.i.s.i.u.DefaultUpgradeService Successfully completed upgrade task:[Update path and filename fields for file search]
          2025-08-27 15:23:25,882 ERROR [Caesium-1-4]  c.a.b.i.s.i.IndexingSynchronizationService An error was encountered while checking or creating the mapping in the search server
          com.atlassian.bitbucket.internal.search.indexing.exceptions.IndexException: get cluster-health failed
          ...
          2025-08-27 15:25:25,893 INFO  [Caesium-1-1]  c.a.b.i.s.i.jobs.StartupChecksJob Running startup jobs for search
          2025-08-27 15:25:25,906 INFO  [Caesium-1-1]  c.a.b.i.s.i.u.DefaultUpgradeService Executing upgrade task:[Update path and filename fields for file search]
          2025-08-27 15:25:27,140 INFO  [Caesium-1-1]  c.a.b.i.s.i.u.DefaultUpgradeService Successfully completed upgrade task:[Update path and filename fields for file search]
          2025-08-27 15:25:32,148 ERROR [Caesium-1-1]  c.a.b.i.s.i.IndexingSynchronizationService An error was encountered while checking or creating the mapping in the search server
          com.atlassian.bitbucket.internal.search.indexing.exceptions.IndexException: get cluster-health failed
          ...
          
        • ES logs:
          {"type": "server", "timestamp": "2025-08-27T15:22:12,767Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "path: /bitbucket-index-version/_doc/index-version, params: {index=bitbucket-index-version, id=index-version}", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg" , 
          "stacktrace": ["org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [bitbucket-index-version][_doc][index-version]: routing [null]]",
          "at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:217) [elasticsearch-7.16.3.jar:7.16.3]",
          "at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:194) [elasticsearch-7.16.3.jar:7.16.3]",
          ...
          {"type": "server", "timestamp": "2025-08-27T15:22:12,784Z", "level": "INFO", "component": "o.e.c.m.MetadataDeleteIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-repository/Q5oJ85P3Q-GqzqLGbY_VHw] deleting index", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:12,837Z", "level": "INFO", "component": "o.e.c.m.MetadataDeleteIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-project/EMphNpyXR86JKUFWMTcG5Q] deleting index", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:12,888Z", "level": "INFO", "component": "o.e.c.m.MetadataDeleteIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-index-state/ugK8g-9EQm-gUEyYJGQckg] deleting index", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:12,940Z", "level": "INFO", "component": "o.e.c.m.MetadataDeleteIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-search/q-LGUpSITjeEKICWso0MBA] deleting index", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:13,079Z", "level": "INFO", "component": "o.e.c.m.MetadataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-repository] creating index, cause [api], templates [], shards [5]/[1]", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:13,168Z", "level": "INFO", "component": "o.e.c.m.MetadataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-index-state] creating index, cause [api], templates [], shards [5]/[1]", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:13,225Z", "level": "INFO", "component": "o.e.c.m.MetadataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-project] creating index, cause [api], templates [], shards [5]/[1]", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          {"type": "server", "timestamp": "2025-08-27T15:22:13,269Z", "level": "INFO", "component": "o.e.c.m.MetadataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "6b5ccd32ebe0", "message": "[bitbucket-search] creating index, cause [api], templates [], shards [5]/[1]", "cluster.uuid": "CRo1q0n1R4iGJUYhfxN5QQ", "node.id": "-nfTPzPXTiGR6OvFACvsBg"  }
          

      So far, NoShardAvailableActionException is the only trigger identified; other edge cases might also cause Bitbucket to rebuild the indexes.
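
      The trigger condition in the reproduction above is a red bitbucket-index-version index. As a minimal sketch, assuming the default `_cat/indices` column layout shown in the steps (health first, index name third), an admin could list red indexes before restarting Bitbucket. The function name `red_indexes` is hypothetical and is not part of Bitbucket or Elasticsearch.

```shell
#!/bin/sh
# Print the names of indexes whose health is "red".
# Reads `_cat/indices` output on stdin: column 1 is health, column 3 is the index name.
red_indexes() {
  awk '$1 == "red" { print $3 }'
}

# Usage against a live node (host and port taken from the reproduction steps):
#   curl -s 'localhost:9200/_cat/indices' | red_indexes
```

      An empty result means no index is red. It does not rule out the other edge cases mentioned above.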

      It appears that the current approach might be risky, as we are potentially deleting healthy indexes when we're unable to obtain the index version. Would it be good to consider checking the response code and opting for a less destructive alternative?
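
      To illustrate what checking the response code could look like, here is a rough sketch that maps the HTTP status of the index-version lookup to an action. The function names and action labels are hypothetical and do not reflect Bitbucket's actual implementation. The 503 case corresponds to the no_shard_available_action_exception response shown in the steps; treating 404 as "no version stored yet" is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical decision table: map the HTTP status of
# GET /bitbucket-index-version/_doc/index-version to an action.
# The action labels are illustrative and are not Bitbucket identifiers.
classify_version_lookup() {
  case "$1" in
    200) echo "use-stored-version" ;;   # document found: read the version from it
    404) echo "treat-as-new-index" ;;   # index or document is absent (assumption)
    *)   echo "fail-soft-and-alert" ;;  # e.g. 503, or 000 when curl cannot connect
  esac
}

# Fetch only the status code from a node (defined here, not called automatically).
# The default host and port come from the reproduction steps.
lookup_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    "${1:-localhost:9200}/bitbucket-index-version/_doc/index-version"
}

# Usage:
#   classify_version_lookup "$(lookup_status)"
```

      The point of the sketch is that only a definite "not found" answer would justify assuming a default version; any other failure leaves the existing indexes untouched.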

      Expected Results

      Fail soft and notify admins that the search indexes are either missing or corrupted.

      Actual Results

      Bitbucket deletes and recreates the search indexes, triggering a full reindex, when the search server returns NoShardAvailableActionException.
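
      To confirm which indexes were dropped, the MetadataDeleteIndexService lines can be pulled out of the ES logs. This is a small sketch, assuming the JSON log format shown in the steps and the container name `elasticsearch` from the `docker run` command; `deleted_indexes` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract index names from MetadataDeleteIndexService "deleting index" log lines on stdin.
# Expects messages of the form: "message": "[<index>/<uuid>] deleting index"
deleted_indexes() {
  grep 'MetadataDeleteIndexService' \
    | sed -n 's|.*"message": "\[\([^/]*\)/[^]]*] deleting index".*|\1|p'
}

# Usage (container name taken from the reproduction steps):
#   docker logs elasticsearch 2>&1 | deleted_indexes
```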

      Workaround

      Currently there is no known workaround for this behavior. A workaround will be added here when available.

              75fca8d6bc6b Hong Huynh
              mmuthusamy Moga