- Type: Suggestion
- Resolution: Unresolved
- Component/s: Search - Connectors - 3P SmartLinks
Issue Summary
Initial indexing of very large Google Drive accounts in Rovo is extremely slow. In one case, only 485,000 of 200 million files (about 0.24%) were indexed in the first day, so a full index could take over a year at that rate. The bottleneck is Google API rate limits, and current workarounds such as filters or blocklists aren’t practical for datasets of this size.
Steps to Reproduce
- Connect a Google Drive account with a very large number of files (e.g., 200 million) to Rovo using the Google Drive Connector.
- Start the initial indexing process.
Expected Results
All files should be indexed and searchable in Rovo within a reasonable timeframe (e.g., days or weeks).
- Find ways to significantly speed up the initial indexing for large Google Drive connectors.
- Possible solutions:
  - Work with Google to increase API rate limits for enterprise customers.
  - Use smarter indexing methods (like Drive Activity APIs or parallel processing).
  - Offer better pre-indexing filters that work for very large datasets.
  - Show estimated indexing time and progress in the UI to help set expectations.
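To put the "days or weeks" expectation in numbers, the sketch below computes the sustained indexing rate a given deadline would need and how many times faster that is than the rate observed on the first day. The file counts come from this ticket. The 7-day and 30-day targets and the function names are illustrative assumptions, not figures from the connector or from Google.

```python
TOTAL_FILES = 200_000_000    # account size reported in this ticket
OBSERVED_PER_DAY = 485_000   # files indexed in the first day, per this ticket
SECONDS_PER_DAY = 86_400


def required_rate(total_files: int, target_days: float) -> float:
    """Files per second needed to index total_files within target_days."""
    return total_files / (target_days * SECONDS_PER_DAY)


def required_speedup(total_files: int, observed_per_day: float, target_days: float) -> float:
    """How many times faster than the observed daily rate indexing must run."""
    return total_files / (observed_per_day * target_days)


# Hypothetical targets chosen only to illustrate "days or weeks".
for target_days in (7, 30):
    rate = required_rate(TOTAL_FILES, target_days)
    speedup = required_speedup(TOTAL_FILES, OBSERVED_PER_DAY, target_days)
    print(f"{target_days:>2} days: {rate:7.1f} files/s, {speedup:5.1f}x the observed rate")
```

Under these assumptions, a 30-day target needs roughly 14 times the observed throughput and a 7-day target roughly 59 times, which gives a rough size for the improvement any of the options above would have to deliver.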
Actual Results
Indexing is extremely slow. Only 485,000 files were indexed after one day, with an estimated 1.5 years required to complete indexing for 200 million files. The process is bottlenecked by Google API rate limits, and suggested solutions like filtering or blocklisting are not feasible at this scale.
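As a cross-check on the estimate, the following is a minimal constant-rate projection using only the two numbers reported above. It assumes throughput stays at the first-day rate for the whole run, which may not hold in practice, and the function name is hypothetical.

```python
def projected_total_days(total_files: int, indexed_so_far: int, elapsed_days: float) -> float:
    """Total days to index everything, assuming the observed rate stays constant."""
    if indexed_so_far <= 0 or elapsed_days <= 0:
        raise ValueError("need a positive sample to estimate a rate")
    files_per_day = indexed_so_far / elapsed_days
    return total_files / files_per_day


# Figures from this ticket: 485,000 of 200 million files after one day.
days = projected_total_days(200_000_000, 485_000, 1)
print(f"about {days:.0f} days ({days / 365:.2f} years) at a constant rate")
```

A constant-rate projection comes to about 412 days, or roughly 1.1 years. That is shorter than the 1.5-year estimate quoted above, which may reflect factors not stated in this ticket, such as throttling over time or retries.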
Workaround
Currently there is no known workaround for this behavior. A workaround will be added here when available.