-
Suggestion
-
Resolution: Fixed
We can still improve the responsiveness of indexing:
Sulka's suggestions:
-----------------
1) For every PDF and PPT file larger than 2 MB, put the file into the indexer queue instead of reindexing it immediately, even during a complete reindex. Make the indexer queue sort by content size, so that when choosing the next file to index, the smallest file is preferred. This lets all content that indexes quickly be indexed first, so the wait to get a working index in the first place is much, much shorter for large installations.
2) Change the reindexing report to be an "administrative portlet" in the Dashboard instead of being its own page. If an admin loses the indexing report page for some reason, it's impossible to know when the indexing is supposed to have finished. A full reindex takes me 47 minutes, and it's awkward to have to keep that one page open the whole time to see whether it has completed.
3) Don't reindex immediately while reimporting data. Change the system to get up and running first, and trigger the indexing once the system is in a usable condition. Maybe even make the reindex a manual step the admin has to trigger? Preferably implement the portlet above for monitoring. Or maybe add all files to the queue?
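Suggestion 1 could be sketched roughly as follows. This is only an illustration of the smallest-first idea, not the actual indexer: the class SizeOrderedIndexQueue, its methods, and the extension-based file-type check are all hypothetical names and assumptions.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names come from the real indexer.
class SizeOrderedIndexQueue {
    // Threshold from the suggestion: PDF/PPT files above 2 MB are deferred.
    static final long DEFER_THRESHOLD_BYTES = 2L * 1024 * 1024;

    static final class Entry {
        final String fileName;
        final long sizeBytes;

        Entry(String fileName, long sizeBytes) {
            this.fileName = fileName;
            this.sizeBytes = sizeBytes;
        }
    }

    // Min-heap on content size, so the smallest file is always polled first.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.sizeBytes));

    // Assumption: the file type is detected from the extension only.
    static boolean shouldDefer(String fileName, long sizeBytes) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean heavyType = lower.endsWith(".pdf") || lower.endsWith(".ppt");
        return heavyType && sizeBytes > DEFER_THRESHOLD_BYTES;
    }

    void enqueue(String fileName, long sizeBytes) {
        queue.add(new Entry(fileName, sizeBytes));
    }

    // Returns the smallest queued entry, or null when the queue is empty.
    Entry next() {
        return queue.poll();
    }

    boolean isEmpty() {
        return queue.isEmpty();
    }
}
```

During a complete reindex, the caller would index a file directly when shouldDefer returns false and call enqueue otherwise. The queue is then drained with next() once the quick content has been indexed.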
Charles' notes:
------------
- The major problem is that we tokenise attachments before we put them on the queue. This means that any operation that causes an attachment to be re-indexed will still hang while we analyse it. That can take some time when re-indexing or importing lots of large attachments, or even when you're attaching a single large document to a page.
- The LongRunningTask UI is sub-optimal. When you're in a long running task, you get taken to this one page with no navigation, that you can't ever get back to. We'd be better off changing the re-index so that if you go to the re-index page while one is already in progress, you see the status bar there.
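One way to read the first note is that the queue should hold only a reference to the attachment, with the expensive analysis done later on a background worker. The sketch below illustrates that under stated assumptions: DeferredAttachmentIndexer, requestIndex, pending, and the analyser callback are hypothetical names, not the existing indexing code. The pending() counter is the kind of value a re-index status page could poll.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: names are illustrative, not the real indexer API.
class DeferredAttachmentIndexer {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger submitted = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    // Stands in for the expensive tokenising/analysis step.
    private final Consumer<Long> analyser;

    DeferredAttachmentIndexer(Consumer<Long> analyser) {
        this.analyser = analyser;
    }

    // Called from the request thread: queues only the attachment id and
    // returns immediately; analysis happens later on the worker thread.
    void requestIndex(long attachmentId) {
        submitted.incrementAndGet();
        worker.submit(() -> {
            try {
                analyser.accept(attachmentId);
            } finally {
                completed.incrementAndGet();
            }
        });
    }

    // Number of attachments queued but not yet analysed.
    int pending() {
        return submitted.get() - completed.get();
    }

    // Stops accepting work and waits for queued analysis to finish.
    boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```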