Key: JRA-4058
Type: Improvement
Status: Open
Priority: Critical
Assignee: Unassigned
Reporter: Nick Minutello
Votes: 2
Watchers: 1
JIRA

Re-indexing takes a very long time!

Created: 17/Jul/04 06:17 PM   Updated: 01/Mar/05 03:59 PM
Component/s: Filtering & Indexing
Affects Version/s: 2.6.1 Pro
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments: 1. Java Source File DefaultIndexManager.java (19 kb)

Image Attachments:

1. screenshot-1.jpg (124 kb)
Environment:
JDK 1.4.2_04 on WinXP, 2-proc hyperthreading intel 3GHz, 2Gb RAM, 756Mb jvm heap max (only 340 used), 10,000rpm 380Mbs SCSI disks, running against SQL Server 2000. CPU utilisation on SQL Server low. Network utilisation low.

McAfee Viruscan active, but not set to scan index file-types.

Issue Links:
Duplicate
 

Participants: Jeff Turner [Atlassian], Nick Minutello, Owen Fellows and Scott Farquhar [Atlassian]
Since last comment: 176 weeks, 4 days ago
Labels:


Description
We have a moderately large JIRA database:
12600 issues
18700 comments

Re-indexing takes 1515 seconds (25 minutes!)

I don't know if this is expected behaviour, but it's an awfully long time. If we need to re-index during the day, we are talking about roughly half an hour of downtime.



Comments
Owen Fellows - 18/Jul/04 05:27 AM
Have you added the recommended indexes to your SQL Server database?
http://www.atlassian.com/software/jira/docs/latest/indexing.html
Does your issue cache hold all issues, or is its maximum size set lower than the total number of issues?

Jeff Turner [Atlassian] - 19/Jul/04 05:41 AM

> McAfee Viruscan active, but not set to scan index file-types.

Not sure if it's relevant, but we have a report from another user that McAfee's NetShield 4.5 claims to let one exclude folders from scanning, but doesn't actually do so. Upgrading to 7.0 apparently fixes this.


Nick Minutello - 19/Jul/04 01:45 PM
With the database indexes added, it now takes 17 seconds.
It's an improvement, but still very long...

Nick Minutello - 19/Jul/04 01:46 PM
I don't think we are using McAfee's NetShield 4.5.

Are we sure it has nothing to do with the way indexing is done now (it now uses fewer files)?


Nick Minutello - 20/Jul/04 12:47 PM
Did I say 17 seconds?

I meant 17 minutes.


Nick Minutello - 21/Oct/04 06:18 AM
Currently takes about 25 minutes.

We have about 17500 issues
We have 26000 comments

Issue cache size is 10000

Server is running on linux (so no virusscan issues)

Need to work out why this is taking so long....


Jeff Turner [Atlassian] - 22/Oct/04 04:22 AM
It's fairly likely that there isn't much optimisation left to be done. What do you find are the typical reasons for needing to reindex?

Jeff Turner [Atlassian] - 25/Oct/04 03:34 AM
Incidentally, having just waited an hour for the Apache JIRA to reindex during an upgrade, I agree it's a real problem whatever the cause. Perhaps we need a way to make JIRA read-only, so that reindexes/upgrades can be done 'offline' and then made live in one quick operation.

Nick Minutello - 25/Oct/04 08:05 AM
I think there is plenty of optimisation to be had here.

a) it's practically 100% I/O-bound
b) it's running in only 1 thread

What you want to do is

a) get Doug Lea's concurrency library
b) have a number of worker pools loading issues, loading comments, indexing issues, indexing comments.

My suggestion:
Have 4 worker pools (see note # below)

Issue Loading Worker Pool
Comment Loading Worker Pool
Issue Indexing Worker Pool
Comment Indexing Worker Pool

  • main thread dispatches issue-batch-load jobs (load 100 at a time) to Issue Loading Worker Pool
  • Each worker in Issue Loading Worker Pool loads, say, 100 issues, then sends issue-indexing jobs to Issue Indexing Worker Pool and comment-loading jobs to Comment Loading Worker Pool
  • Issue Indexing Worker Pool ... takes an issue and indexes the bugger
  • Comment Loading Worker Pool loads comments for a given issue and sends comment-indexing jobs to the Comment Indexing Worker Pool
  • Comment Indexing Worker Pool ... takes a comment and indexes the bugger.

(#) Really speaking, you can fold the 2 loaders into one worker pool and the two indexers into another. The primary thing is to give the database access and the Lucene indexing their own thread pools so one task doesn't starve the other of threads.

Given the I/O-bound nature of the reindexing, you can probably tune it up to a very high number of threads.
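
The folded two-pool variant described in the note above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard java.util.concurrent package (Java 8 or later, so it would not compile on the JDK 1.4.2 mentioned in the environment) instead of Doug Lea's original library, and the class name, the reindex method, the loadComments callback and the in-memory queue standing in for the Lucene index are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

class TwoPoolReindexSketch {

    /**
     * Loads on one pool and indexes on another, so database access and
     * indexing cannot starve each other of threads.
     * loadComments is a hypothetical stand-in for the database query.
     */
    static Queue<String> reindex(List<Integer> issueIds,
                                 IntFunction<List<String>> loadComments,
                                 int loaderThreads, int indexerThreads) {
        ExecutorService loaders = Executors.newFixedThreadPool(loaderThreads);
        ExecutorService indexers = Executors.newFixedThreadPool(indexerThreads);
        // Hypothetical stand-in for the real search index.
        Queue<String> index = new ConcurrentLinkedQueue<>();

        for (int issueId : issueIds) {
            loaders.execute(() -> {
                // Hand the issue itself to the indexing pool.
                indexers.execute(() -> index.add("issue:" + issueId));
                // Load the comments here, then hand each one to the indexing pool.
                for (String comment : loadComments.apply(issueId)) {
                    indexers.execute(() -> index.add("comment:" + issueId + ":" + comment));
                }
            });
        }
        // Once the loaders have terminated, every indexing job has been submitted,
        // so the indexing pool can be drained safely afterwards.
        drain(loaders);
        drain(indexers);
        return index;
    }

    private static void drain(ExecutorService pool) {
        pool.shutdown(); // accept no new jobs, finish the queued ones
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("reindex did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Queue<String> index =
            reindex(Arrays.asList(1, 2, 3), id -> Arrays.asList("first", "second"), 4, 4);
        System.out.println(index.size() + " entries indexed");
    }
}
```

The order of the two drain calls matters: shutting down the indexing pool first would reject jobs that loaders are still submitting. The thread counts are left as parameters because the right values depend on how I/O-bound the real workload turns out to be.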

-Nick


Nick Minutello - 28/Feb/05 11:38 AM
Using the thread pool executor, it's actually even easier than my description above.

Roughly speaking it looks like this:

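// PooledExecutor and LinkedQueue come from Doug Lea's util.concurrent library
PooledExecutor executor = new PooledExecutor(new LinkedQueue(), 10);
executor.createThreads(10);

// for each issue:
for (Iterator it = issues.iterator(); it.hasNext(); ) {  // 'issues' is whatever collection is being reindexed
    final Object issue = it.next();

    executor.execute(new Runnable() {
        public void run() { /* ... index issue */ }
    });

    executor.execute(new Runnable() {
        public void run() { /* ... load comments ... index comments */ }
    });
}

// stop accepting work, then block until everything queued has run
executor.shutdownAfterProcessingCurrentlyQueuedTasks();
executor.awaitTerminationAfterShutdown();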
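
For readers on Java 5 or later, the same single-pool pattern can be written with the standard java.util.concurrent package, which grew out of Doug Lea's library. The following is a rough sketch, not the attached implementation: the class and method names are hypothetical, the task bodies are placeholders that only count completions, and lambdas require Java 8.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SinglePoolReindexSketch {

    /** Queues two jobs per issue on one pool and returns how many jobs completed. */
    static int reindexAll(List<String> issueKeys, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();

        for (String key : issueKeys) {
            executor.execute(() -> {
                // placeholder for: index the issue identified by 'key'
                completed.incrementAndGet();
            });
            executor.execute(() -> {
                // placeholder for: load comments ... index comments
                completed.incrementAndGet();
            });
        }

        // shutdown() lets already-queued jobs finish; awaitTermination() blocks until they have.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("reindex did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int jobs = reindexAll(List.of("KEY-1", "KEY-2", "KEY-3"), 10);
        System.out.println(jobs + " jobs completed");
    }
}
```

Here shutdown() plays the role of shutdownAfterProcessingCurrentlyQueuedTasks() and awaitTermination() the role of awaitTerminationAfterShutdown(); the one-hour timeout is an arbitrary choice for the sketch.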


Scott Farquhar [Atlassian] - 28/Feb/05 12:00 PM
Nick champ - can you send me any source code that you have regarding this? I'll see what I can do to integrate it.

Nick Minutello - 01/Mar/05 03:59 PM
Here is a rough cut. Not much testing done...