Description
We should filter the paths passed to the content index for SVN (and p4), as we do here: https://extranet.atlassian.com/crucible/browse/~br=2.7-FECRU-1957-content-dupes/FE-hg/src/java/com/cenqua/fisheye/svn/SvnRepositoryScanner.java?r=70669da392559c65059162bf73a7e19850f03286#to187
by calling a subclass method that filters content paths.
This means that for matching paths we will look up the (String) paths 3 times instead of 2, but for non-matching paths we'll look them up only once and we won't try to delete all their content docs, which triggers unnecessarily small Lucene commits.
We should modify the SortedIntSet in place, so we don't use any more memory.
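
A minimal sketch of the in-place filtering idea, for illustration only. The SortedIntSet API is not shown in this ticket, so the sketch assumes a plain int[] plus a logical size as a stand-in; the class name, filterInPlace, and the predicate are all hypothetical. The predicate stands for the subclass filter, which would resolve a path id to its String path and decide whether it belongs in the content index.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in: the real SortedIntSet is not shown here, so a
// sorted int[] with a logical size is used instead.
final class InPlaceFilterSketch {

    /**
     * Compacts the first `size` entries of `ids`, keeping only the ids that
     * `keep` accepts. Relative order is preserved, so a sorted input stays
     * sorted. No second array is allocated.
     *
     * @return the new logical size; entries at or beyond it are stale
     */
    public static int filterInPlace(int[] ids, int size, IntPredicate keep) {
        int write = 0;
        for (int read = 0; read < size; read++) {
            int id = ids[read];
            // In the real scanner this is where the single path lookup for a
            // non-matching path would happen (assumption).
            if (keep.test(id)) {
                ids[write++] = id;
            }
        }
        return write;
    }
}
```

The caller would then treat only the first `write` entries as live, for example `int n = InPlaceFilterSketch.filterInPlace(pathIds, pathIds.length, id -> matchesContentFilter(id));`, where matchesContentFilter is a hypothetical name for the subclass hook.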