Suggestion
Resolution: Answered
We are testing in Confluence 3.4.8, but this potentially affects other versions too.
NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.
This feature request is related to support ticket https://support.atlassian.com/browse/CSP-58347. After contacting Atlassian Support about this issue on behalf of our client, I was told that Lucene index customization is currently not supported and was asked to submit a feature request instead.
Business Use Case
Our client wishes to use Confluence as a partner portal, delegating spaces to their partners. This seems like a very common use case, and it leverages one of the core benefits of Confluence spaces, namely space distribution.
However, since their partners are OEM manufacturers, and possibly competitors, it is logical that they do not want partners to see each other's user names. We have suppressed the Profile Directory without problems, but because user names are indexed by Lucene, they still surface through search. Our attempts in code to remove them from the index (described below) have not worked.
Abstract
We are implementing a user requirement to exclude all user information from Confluence search. Our approach is to remove personal information from the Lucene index using an extractor module.
Logging suggests that the documents are removed from the index, but the search results persist. Worse, after updating a user's profile and reindexing, we get index locking errors, among other problems.
Detail
The Index Limiter plugin has been written for the single purpose of removing personal information from the Lucene index.
Its purpose is to remove:
- username links from the rich text editor (RTE)
- username results from "quicksearch"
- user details/profiles from the search results page, e.g. /dosearchsite.action?queryString=admin
I've attempted the unindexing in two parts:
- Invalidating the fields: using an extractor module (source in svn) to invalidate the values of the "type", "email", "fullName", "title" and "username" fields in Lucene documents of type PersonalInformation.CONTENT_TYPE (the addFields() method in the extractor)
- Removing all personal information from the Lucene index: deleting all Lucene documents whose handle starts with com.atlassian.confluence.user.PersonalInformation (the unIndex() method in the extractor)
Results
After installing the Index Limiter plugin, run a complete reindex from Confluence Admin to trigger removal of the PersonalInformation data from Lucene.
1. Invalidating the fields
Takes a Lucene Document like this:
Document<
stored/uncompressed,indexed<handle:com.atlassian.confluence.user.PersonalInformation-393217>
stored/uncompressed,indexed,tokenized<content-name-unstemmed:admin>
stored/uncompressed,indexed,tokenized<email:admin@example.com>
stored/uncompressed,indexed,tokenized<fullName:admin>
stored/uncompressed,indexed,tokenized<labelText:>
stored/uncompressed,indexed,tokenized<title:admin>
stored/uncompressed,indexed,tokenized<username:admin>
stored/uncompressed,indexed<created:0fl6inapf>
stored/uncompressed,indexed<fullNameUntokenized:admin>
stored/uncompressed,indexed<hasPersonalSpace:false>
stored/uncompressed,indexed<modified:000000000>
stored/uncompressed,indexed<urlPath:/~admin>
stored/uncompressed<content-version:1>
stored/uncompressed<excerpt:>
stored/uncompressed<version:1>
>
Changes it to this:
Document<
stored/uncompressed,indexed<handle:com.atlassian.confluence.user.PersonalInformation-393217>
stored/uncompressed,indexed,tokenized<content-name-unstemmed:admin>
stored/uncompressed,indexed,tokenized<email:admin@example.com>
stored/uncompressed,indexed,tokenized<email:appfusions.invalidate>
stored/uncompressed,indexed,tokenized<fullName:admin>
stored/uncompressed,indexed,tokenized<fullName:appfusions.invalidate>
stored/uncompressed,indexed,tokenized<labelText:>
stored/uncompressed,indexed,tokenized<title:admin>
stored/uncompressed,indexed,tokenized<title:appfusions.invalidate>
stored/uncompressed,indexed,tokenized<username:admin>
stored/uncompressed,indexed,tokenized<username:appfusions.invalidate>
stored/uncompressed,indexed<created:0fl6inapf>
stored/uncompressed,indexed<fullNameUntokenized:admin>
stored/uncompressed,indexed<hasPersonalSpace:false>
stored/uncompressed,indexed<modified:000000000>
stored/uncompressed,indexed<type:appfusions.invalidate>
stored/uncompressed,indexed<urlPath:/>
stored/uncompressed<content-version:1>
stored/uncompressed<excerpt:>
stored/uncompressed<version:1>
>
Uses the following code:
document.removeField(field);
// Set an invalid/meaningless value
document.add(new Field(field, "appfusions.invalidate", Field.Store.YES, Field.Index.TOKENIZED));
It should change the value of each field to appfusions.invalidate, but for email, fullName, title and username it actually adds a duplicate field with this value alongside the original.
Even so, it has part of the desired effect on the index:
- Removes user details from the RTE & quicksearch
- Only partially removes information from the search results page
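Lucene documents are multi-valued: several fields may share a name, and Document.add always appends. In Lucene 2.x, Document.removeField(name) removes only the first field with that name, or does nothing if there is none, while Document.removeFields(name) removes all of them. The sketch below models those semantics with plain collections (FieldList and ExtractorOrderDemo are hypothetical names, not Lucene classes). It shows one sequence that would produce the duplicates in the updated listing: the extractor's remove-then-add runs before the original value is added, for example by another extractor. That ordering is an assumption and is not established by this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene 2.x Document: an ordered list of fields where names may repeat.
class FieldList {
    private final List<String[]> fields = new ArrayList<>();

    // Like Document.add: always appends, even if a field with this name already exists.
    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Like Document.removeField: removes only the first field with this name, if there is one.
    void removeField(String name) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i)[0].equals(name)) {
                fields.remove(i);
                return;
            }
        }
    }

    // Like Document.removeFields: removes every field with this name.
    void removeFields(String name) {
        fields.removeIf(f -> f[0].equals(name));
    }

    // Returns the values stored under a name, in insertion order.
    List<String> values(String name) {
        List<String> out = new ArrayList<>();
        for (String[] f : fields) {
            if (f[0].equals(name)) {
                out.add(f[1]);
            }
        }
        return out;
    }
}

class ExtractorOrderDemo {
    // The plugin's remove-then-add step for a single field.
    static void invalidate(FieldList doc, String field) {
        doc.removeField(field);
        doc.add(field, "appfusions.invalidate");
    }

    public static void main(String[] args) {
        // Original value already present when the extractor runs: it is replaced.
        FieldList replaced = new FieldList();
        replaced.add("email", "admin@example.com");
        invalidate(replaced, "email");
        System.out.println(replaced.values("email")); // [appfusions.invalidate]

        // Original value added after the extractor runs (assumed ordering): both survive.
        FieldList duplicated = new FieldList();
        invalidate(duplicated, "email");
        duplicated.add("email", "admin@example.com");
        System.out.println(duplicated.values("email")); // [appfusions.invalidate, admin@example.com]
    }
}
```

If that ordering is what happens, switching to removeFields would not help either, because nothing is there to remove yet when the extractor runs.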
2. Remove all personal information from the Lucene index
I injected com.atlassian.bonnie.ILuceneConnection into the extractor module via property (setter) injection and called the unIndex() method (at the bottom of this page) from the addFields() method.
With this in place, logging suggests that the documents have been removed, but the search results suggest otherwise.
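One possible reading of the "removed but still found" result comes from the stack trace later in this report. There, unIndex() runs inside ConfluenceDocumentBuilder.getDocument(...), which is called from AddDocumentIndexTask.perform(...). In other words, it runs while the document for that same user is still being built. If the task adds the finished document to the index once the extractors return, a deletion made from addFields() is undone straight away. That step is an assumption based on the class name and is not confirmed here. The sketch below models that order with an in-memory list of handles. ToyIndex and ReAddDemo are hypothetical names, and the model ignores locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory "index" that holds only document handles.
class ToyIndex {
    final List<String> handles = new ArrayList<>();

    // Removes every handle starting with the prefix, like the extractor's unIndex(); returns the count.
    int deleteByPrefix(String prefix) {
        int before = handles.size();
        handles.removeIf(h -> h.startsWith(prefix));
        return before - handles.size();
    }

    // Assumed add-task order: run the extractor while building, then add the built document.
    void buildAndAdd(String handle, Consumer<ToyIndex> extractor) {
        extractor.accept(this); // addFields() (and so unIndex()) runs here
        handles.add(handle);    // the document under construction is added afterwards
    }
}

class ReAddDemo {
    public static void main(String[] args) {
        String prefix = "com.atlassian.confluence.user.PersonalInformation";
        ToyIndex index = new ToyIndex();
        index.handles.add(prefix + "-393217");

        // The extractor deletes every PersonalInformation document it can see...
        index.buildAndAdd(prefix + "-393217",
                idx -> System.out.println("deleted " + idx.deleteByPrefix(prefix)));

        // ...but the document being built goes straight back in.
        System.out.println(index.handles);
    }
}
```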
Updated User Profiles
After updating a user profile and reindexing, further problems occur with search index locking:
2011-03-03 11:06:01,019 ERROR [DefaultQuartzScheduler_Worker-9] [atlassian.bonnie.search.BaseDocumentBuilder] getDocument Error extracting search fields from userinfo: admin v.2 (393217) using BackwardsCompatibleExtractor wrapping com.appfusions.confluence.plugins.indexlimiter.extractor.PersonalInformationExtractor@5342836a (com.appfusions.confluence.plugins.indexlimiter:PersonalInformationExtractor): org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/Users/david/projects/appfusions/confluence/plugins/indexlimiter/trunk/target/confluence/home/index/write.lock
com.atlassian.bonnie.LuceneException: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/Users/david/projects/appfusions/confluence/plugins/indexlimiter/trunk/target/confluence/home/index/write.lock
    at com.atlassian.bonnie.LuceneConnection.withReaderAndDeletes(LuceneConnection.java:302)
    at com.appfusions.confluence.plugins.indexlimiter.extractor.PersonalInformationExtractor.unIndex(PersonalInformationExtractor.java:95)
    at com.appfusions.confluence.plugins.indexlimiter.extractor.PersonalInformationExtractor.addFields(PersonalInformationExtractor.java:85)
    at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45)
    at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
    at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
    at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
    at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40)
    at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44)
    at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331)
    at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20)
    at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:361)
    at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405)
    at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:161)
    at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:128)
    at sun.reflect.GeneratedMethodAccessor337.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at $Proxy35.flushQueue(Unknown Source)
    at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29)
    at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63)
    at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/Users/david/projects/appfusions/confluence/plugins/indexlimiter/trunk/target/confluence/home/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:70)
    at org.apache.lucene.index.IndexReader.acquireWriteLock(IndexReader.java:638)
    at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:672)
    at com.appfusions.confluence.plugins.indexlimiter.extractor.PersonalInformationExtractor$1.perform(PersonalInformationExtractor.java:109)
    at com.atlassian.bonnie.LuceneConnection.withReaderAndDeletes(LuceneConnection.java:298)
    ... 30 more
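Read from the bottom up, the trace shows the index queue flush entering LuceneConnection.withWriter(...) before BulkWriteIndexTask builds the user's document. The extractor's unIndex() then asks for the index's write.lock (via IndexReader.deleteDocument) while that writer appears to hold it. SimpleFSLock is a lock file rather than a per-thread lock, so the nested request cannot succeed, and Lock.obtain eventually times out. The sketch below is a stdlib-only model of that nesting, not Confluence or Lucene code. A single-permit java.util.concurrent.Semaphore stands in for write.lock, the class and method names are hypothetical, and the 100 ms timeout is arbitrary.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of an index directory guarded by one write.lock.
class ToyIndexDirectory {
    // One permit and no owner thread: a second request fails even from the same thread,
    // much like a lock file that already exists.
    private final Semaphore writeLock = new Semaphore(1);

    // Returns true if the lock was obtained within the timeout, false if the request timed out.
    boolean tryObtain(long timeoutMillis) {
        try {
            return writeLock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void release() {
        writeLock.release();
    }

    // The outer index task holds the lock (like withWriter) while the extractor's delete
    // asks for it again (like withReaderAndDeletes -> deleteDocument).
    // Returns whether the nested request succeeded.
    boolean writerTaskWithNestedDelete() {
        if (!tryObtain(100)) {
            throw new IllegalStateException("writer could not obtain the lock");
        }
        try {
            boolean nested = tryObtain(100);
            if (nested) {
                release(); // not reached in this model, kept so the permits stay balanced
            }
            return nested;
        } finally {
            release();
        }
    }
}

class LockDemo {
    public static void main(String[] args) {
        ToyIndexDirectory dir = new ToyIndexDirectory();
        // The nested request times out, the analogue of LockObtainFailedException.
        System.out.println(dir.writerTaskWithNestedDelete()); // false
        // Once the outer task finishes, the lock is free again.
        System.out.println(dir.tryObtain(100)); // true
    }
}
```

In this model, retrying or lengthening the timeout would not help, because the outer task holding the lock is the caller's own. The deletion would have to happen outside the writer's scope.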
Supporting code
atlassian-plugin.xml:
<atlassian-plugin key="${project.groupId}.${project.artifactId}" name="${project.name}">
    <plugin-info>
        <description>${project.description}</description>
        <version>${project.version}</version>
        <vendor name="${project.organization.name}" url="${project.organization.url}" />
    </plugin-info>
    <extractor name="Personal Information Extractor" key="PersonalInformationExtractor" class="com.appfusions.confluence.plugins.indexlimiter.extractor.PersonalInformationExtractor" priority="900">
        <description>Removes some personal information from the search index.</description>
    </extractor>
</atlassian-plugin>
com.appfusions.confluence.plugins.indexlimiter.extractor.PersonalInformationExtractor:
package com.appfusions.confluence.plugins.indexlimiter.extractor;

import org.apache.log4j.Logger;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.slf4j.MDC;
import com.atlassian.bonnie.Searchable;
import com.atlassian.bonnie.ILuceneConnection;
import com.atlassian.bonnie.search.Extractor;
import com.atlassian.bonnie.search.BaseDocumentBuilder;
import com.atlassian.bonnie.search.DocumentBuilder;
import com.atlassian.confluence.core.ContentEntityObject;
import com.atlassian.confluence.user.PersonalInformation;
import com.atlassian.confluence.user.UserAccessor;

import java.io.IOException;

/**
 * User: david
 * Date: Feb 25, 2011
 * Time: 7:51:57 PM
 */
public class PersonalInformationExtractor implements Extractor {

    private UserAccessor userAccessor;
    private ILuceneConnection luceneConnection;
    private DocumentBuilder documentBuilder;

    public void setUserAccessor(UserAccessor userAccessor) {
        this.userAccessor = userAccessor;
    }

    /**
     * @param luceneConnection set by dependency injection, required
     */
    public void setLuceneConnection(ILuceneConnection luceneConnection) {
        this.luceneConnection = luceneConnection;
    }

    public void setDocumentBuilder(DocumentBuilder documentBuilder) {
        this.documentBuilder = documentBuilder;
    }

    /**
     * Initially replace the contents of the fields in the index
     * This approach will remove PersonalInformation from quicksearch and the rich text editor
     */
    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable) {
        if (searchable instanceof PersonalInformation) {
            PersonalInformation personalInformation = (PersonalInformation) searchable;
            if (userAccessor.getUser(personalInformation.getUsername()) != null) {
                // Most important is to change the type field to an unknown value (to Confluence)
                String[] fieldsTokenized = {"email", "fullName", "title", "username"}; // tokenized fields
                for (String field : fieldsTokenized) {
                    document.removeField(field);
                    // Set an invalid/meaningless value
                    document.add(new Field(field, "appfusions.invalidate", Field.Store.YES, Field.Index.TOKENIZED));
                }
                String[] fieldsUntokenized = {"type"}; // untokenized fields
                for (String field : fieldsUntokenized) {
                    document.removeField(field);
                    // Set an invalid/meaningless value
                    document.add(new Field(field, "appfusions.invalidate", Field.Store.YES, Field.Index.UN_TOKENIZED));
                }
                // Redirect/rewrite the urlPath to the context root
                // -- if we can't remove this item from search results, at least redirect.
                document.removeField("urlPath");
                document.add(new Field("urlPath", "/", Field.Store.YES, Field.Index.UN_TOKENIZED));
                // Finally, attempt to remove all documents related to PersonalInformation
                unIndex();
                // unIndex(personalInformation);
            }
        }
    }

    /**
     * Find *all* Lucene Documents where "handle" starts with "com.atlassian.confluence.user.PersonalInformation"
     * - likely to be rather heavy handed, so perhaps later target just the single document in the index
     */
    public void unIndex() {
        luceneConnection.withReaderAndDeletes(new ILuceneConnection.ReaderAction() {
            public Object perform(IndexReader indexReader) throws IOException {
                int max = indexReader.maxDoc();
                for (int i = 0; i < max; i++) {
                    Field handle = indexReader.document(i).getField("handle");
                    if (handle != null) {
                        if (handle.stringValue().startsWith("com.atlassian.confluence.user.PersonalInformation")) {
                            System.out.println(" unindexing " + indexReader.document(i).toString());
                            indexReader.deleteDocument(i);
                        }
                    }
                }
                return null;
            }
        });
    }
}
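The unIndex() method above deletes every PersonalInformation document rather than only the one being indexed. It also calls indexReader.document(i) without first checking indexReader.isDeleted(i), and Lucene 2.x readers throw an IllegalArgumentException when a deleted document is read. The stdlib-only sketch below shows the narrower selection that the code comment suggests. It builds the single handle, matches it exactly, and skips deleted slots. The "<class name>-<id>" handle format comes from the index dump (PersonalInformation-393217); the other ids are illustrative. ToyReader and TargetedUnindex are hypothetical in-memory stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for an IndexReader: stored handles plus deletion flags.
class ToyReader {
    private final List<String> handles;
    private final boolean[] deleted;

    ToyReader(List<String> handles) {
        this.handles = new ArrayList<>(handles);
        this.deleted = new boolean[handles.size()];
    }

    int maxDoc() {
        return handles.size();
    }

    boolean isDeleted(int i) {
        return deleted[i];
    }

    void deleteDocument(int i) {
        deleted[i] = true;
    }

    // Like reading a stored field: a deleted slot cannot be read.
    String handle(int i) {
        if (deleted[i]) {
            throw new IllegalArgumentException("attempt to access a deleted document");
        }
        return handles.get(i);
    }
}

class TargetedUnindex {
    static final String PREFIX = "com.atlassian.confluence.user.PersonalInformation";

    // "<class name>-<id>", the handle format seen in the index dump.
    static String handleFor(long id) {
        return PREFIX + "-" + id;
    }

    // Deletes only documents whose handle equals targetHandle, skipping deleted slots.
    static int unIndex(ToyReader reader, String targetHandle) {
        int removed = 0;
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) {
                continue;
            }
            if (targetHandle.equals(reader.handle(i))) {
                reader.deleteDocument(i);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(Arrays.asList(
                handleFor(393217), handleFor(393218), handleFor(393219)));
        reader.deleteDocument(1); // an earlier deletion still present in the index
        System.out.println(unIndex(reader, handleFor(393217))); // 1
        System.out.println(reader.isDeleted(2)); // false: other users are left alone
    }
}
```

Against a real index, the same selection is a single IndexReader.deleteDocuments(new Term("handle", handle)) call, since the dump shows handle as indexed but not tokenized. Either way, this only narrows what is deleted. It does not avoid the write.lock conflict in the stack trace above.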
Relates to: AI-817 Unable to remove Personal Information from Lucene index (Closed)
Group-based support is now available, and we have also sold this to a number of customers, all of whom are pleased that the problem is fixed for them.
Group support was the #1 feature request from the initial release, so it's great to have that in.
Documentation on the V2/Group support is here: http://www.appfusions.com/display/PPARTS/Documentation#Documentation-What'sNewinVersion2?
We will have an updated video for this out soon. For an evaluation, contact us at info@appfusions.com.
We will continue to improve this and thank you for the customer support to date. Atlassian customers rule!