  1. Confluence Data Center
  2. CONFSERVER-3036

Allow search for words and phrases with non-letter symbols: plus (+), minus (-), period (.), dollar sign ($), asterisk (*), etc.

Details

    • We collect Confluence feedback from various sources, and we evaluate what we've collected when planning our product roadmap. To understand how this piece of feedback will be reviewed, see our Implementation of New Features Policy.

    Description

      NOTE: This suggestion is for Confluence Server. Using Confluence Cloud? See the corresponding suggestion.

      At the moment, searching for "hello-to-the-world" in Confluence always returns the same results as searching for "hello to the world". The same applies to other symbols such as plus, underscore, period, dollar sign and percent sign.

      There's also no way to prevent asterisks from being treated as wildcard characters by Lucene, so you can't search for a word like "plea" and match only content that has asterisks around the word.
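
A rough illustration of the wildcard problem, using Python's fnmatch module as a stand-in for wildcard matching. This is not Lucene's query parser; the term list and the wildcard_search helper are hypothetical and exist only to show why a query with asterisks around a word matches more than the literal text.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms; a real index is built by the analyzer.
indexed_terms = ["plea", "pleasant", "displeased", "apple"]

def wildcard_search(query, terms):
    """Return the terms matching the query, treating '*' as a wildcard."""
    return [t for t in terms if fnmatchcase(t, query)]

# The asterisks act as wildcards, so any term containing "plea" matches.
hits = wildcard_search("*plea*", indexed_terms)
print(hits)
```

Because the asterisks are consumed as wildcards, nothing in the query is left to express "the word plea with literal asterisks around it".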

      Words are also not split on dots, so you can't search for "somefile" and find pages that contain "somefile.txt" or "somefile.doc" in the text.

      Technical notes

      This is due to how Confluence's search tokenises search requests. It splits the query up into words based on letter characters, and ignores all symbols in the request. We use Lucene's StandardTokenizer in our EnglishAnalyzer, and similar implementations for other languages.
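
A minimal sketch of the effect described above, assuming a tokenizer that keeps only runs of ASCII letters and digits and drops everything else. The simple_tokens function is hypothetical and deliberately ignores the special cases for dots, product numbers and email addresses; it is not the Lucene implementation.

```python
import re

def simple_tokens(text):
    """Lowercase the text and keep only runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

hyphenated = simple_tokens("hello-to-the-world")
spaced = simple_tokens("hello to the world")

# Both queries reduce to the same token list, so they return the same results.
print(hyphenated == spaced)
print(simple_tokens("C++"), simple_tokens("$100"))
```

Once the symbols are dropped at tokenisation time, the search has no way to tell the two queries apart.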

      Here is the description of the behaviour of StandardTokenizer from Lucene:

      • Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
      • Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
      • Recognizes email addresses and internet hostnames as one token.
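
The three rules above can be approximated in a few lines. This is a simplified, ASCII-only sketch with a hypothetical classic_tokens function and a deliberately loose EMAIL pattern; the real tokenizer is generated from a JFlex grammar and handles many more cases.

```python
import re

# Loose email pattern, an assumption for illustration only.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def classic_tokens(text):
    """Approximate the three tokenizer rules described in the list above."""
    tokens = []
    for chunk in text.lower().split():
        # A dot followed by whitespace ends a token, so trim edge punctuation.
        chunk = chunk.strip(".,;:!?\"'()[]*")
        if not chunk:
            continue
        if EMAIL.match(chunk):
            # Rule 3: an email address stays one token.
            tokens.append(chunk)
        elif "-" in chunk and any(c.isdigit() for c in chunk):
            # Rule 2: a hyphenated token containing a number is not split.
            tokens.append(chunk)
        else:
            # Rules 1 and 2: split at punctuation and hyphens, but keep
            # inner dots (this also keeps hostnames together).
            for part in re.split(r"[^a-z0-9.]+", chunk):
                part = part.strip(".")
                if part:
                    tokens.append(part)
    return tokens

print(classic_tokens("Open somefile.txt now."))
print(classic_tokens("part AB-1234 ships"))
```

Under these assumptions "somefile.txt" is indexed as a single token, which is why a search for "somefile" alone does not find it.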

      An example of the grammar for this tokenizer can be viewed here: StandardTokenizerImpl.jflex.

      Attachments

        1. search-not-working-1.png (38 kB)
        2. search-not-working-2.png (18 kB)
        3. search-not-working-3.png (18 kB)

            People

              Assignee: Unassigned
              Reporter: Roberto Fdez.
              Votes: 218
              Watchers: 134
