
Allow search for words and phrases with non-letter symbols: plus (+), minus (-), period (.), dollar sign ($), asterisk (*), etc.

    • Our product teams collect and evaluate feedback from a number of different sources. To learn more about how we use customer feedback in the planning process, check out our new feature policy.

      NOTE: This suggestion is for Confluence Cloud. Using Confluence Server? See the corresponding suggestion.

      At the moment, searching for "hello-to-the-world" in Confluence always returns the same results as "hello to the world". The same applies to other symbols, such as plus, underscore, period, dollar sign, and percent sign.

      There's also no way to prevent asterisks being treated as wildcard characters in Lucene, so you can't search for a word like "plea" and match content with asterisks around the word.

      Words are also not split on dots, so you can't search for "somefile" and find pages that contain "somefile.txt" or "somefile.doc" in the text.

      Technical notes

      This is due to how Confluence's search tokenises search requests. It splits the query up into words based on letter characters, and ignores all symbols in the request. We use Lucene's StandardTokenizer in our EnglishAnalyzer, and similar implementations for other languages.

      Here is the description of the behaviour of StandardTokenizer from Lucene:

      • Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
      • Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
      • Recognizes email addresses and internet hostnames as one token.

      An example of the grammar for this tokenizer can be viewed here: StandardTokenizerImpl.jflex.
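The symptom described above can be reproduced with a minimal sketch. This is not Lucene's StandardTokenizer and it does not model the dot, product-number, email, or hostname rules listed above; it is a hypothetical stand-in (the `tokenize` function and `TOKEN_PATTERN` name are assumptions made for illustration) that keeps only runs of letters and digits, which is enough to show why two differently punctuated queries end up identical.

```python
import re

# Simplified stand-in for an analyzer that keeps only letters and digits.
# This is NOT Lucene's StandardTokenizer; it only mimics the symptom
# described in this issue: symbols are dropped before matching.
TOKEN_PATTERN = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return its runs of letters and digits."""
    return TOKEN_PATTERN.findall(text.lower())


if __name__ == "__main__":
    for query in ("hello-to-the-world", "hello to the world", "--some-option", "C++"):
        print(f"{query!r} -> {tokenize(query)}")
```

Once the symbols are gone, the index has no way to tell "hello-to-the-world" from "hello to the world", so both queries match the same documents.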

        1. search-not-working-1.png (38 kB)
        2. search-not-working-2.png (18 kB)
        3. search-not-working-3.png (18 kB)

            [AI-582] Allow search for words and phrases with non-letter symbols: plus (+), minus (-), period (.), dollar sign ($), asterisk (*), etc.

            Jacob Vandevelde added a comment -

            How is this still a thing? There really needs to be an "exact word" option that does not turn a dash into a space.
            This has to be the first search I've ever seen that does this.

            Searching for "config-name" searches for the strings "config" and "name". Across 100 repos this is absolutely useless.

            +1 if it's any consolation.

            +1

            Sheng An Zhang (Inactive) added a comment -

            Hi everyone.
            My name is Sheng An Zhang and I am a product manager on the search team.

            Our team is looking to focus on fixing fundamental issues with our search. This bug is something my team is now looking into and will be an area of higher priority for us. While I cannot provide you with a concrete timeline, we will be actively looking to fix this and related issues.

            As we are beginning to explore this problem, we are also looking for your help in ensuring we build the right thing for you. If you are willing to participate in a short customer interview to chat about this (or search in general), please find a time that suits you here.

            Otherwise, please do not hesitate to shoot me an email at szhang4@atlassian.com.

            We really want to thank you for all your patience and feedback!

            sxander added a comment -

            It would be extremely helpful to be able to search for special characters.  Especially in a heavily data bound world.  Finding x_y as opposed to x y or xy would be exceptionally useful and highly time saving.


            Pål F. Kristiansen added a comment - https://support.atlassian.com/confluence-cloud/docs/confluence-search-syntax/

            Pål F. Kristiansen added a comment -

            The search engine is useless if it is not possible to search for an exact text/phrase.

            Scott Moore added a comment -

            We recently changed the names of some of our Bitbucket repos, many of which have a dash in them. We want to be able to update all of our Confluence pages that reference these repos, which would be a simple affair if Confluence recognized dashes in search terms. But since it doesn't, updating the repo names is going to be tedious.

            Luan Minh Nguyen added a comment -

            Searching for command-line options in code references is also impossible with this issue.

            "--some-option" will return results for "some" and "option" but never together.

            Anthony Brown added a comment -

            We do automated deployments in our area, and every one of them has an identifier made up of 4 parts divided by '.'. When I search for this key provider, instead of my search looking like this
            part1.part2.part3.part4 (which brings up nothing)
            I have to search for
            text ~(part1) AND text ~(part2) AND text ~(part3) AND text ~(part4)
            I would really like the search to work so that I don't have to get past this unexpected behaviour with a monolithic workaround.
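The workaround in the comment above can be generated rather than typed by hand. The following is a minimal sketch: `build_and_query` is a hypothetical helper, and the `text ~(...)` clause format is copied from the comment as written, not checked against the Confluence search syntax reference.

```python
def build_and_query(identifier: str, separator: str = ".") -> str:
    """Turn 'a.b.c' into 'text ~(a) AND text ~(b) AND text ~(c)'.

    The clause format is taken verbatim from the comment above and is an
    assumption, not a verified Confluence query syntax.
    """
    # Drop empty fragments so 'a..b' or a trailing separator does not
    # produce an empty clause.
    parts = [part for part in identifier.split(separator) if part]
    return " AND ".join(f"text ~({part})" for part in parts)


if __name__ == "__main__":
    print(build_and_query("part1.part2.part3.part4"))
```

A query built this way only requires that every part appears somewhere on the page. It does not require the parts to be adjacent or in order, so it can still return pages that never contain the full identifier.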

            Matt Shepherd added a comment - - edited

            This is not fixed for underscores as noted above, and agree with the other posters- this is insanely bad and needs to be worked. We are on version 5.9.4.


            Matthew Harris added a comment -

            This isn't a feature request, it's a fundamental problem. We have many business terms which use hyphens, for example "multi-leg", which are key to the searches our users execute.

            This needs to be fixed please.

            dingetje NA added a comment - - edited

            The following screenshots show that search isn't working as expected, either with a wildcard regular expression or when the minus sign is escaped:

            Deleted Account (Inactive) added a comment -

            We need to be able to search through our articles in Confluence. It's critical for us to be able to index code fragments. For instance, when you have something like "verify(api, times(1)).addOrUpdate" on the page, you cannot find the string "addOrUpdate"; it is not indexed.

            There is a workaround for doing wildcard searching at the beginning of a term. If you structure your search like this:
            /.*hum.*/
            then you will get results for hum/human/inhumane/thumbprints/etc. The slashes tell the search engine to do a regular expression search as opposed to a normal search. The .* before and after the search term tells the regex engine that you want any character zero or more times (in essence a wildcard).
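The effect of the leading and trailing `.*` can be illustrated outside Confluence. This sketch uses Python's `re` module on a hypothetical list of indexed terms and assumes the pattern has to match a whole term, which is why a bare `hum` would not find "human".

```python
import re

# Hypothetical indexed terms, taken from the examples in the comment above.
INDEXED_TERMS = ["hum", "human", "inhumane", "thumbprints", "hello"]


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that the pattern matches in full.

    Assumption: the search engine matches the regex against whole terms,
    which is modelled here with re.fullmatch.
    """
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


if __name__ == "__main__":
    print(matching_terms(r".*hum.*", INDEXED_TERMS))
    print(matching_terms(r"hum", INDEXED_TERMS))
```

The first call returns every term containing "hum"; the second returns only the exact term, which shows what the surrounding `.*` contributes.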

            dingetje NA added a comment - - edited

            Even escaping the special characters in Advanced Search does not yield the correct results. I'm stupefied that something as crucial as a search is not working as expected.
            Moreover, this crucial issue is now over 10 years old! How is that even possible?

            What I don't understand is that this is marked "suggestion" and nobody is assigned to it, even though there are 71 up votes.

            dave (Inactive) added a comment -

            Removing underscore character from issue summary (as this issue has been resolved for underscores as of Confluence 5.2).

            dave (Inactive) added a comment -

            Underscore characters are no longer treated as word boundary characters as of Confluence 5.2. This means that searching for hello_world will find documents containing the term hello_world intact.

            This is because we switched from StandardAnalyzer to UAX29URLEmailAnalyzer.
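The difference between the two behaviours can be sketched with two regular expressions. Neither pattern is Lucene's StandardAnalyzer or UAX29URLEmailAnalyzer; the pattern names and the `tokens` helper are assumptions made only to illustrate what changes when the underscore stops being a word boundary.

```python
import re

# Underscore treated as a word boundary (the behaviour complained about).
SPLIT_ON_UNDERSCORE = re.compile(r"[^\W_]+")
# Underscore treated as part of the word (the behaviour reported for 5.2).
# Python's \w covers letters, digits and the underscore.
KEEP_UNDERSCORE = re.compile(r"\w+")


def tokens(pattern: re.Pattern, text: str) -> list[str]:
    """Lower-case the text and return the tokens the pattern finds."""
    return pattern.findall(text.lower())


if __name__ == "__main__":
    for text in ("hello_world", "JIRA_HOME", "on-call"):
        print(text, tokens(SPLIT_ON_UNDERSCORE, text), tokens(KEEP_UNDERSCORE, text))
```

Under the second pattern "hello_world" survives as one token, while "on-call" is still split in two, which is consistent with later comments reporting that hyphenated terms remain a problem.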

            Justin Willoughby added a comment -

            Seems to still be an issue with 5.5. Searching for "on-call", for example, only finds pages with "call" in them and ignores "on" or "on-" altogether. "on-call" should be considered one word.

            David Yu added a comment -

            Is this fixed in 5.2+ ? I noticed underscores, plus signs, and things now show up in search results.

            Try searching JIRA_HOME:
            https://confluence.atlassian.com/dosearchsite.action?queryString=jira_home


            Fred Bunting added a comment -

            I don't know why Justin says this has been a problem for FIVE years ... when this ticket was filed EIGHT years ago.

            When I filed my comment on Dec. 21, 2010 ... then it was FIVE years old. (Such memories.)

            Justin Ellis added a comment - - edited

            It is ridiculous that you cannot search for specific strings containing underscores. I understand this is a function of the indexing engine, but this has been a problem for FIVE years now. Since many naming conventions use underscores, it seems like a no-brainer that one should be able to search for underscores in Confluence (as well as in JIRA, et al.)

            How does this still not have an assignee?


            Mark Williams added a comment -

            Please, please fix this. It makes Confluence a far less useful system for technical knowledge.

            Jason Kotenko added a comment -

            Please fix this. I run into this issue every day... all our knowledge at our company is stored in Confluence and I can't find anything with an underscore in it. Please help...

            John Olah added a comment -

            I liked the approach that you used for focusing on the editor in Confluence 4.0. You need to take the same approach with search and make it the entire focus of a release and address issues like this one. This is a huge negative for the product.


            Reynard Claassen added a comment -

            Couldn't agree more.
            The company I work for chose Confluence as a replacement for our old wiki.
            The reason? Search engine limitations. Please feel free to chuckle at this point.

            Much to my disappointment, Lucene makes searching for any technical information very difficult.
            In our company, our database table and stored procedure names all contain underscores. All. As in every last one of them.
            This means that all database tables and entity references from the old wiki content that has been migrated to Confluence cannot be found.

            It makes a very poor impression that a problem with such a high impact on Confluence users has not been fixed in so many years.

            The effect of this is that people are very reluctant to use the wiki, as they have to jump through hoops to make their content searchable (by hacking together labels for table names with the underscores removed).

            The Confluence wiki has been an enormous failure and it's almost entirely the fault of this bug. I agree with previous comments. This is a bug. Not a new feature.

            Christian Michallek added a comment -

            I can't understand why this is still open.
            The Confluence search is BROKEN.
            You won't find search terms with non-letter symbols in them.

            Dolby Enterprise Apps added a comment - - edited

            Hey!!! Let's fix this, as a hyphenated compound word can be a valid search term since it is valid grammar. For example, I should be able to get exact results on words like "editor-in-chief" or "merry-go-round" or "long-term". The search should return exact matches when a word like this is searched for.

            PLEASE FIX!

            thx

            Jim Pinkham added a comment -

            This bug is a major hassle, as well described above.

            All I can do is beg for someone to PLEASE FIX THIS ANNOYING BEHAVIOR.

            The whole reason we are able to use a wiki is that the search works better than the horrible IBM Document Management (sic) system that we are being dragged kicking and screaming towards by the powers that golf... a bug like this is like fingernails being pried from the edge of the abyss... Aaaaaaaaaaarrrrrgggggggghhhhhhhhh!

            Dolby Enterprise Apps added a comment -

            Yeah - this is a pretty key feature.

            Moreover, I cannot force a literal search by quoting text with a - in it.

            Compound words are pretty common grammatically: e.g. "nineteenth-century", "low-budget". If I quote my search terms, it really should find my specific hyphenated words. Right now, the search results just will not focus down correctly, e.g. "re-sign" becomes just a search for "sign".

            How am I supposed to search my large Confluence site of documentation for terms or acronyms that contain hyphens ... please please please fix this or at least support forcing a literal search for words that contain dashes with something like a quoted string entered into the search field. Please?

            Dmitry Samosseiko added a comment -

            I was surprised to see this issue in the Confluence search, still not resolved 6 years later. 100% agree with Fred that it's a critical issue for software companies that use Confluence for technical documentation.

            Vincent Choy (Inactive) added a comment -

            Sorry for the delay:

            We have scheduled this for internal review.
            This ticket has been flagged and we will be estimating the complexity of getting this implemented.

            Fred Bunting added a comment -

            (It's now been one year since my last comment ... time to update it.)

            Please consider fixing this now 5-year-old problem. The inability to search for tokens with underscores is a major problem for those of us in any software-related industry.

            Christian Michallek added a comment -

            This is not a feature, it's a bug, plain and simple.
            And it's a bug that causes real problems!
            The search results are modified and not correct.

            Example:
            I save the URL test-it-out.com on some wiki pages.
            Now I search for test.
            The search result will show me:
            testitout.com

            This is completely wrong!!!
            So what's a wiki worth when you can't be sure that the search results are correct?
            Searching is the essence of a wiki, and when this is flawed, then the product is broken!

            Fred Bunting added a comment -

            Linking to CONF-14554.

            Please consider fixing this now 4-year-old problem. The inability to search for tokens with underscores is a major problem for those of us in any software-related industry.

            Petr Kohutek added a comment -

            I think that

            • dash / hyphen should be considered as a separator and
            • underscore as a part of the word.

            That means "fur_connect" is one word, whereas "fur-connect" is two words. This behavior is common in many search engines, and we rely on it when composing filenames: since it is not possible to search for "*connect", breaking the filename into separate words gives us a higher chance of finding something.

            Kerry Geiger added a comment -

            When you search for "fur_connect", you get all pages with "fur" and "connect"; the underscore is not recognized. This is bad behavior.

            jens added a comment -

            Thank you for the feature request. we will discuss it internally and might prioritize it, depending on the votes for this issue.

            cheers,

            Jens


              Unassigned
              Roberto Fdez.
              Votes: 132
              Watchers: 94