  1. Confluence
  2. CONF-11285

Page names with special characters to generate regular URL

    Details

    • Type: Improvement
    • Status: Open
    • Resolution: Unresolved
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: Usability, WIKI / HTML
    • Labels:
      None
    • Environment:
    • Last commented by user?:
      true

      Description

      Hi,

      When a page name includes special characters such as %, &, or +, the URL is generated differently.

      A usual URL (including the page name) looks like this: http://www.gigaspaces.com/wiki/display/GS6/Welcome+to+GigaSpaces

      BUT if you add such a character to the page name (for example, naming a page C++), the URL looks like this: http://www.gigaspaces.com/wiki/pages/viewpage.action?pageId=36864161

      which is less nice-looking, to say the least, and it also means the URLs in the wiki aren't consistent.

      Is it possible to change this?

      If it's been fixed in a future version, I'll be glad to know.

      Thanks,

      Limor

        Issue Links

          Activity

          atlassian.com1 KiwiSpace Foundation added a comment - - edited

          The lack of this support has been bugging me for a while, so after seeing yet another comment on this issue pop up, I tried to figure out a workaround. It may not be pretty, but it seems to work, and hopefully it can be improved on.

          PART ONE: PATH-INFO STYLE PAGE URLS

          My first pass at the problem was to get rid of the CGI-looking URL
          www.yourdomain.com/pages/viewpage.action?pageId=12345
          and simply replace it with
          www.yourdomain.com/view/12345/

          This was easily done by using Apache RewriteRules:

          # Redirect the ugly Confluence URL to a friendly path-info style URL,
          # e.g. /view/<pageId>/
          # (mod_rewrite takes all flags in one bracketed, comma-separated list;
          # the trailing "?" drops the original query string)
          RewriteCond     %{QUERY_STRING}             pageId=(\d+)
          RewriteRule     ^/pages/viewpage\.action$   /view/%1/? [R,L]

          # Then, for requests to /view/<pageId>/,
          # do an invisible sub-lookup to get the page content
          RewriteRule     ^/view/(\d+)/               /pages/viewpage.action?pageId=$1 [PT]
          

          PART TWO: IMPROVED URL WITH TITLE-LOOKUP

          What I really wanted was a way to append the page title to the end of the URL (ignored by the rewrite, but there for SEO-friendliness).

          As Confluence doesn't pass the page title in the viewpage.action? url, I had to effectively do a sub-request to get this:

          Apache Config:

          # Route via PHP script to look up the page title.
          # The REMOTE_ADDR condition skips the script's own fetch, to avoid a loop.
          RewriteCond     %{QUERY_STRING}             pageId=(\d+)
          RewriteCond     %{REMOTE_ADDR}              !^(IP-THE-PHP-REQUEST-COMES-FROM)$
          RewriteRule     ^/pages/viewpage\.action$   /path-to/friendlyurl.php?pageId=%1 [PT,NS]

          # Route requests to the 'friendly prefix' used for 'numeric' page lookups to Confluence
          # again. I'm using the prefix : /view/
          RewriteRule     ^/view/(\d+)/               /pages/viewpage.action?pageId=$1 [PT]
          

          PHP Script (To lookup page titles) - "friendlyurl.php":

          <?php
           $LOGIN_TITLE = "Log In - YOUR-ORG-NAME-HERE-";  # Title of your Confluence 'login' page
           $SERVER_BASE = "http://your-domain-name";   # URL of your Confluence server (no trailing slash)
           $REDIRECT_PREFIX = "/view";  # Your choice of friendly prefix for 'numeric' pages

           # Ensure we only present a number to the lookup
           # (a missing or non-numeric pageId becomes 0)
           $pageId = isset($_REQUEST["pageId"]) ? (int) $_REQUEST["pageId"] : 0;

           $myURL = $SERVER_BASE . "/pages/viewpage.action?pageId=" . $pageId;

           # Fetch the page and pull the text out of its <title> element.
           # file_get_contents() returns false on failure, so check before matching.
           # The match is case-insensitive, non-greedy, and allowed to span lines.
           $html = @file_get_contents($myURL);
           if ($html !== false
            && preg_match('/<title>\s*(.*?)\s*<\/title>/is', $html, $matches)) {
              $title = $matches[1];
           } else {
              $title = "Not Found";
           }


           # If title is that of Confluence's login page, it's password protected,
           # so the script can't safely work out the actual page title...
           if ($title === $LOGIN_TITLE) {
            $title = "";
           }

           # Redirect to 'friendly' prefix, adding seo-friendly title on end
           header("Location: " .
            $REDIRECT_PREFIX . "/" . $pageId . "/" . generateUrlSlug($title) );
           exit;


          # Lower-case the string, collapse each run of characters other than
          # a-z and 0-9 into a single hyphen, and optionally truncate the result
          # to $maxlen characters, cutting back to the last hyphen.
          function generateUrlSlug($string, $maxlen=0)
          {
           $string = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
           if ($maxlen && strlen($string) > $maxlen) {
            $string = substr($string, 0, $maxlen);
            $pos = strrpos($string, '-');
            if ($pos > 0) {
             $string = substr($string, 0, $pos);
            }
           }
           return $string;
          }
          ?>
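
To get a feel for what the slug function produces, here is a small standalone sketch. It repeats generateUrlSlug() from the script so it can be run on its own; the sample titles are only illustrations. Distinct titles can collapse to the same slug (C++ becomes plain "c"), which should be harmless here because the page ID in the URL is what actually selects the page.

```php
<?php
// Copy of generateUrlSlug() from the script above, repeated so this
// snippet is self-contained.
function generateUrlSlug($string, $maxlen = 0)
{
    // Lower-case, then turn each run of non-alphanumerics into one hyphen
    $string = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
    if ($maxlen && strlen($string) > $maxlen) {
        // Truncate, then cut back to the last hyphen so no word is split
        $string = substr($string, 0, $maxlen);
        $pos = strrpos($string, '-');
        if ($pos > 0) {
            $string = substr($string, 0, $pos);
        }
    }
    return $string;
}

echo generateUrlSlug("Welcome to GigaSpaces"), "\n";      // welcome-to-gigaspaces
echo generateUrlSlug("C++"), "\n";                         // c
echo generateUrlSlug("R&D: 50% + more"), "\n";             // r-d-50-more
echo generateUrlSlug("Welcome to GigaSpaces", 12), "\n";   // welcome-to
```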
          

          As I said, this seems to work. You end up with URLs that look like this:
          http://www.yourdomain.com/view/2733036/welcome-to-nz-sofia-author-your-org-name
          which is certainly better looking than Confluence's attempt.

          Caveats:

          • I'm not 100% sure the PHP script is safe, security-wise, so please get your techies to review it. I don't think it can be abused to access any secure info though.
          • Looking up a 'numeric' page adds an extra lookup request, so the page will take longer to load. Other items on the page seem to work as expected and shouldn't have that overhead.
          • There may be some issues with relative images, scripts, etc. that expect the page to be served from the old URL. I've not encountered any issues with this so far though.
          • You can modify the PHP script to create a more customised URL based on your locale.
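
On the security caveat, one thing a reviewer would probably ask for is stricter checking of pageId before it ends up in the fetch URL and the Location header. The following is only a sketch under that assumption: parsePageId() is a hypothetical helper, not part of the script above, and it relies on PHP's ctype extension being available. It returns the ID as a string rather than an int so that very long IDs are not silently mangled.

```php
<?php
// Hypothetical helper (not in the original script): accept only a
// non-empty string made up purely of decimal digits. Returns the digit
// string unchanged, or null for anything else.
function parsePageId($raw)
{
    if (!is_string($raw) || $raw === '' || !ctype_digit($raw)) {
        return null;
    }
    return $raw;
}

var_dump(parsePageId("36864161"));         // string(8) "36864161"
var_dump(parsePageId("12abc"));            // NULL
var_dump(parsePageId("1\r\nX-Evil: 1"));   // NULL
var_dump(parsePageId(""));                 // NULL
```

In the script, a null result could be answered with http_response_code(400) and an early exit instead of a redirect.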

          Enjoy

          macandron Jonas Lindström added a comment -

          Awesome! Thanks KiwiSpace Foundation

          bruno.borghi Bruno Borghi added a comment -

          We work in French and use Confluence as a wiki alongside a non-Atlassian scrum board, so we frequently link to wiki pages. It is very confusing that some links use the page title syntax and others use the viewpage syntax. It is also rather SEO-unfriendly.
          However, I would suggest using the short URL rather than the viewpage syntax: the URL is shorter, which is rather convenient.

          basler@geograt.de Matthias Basler added a comment -

          For us there is one aspect which takes this issue beyond just being irritating:

          We have external software linking to Confluence pages (and headings within a page). If the space is exported and later re-imported (e.g. on a different server or after a hard disk failure), then all page IDs have changed, and thus all links from external software to pages using the viewpage.action?pageId=36864161 syntax no longer work. Only the links with the page title remain stable across an export and re-import.

          P.S. Did you know: you can seemingly access every page with the http://<server>/display/SPACEKEY/Page+Title syntax; it is just that JIRA doesn't tell you this address. You have to guess it.
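
Instead of guessing the address by hand, the title-based path can be put together mechanically. This is a rough sketch only: displayPath() is a hypothetical helper, and it assumes the server resolves the percent-encoded form of special characters, which the comment above only reports as "seemingly" working and which is worth testing on your own instance.

```php
<?php
// Hypothetical helper: build a path of the form
// /display/SPACEKEY/Page+Title. PHP's urlencode() turns spaces into "+"
// and percent-encodes other reserved characters, including a literal "+".
function displayPath($spaceKey, $title)
{
    return "/display/" . urlencode($spaceKey) . "/" . urlencode($title);
}

echo displayPath("GS6", "Welcome to GigaSpaces"), "\n";  // /display/GS6/Welcome+to+GigaSpaces
echo displayPath("GS6", "C++"), "\n";                     // /display/GS6/C%2B%2B
```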

          saso.zagoranski Saso Zagoranski added a comment - - edited

          Any chance someone could unsubscribe me from this thread?

          We stopped using Confluence a long time ago, but unsubscribing from Jira issues seems to be even more difficult than getting someone from Atlassian to finally acknowledge this annoying bug!


            Dates

            • Last commented: 25 weeks, 2 days ago