[FE-7094] Add support for different file paths encoding for Mercurial

      Mercurial has no native support for different file path encodings. It can determine the encoding of file contents, but not of file paths.

      Decoding ambiguities apply to file contents, as well as file names in the bytes-based manifest. This spec applies to the former only and does not address manifest parsing. (Tracking File Encoding in Mercurial)

      File paths are stored as raw bytes, so their encoding cannot be determined reliably. Most modern systems use UTF-8, except for Windows, which uses its own code pages that are incompatible with UTF-8.
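To make the ambiguity concrete, here is a minimal Python 3 sketch. The file name and the choice of cp1252 as the Windows code page are illustrative assumptions, not taken from any affected repository. It shows that the same name produces different bytes under the two encodings, and that bytes written under the code page are not valid UTF-8.

```python
# Illustrative only: the file name and the cp1252 code page are assumptions.
name = "café.txt"

utf8_bytes = name.encode("utf-8")      # two bytes for "é"
cp1252_bytes = name.encode("cp1252")   # one byte for "é"


def try_decode(raw: bytes, encoding: str):
    """Return the decoded path, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


print(utf8_bytes == cp1252_bytes)            # the stored bytes differ
print(try_decode(cp1252_bytes, "utf-8"))     # code-page bytes are not valid UTF-8
print(try_decode(utf8_bytes, "cp1252"))      # decodes without error, but to the wrong text
```

The last line is the harder case: decoding with the wrong code page does not fail, it silently yields a different name, so the stored bytes alone cannot tell a reader which encoding was intended.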

      We strongly encourage our customers not to use non-ASCII characters in file paths: not only will Fisheye be unable to index the repository, but such paths also break compatibility between different systems.

      More details about possible configurations

      If a Mercurial repository doesn't contain non-ASCII characters in file paths, any configuration should work correctly.

      Otherwise, Fisheye may be unable to index the repository. The crucial factor here is the committers: if they work on Windows, file paths that are not valid UTF-8 can be committed, causing problems.

      Possible platforms:

      • Fisheye on Windows - cannot index non-ASCII file paths
      • Fisheye on Linux - can index the repository if file paths are encoded in UTF-8

      Workarounds

      I'm not affected, but I want to be sure it will not happen in the future

      • encourage your team to use only ASCII characters in file paths
      • ensure all committers use an OS with UTF-8 set as the default encoding (modern UNIX-like systems such as macOS and Linux)
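One way to check the first point is to scan the tracked paths for bytes outside the ASCII range. The sketch below is a hypothetical helper, not part of Fisheye or Mercurial. It assumes the paths are already available as byte strings, for instance by reading the output of `hg manifest` in binary mode.

```python
def non_ascii_paths(paths):
    """Return the paths (as bytes) that contain any byte outside the ASCII range."""
    return [p for p in paths if any(b > 0x7F for b in p)]


# Hypothetical sample input; real input would be one byte string per tracked file.
sample = [b"src/main.py", b"docs/caf\xc3\xa9.txt", b"docs/caf\xe9.txt"]

for path in non_ascii_paths(sample):
    print(path)
```

Working on bytes rather than decoded strings is deliberate: the check then does not depend on guessing the encoding of the paths it is trying to flag.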

      I'm affected, what can I do now?

      • affected paths can be added to the excluded paths. This requires a repository reindex; after the reindex, the excluded files will not be available in Fisheye
      • the repository can be converted using hg convert; it can rename files, so non-ASCII characters can be removed
        NOTE! This creates a new repository: commit hashes and other internal elements will be different, which may break links in other tools. For example, existing Crucible code reviews will point to commit hashes that no longer exist.
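For the conversion route, `hg convert` accepts a `--filemap` file whose `rename` directives map a source path to a destination path. The Python 3 sketch below generates such lines for paths that contain non-ASCII bytes. The helper names, the sample paths, and the cp1252 source encoding are assumptions made for illustration. It also assumes the filemap must contain the source path bytes exactly as stored, which should be verified on a copy of the repository. Paths containing whitespace and name collisions after transliteration are not handled.

```python
import unicodedata


def ascii_name(path: str) -> str:
    """Strip accents and drop any remaining non-ASCII characters (lossy)."""
    decomposed = unicodedata.normalize("NFKD", path)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def filemap_lines(raw_paths, source_encoding):
    """Build `rename` directives (as bytes) for paths that contain non-ASCII bytes."""
    lines = []
    for raw in raw_paths:
        if raw.isascii():
            continue  # nothing to rename
        target = ascii_name(raw.decode(source_encoding))
        lines.append(b"rename " + raw + b" " + target.encode("ascii"))
    return lines


# Hypothetical input: stored path bytes, assumed to be cp1252-encoded.
for line in filemap_lines([b"src/main.py", b"docs/caf\xe9.txt"], "cp1252"):
    print(line)
```

The generated lines would be written to a file in binary mode and passed to `hg convert --filemap`. Because the transliteration is lossy, the resulting names should be reviewed by hand before converting.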

      Proposed solution

      Provide the ability to choose the file path encoding for Mercurial repositories, so that Fisheye can properly decode and encode paths.

      Note: if the option is set at the repository level, all Mercurial clients committing non-ASCII paths have to be configured to use the same path encoding, for example all committers running Windows with the same code page.
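As a rough illustration of the proposal, a per-repository encoding setting could drive both directions of the conversion. Everything in this Python 3 sketch is hypothetical: `RepoSettings`, `path_encoding`, and the helper functions are not existing Fisheye options or APIs.

```python
from dataclasses import dataclass


@dataclass
class RepoSettings:
    """Hypothetical per-repository setting; not an existing Fisheye option."""
    name: str
    path_encoding: str = "utf-8"


def decode_path(raw: bytes, settings: RepoSettings) -> str:
    """Turn stored path bytes into text using the repository's configured encoding."""
    return raw.decode(settings.path_encoding)


def encode_path(path: str, settings: RepoSettings) -> bytes:
    """Turn a path back into the bytes Mercurial stores for it."""
    return path.encode(settings.path_encoding)


repo = RepoSettings(name="legacy-windows-repo", path_encoding="cp1252")

stored = b"docs/caf\xe9.txt"            # committed from a cp1252 Windows client
shown = decode_path(stored, repo)
print(shown)
print(encode_path(shown, repo) == stored)

# A path committed by a UTF-8 client into the same repository decodes to the
# wrong text, which is why all committers would need to share one encoding.
mixed = decode_path(b"docs/caf\xc3\xa9.txt", repo)
print(mixed)
```

The round trip matters as much as the decoding: Fisheye would need to recover the exact stored bytes to ask Mercurial for the file again.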

            Atlassian Update – 10 January 2022

            Hi everyone,

            We have recently reviewed this issue and the overall interest in the problem. As the issue hasn't collected votes, watchers, comments, or support cases from many customers during its lifetime, it's very low on our priority list, and will not be fixed in the foreseeable future. That's why we've decided to resolve it as Not Being Considered.

            Although we're aware the issue is still important to those of you who were involved in the conversations around it, we want to be clear in managing your expectations. The Fisheye&Crucible team is focusing on issues that have broad impact and high value, reflected by the number of comments, votes, support cases, and customers interested. Please consult the Implementation of New Features Policy for more details.

            We understand how disappointing this decision may be, but we hope you'll appreciate our transparent approach and communication. Atlassian will continue to watch this issue for further updates, so please feel free to share your thoughts in the comments.

            Kind regards
            Marek Parfianowicz
            Development Team Lead

            Sune Foldager added a comment - - edited

            I have a patch to the Mercurial extension used by Fisheye which fixes these issues. We have successfully used it on our forest of repositories. I will attach the patch.

            For the annotate part of the patch to work correctly, the hg extension needs to be enabled at all times. Fisheye normally enables it only for its own commands. To achieve this, put the following into the mercurial.ini of the user running Fisheye (for us it was the system account, so the file is C:\Windows\System32\config\systemprofile\mercurial.ini):

            [extensions]
            felog = C:\Atlassian\fecru-4.5.3\python\hg\hgfelog\hgfelog.py
            

            With the relevant path/version adjusted.

            Edit: Ok, I can't attach files. Here is the patch in unidiff:

            diff --git a/hgfelog.py b/hgfelog.py
            --- a/hgfelog.py
            +++ b/hgfelog.py
            @@ -19,6 +19,45 @@
             
             from mercurial import hg, commands, util, mdiff, cmdutil, scmutil
             
            +import os
            +if os.name == 'nt':
            +    import locale
            +    from mercurial import encoding, extensions, dagop
            +    from mercurial.formatter import plainformatter
            +    fsencoding = locale.getpreferredencoding()
            +
            +    def enc_fs_to_native(filename):
            +        return filename.decode(fsencoding).encode(encoding.encoding)
            +
            +    def enc_native_to_fs(filename):
            +        return filename.decode(encoding.encoding).encode(fsencoding)
            +
            +    def uisetup(ui):
            +        extensions.wrapcommand(commands.table, 'annotate', feannotate)
            +
            +    def feannotate(orig, ui, repo, *pats, **opts):
            +        plainformatter_write_orig = plainformatter.write
            +
            +        def plainformatter_write(self, fields, deftext, *fielddata, **opts):
            +            try:
            +                i = fields.split().index('file')
            +                file = enc_fs_to_native(fielddata[i])
            +                fielddata = fielddata[:i] + (file,) + fielddata[i+1:]
            +            except ValueError:
            +                pass
            +            plainformatter_write_orig(self, fields, deftext, *fielddata, **opts)
            +
            +        plainformatter.write = plainformatter_write
            +        return orig(ui, repo, *pats, **opts)
            +
            +else:
            +    def enc_fs_to_native(filename):
            +        return filename
            +
            +    def enc_native_to_fs(filename):
            +        return filename
            +
            +
             try:
                     hgversion = util.version()
             except:
            @@ -143,7 +182,7 @@
                     ui.debug("discarding change to file: ", state.path(), "\n")
                     return
                 # Output some file state
            -    ui.write(format['file'] % f)
            +    ui.write(format['file'] % enc_fs_to_native(f))
                 if state.state() == MERGED or ui.verbose:
                     ui.write(format['fileRev'] % formatRev(ui, state.filerevctx()))
                 ui.write(format['fileStatus'] % state.status())
            @@ -154,7 +193,7 @@
                 def printParent(index, fkey):
                     if len(state.parents()) > index:
                         p = state.parents()[index]
            -            ui.write(format[fkey] % ( formatRev(ui, repo[p.linkrev()]), p.path()))
            +            ui.write(format[fkey] % (formatRev(ui, repo[p.linkrev()]), enc_fs_to_native(p.path())))
                 printParent(0, 'fileParent0')
                 printParent(1, 'fileParent1')
                 if diffs and state != MERGED:
            @@ -364,6 +403,8 @@
                         return pre + " 100" + s
                     def diffstr(path1, path2):
                         # Bizarrely, if one side of the diff does not exist, hg diff --git prints that it is diffing the file to itself but then the +++/--- lines use /dev/null
            +            path1 = enc_fs_to_native(path1)
            +            path2 = enc_fs_to_native(path2)
                         p1 = "a/" + path1
                         p2 = "b/" + path2
                         if path1 == "":
            @@ -379,7 +420,7 @@
                         aBinary = False
                         bFileName = self._path
                         if not bFileName in ctx:
            -                ui.debug("Can't find ", bFileName, " in commit (", ctx.rev(), ") manifest, not diffing")
            +                ui.debug("Can't find ", enc_fs_to_native(bFileName), " in commit (", ctx.rev(), ") manifest, not diffing")
                             return
                         bFileCtx = ctx[bFileName]
                         bFileData = self._content
            @@ -389,7 +430,7 @@
                     elif self._state == DELETED:
                         aFileName = self._path
                         if not aFileName in fromRev:
            -                ui.debug("Can't find ", aFileName, " in parent commit (", fromRev.rev(), ") manifest, not diffing")
            +                ui.debug("Can't find ", enc_fs_to_native(aFileName), " in parent commit (", fromRev.rev(), ") manifest, not diffing")
                             return
                         aFileCtx = fromRev[aFileName]
                         aFileData = aFileCtx.data()
            @@ -403,7 +444,7 @@
                     else:
                         aFileName = self._fromPath
                         if not aFileName in fromRev:
            -                ui.debug("Can't find ", aFileName, " in parent commit (", fromRev.rev(), ") manifest, not diffing")
            +                ui.debug("Can't find ", enc_fs_to_native(aFileName), " in parent commit (", fromRev.rev(), ") manifest, not diffing")
                             return
                         aFileCtx = fromRev[aFileName]
                         aFileData = aFileCtx.data()
            @@ -411,7 +452,7 @@
                         aBinary = util.binary(aFileData)
                         bFileName = self._path
                         if not bFileName in ctx:
            -                ui.debug("Can't find ", bFileName, " in commit (", ctx.rev(), ") manifest, not diffing")
            +                ui.debug("Can't find ", enc_fs_to_native(bFileName), " in commit (", ctx.rev(), ") manifest, not diffing")
                             return
                         bFileCtx = ctx[bFileName]
                         bFileData = self._content
            @@ -422,8 +463,8 @@
                         msg = "rename"
                         if self._state == COPIED:
                             msg = "copy"
            -            ui.write(prefix, msg, " from ", aFileName, "\n")
            -            ui.write(prefix, msg, " to ", bFileName, "\n")
            +            ui.write(prefix, msg, " from ", enc_fs_to_native(aFileName), "\n")
            +            ui.write(prefix, msg, " to ", enc_fs_to_native(bFileName), "\n")
                     if extra:
                         ui.write(prefix, extra, '\n')
                     diffopts = mdiff.diffopts()
            @@ -438,7 +479,7 @@
                         # do nothing
                         pass
                     elif aBinary or bBinary:
            -            ui.write(prefix, "Binary file ", self._path, " has changed\n")
            +            ui.write(prefix, "Binary file ", enc_fs_to_native(self._path), " has changed\n")
                     else:
                         if hgversion >= '4.2':
                             uheaders, hunks = mdiff.unidiff(aFileData, aFileDate, bFileData, bFileDate, aFileName, bFileName, diffopts)
            @@ -529,7 +570,7 @@
                         elif not samep1 and samep2:
                             fromp2.append((path, versionIn(path, p1mf)))
                 def printModifiedPath(path, replace, ctx):
            -        ui.write(format['file'] % path)
            +        ui.write(format['file'] % enc_fs_to_native(path))
                     ui.write(format['fileRev'] % formatRev(ui, repo[ctx[path].linkrev()]))
                     ui.write(format['fileStatus'] % MERGED)
                     ui.write(format['fileMergeFrom'] % formatRev(ui, ctx))
            @@ -601,7 +642,7 @@
                     rev = ctx[file].linkrev()
                     if not rev in revstrs:
                         revstrs[rev] = formatRev(ui, repo[rev])
            -        ui.write(revstrs[rev], " ", file, "\n")
            +        ui.write(revstrs[rev], " ", enc_fs_to_native(file), "\n")
             
             
             @command('fecheck',
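One observation on the annotate wrapper in the patch: it replaces `plainformatter.write` and does not restore it afterwards. In a process that runs a single command this is probably harmless, but if `annotate` were invoked more than once in the same process, each call would wrap the already wrapped method again. A common way to avoid that is to restore the original in a `finally` block. The Python 3 sketch below shows the pattern on a stand-in class. `patched`, `Formatter`, and `upper_write` are hypothetical names, not Mercurial APIs, and the patch itself targets Python 2.

```python
from contextlib import contextmanager


@contextmanager
def patched(obj, attr, replacement):
    """Temporarily replace obj.attr, restoring the original on exit."""
    original = getattr(obj, attr)
    setattr(obj, attr, replacement)
    try:
        yield original
    finally:
        setattr(obj, attr, original)


class Formatter:
    """Stand-in for a formatter class; not Mercurial's plainformatter."""

    def write(self, text):
        return text


def upper_write(self, text):
    """Replacement method used only while the patch is active."""
    return text.upper()


with patched(Formatter, "write", upper_write):
    inside = Formatter().write("file.txt")   # replacement is active here

outside = Formatter().write("file.txt")      # original behaviour is back

print(inside)
print(outside)
```

Applied to the patch, the body of `feannotate` would call `orig(ui, repo, *pats, **opts)` inside such a block, so the formatter is left unchanged once the command returns.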
            
            


            +1

              Unassigned Unassigned
              mtokarski@atlassian.com Marek Tokarski
              Votes: 6
              Watchers: 6