FishEye / FE-7094

Add support for different file path encodings for Mercurial


    Details

    • Type: Suggestion
    • Status: Gathering Interest
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels: None

      Description

      Mercurial has no native support for different file path encodings. It can track the encoding of file contents, but not the encoding of file paths.

      "Decoding ambiguities apply to file contents, as well as file names in the bytes-based manifest. This spec applies to the former only and does not address manifest parsing." (Tracking File Encoding in Mercurial)

      File paths are stored as raw bytes, so their encoding cannot be determined reliably. Most modern systems use UTF-8, except Windows, which uses its own code pages that are incompatible with UTF-8.
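To make the ambiguity concrete, here is a minimal sketch. The file name and the choice of cp1250 as the Windows code page are assumptions for illustration only; any non-ASCII name and any legacy code page would show the same effect.

```python
# Hypothetical file name containing Polish letters.
name = "żółw.txt"

utf8_bytes = name.encode("utf-8")     # what a UTF-8 system would typically store
cp1250_bytes = name.encode("cp1250")  # what a cp1250 Windows client might store

# The same name produces two different byte sequences,
# and the bytes carry no marker saying which encoding was used.
print(utf8_bytes)
print(cp1250_bytes)


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the raw path bytes form valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as_utf8(utf8_bytes))
print(decodes_as_utf8(cp1250_bytes))
```

A tool that assumes UTF-8 can read the first byte sequence but fails on the second, even though both were meant to be the same file name.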

      We strongly encourage customers not to use non-ASCII characters in file paths: not only can Fisheye fail to index the repository, but such paths also break compatibility between different systems.

      More details about possible configurations

      If a Mercurial repository contains no non-ASCII characters in file paths, any configuration should work correctly.

      Otherwise, Fisheye may be unable to index the repository. The committers are the crucial factor here: if they work on Windows, file paths encoded in something other than UTF-8 can be committed, causing problems.

      Possible platforms:

      • Fisheye on Windows - cannot index non-ASCII file paths
      • Fisheye on Linux - can index the repository if file paths are encoded in UTF-8

      Workarounds

      I'm not affected, but I want to make sure it does not happen in the future

      • encourage your team to use only ASCII characters in file paths.
      • ensure all committers use an OS with UTF-8 set as the default encoding (modern UNIX-like systems such as macOS and Linux)
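One way to check whether a repository already follows these rules is to audit its paths as raw bytes. The sketch below is not part of Fisheye or Mercurial; `classify_path` and `group_paths` are hypothetical helpers, and obtaining the raw path list from the repository (for example from the output of `hg manifest`) is left to the reader.

```python
from typing import Dict, Iterable, List


def classify_path(raw: bytes) -> str:
    """Classify raw path bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def group_paths(raw_paths: Iterable[bytes]) -> Dict[str, List[bytes]]:
    """Group raw paths by the category returned from classify_path."""
    groups: Dict[str, List[bytes]] = {"ascii": [], "utf-8": [], "other": []}
    for raw in raw_paths:
        groups[classify_path(raw)].append(raw)
    return groups


# Example input; the paths are made up for illustration.
sample = [
    b"src/main.c",
    "docs/żółw.txt".encode("utf-8"),
    "docs/żółw.txt".encode("cp1250"),
]
for category, paths in group_paths(sample).items():
    print(category, len(paths))
```

Read against the platform list above: paths in the "ascii" group should be safe everywhere, "utf-8" paths should be indexable by Fisheye on Linux only, and "other" paths are the ones likely to block indexing.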

      I'm affected, what can I do now?

      • affected paths can be added to the excluded paths. This requires a repository reindex; after the reindex, the excluded files will not be available in Fisheye
      • the repository can be converted using hg convert; it can rename files, so non-ASCII characters can be removed
        NOTE! This creates a new repository. Commit hashes and other internal identifiers will differ, which may break links in other tools. For example, existing Crucible code reviews will point to commit hashes that no longer exist.
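The renames for hg convert are given in a filemap file, where each `rename source destination` line maps an old path to a new one. The following sketch generates such lines for non-ASCII paths. The helper names and the `_XX` hex-escaping scheme are assumptions made for illustration, the scheme is not guaranteed to be collision-free, and paths containing double quotes are not handled.

```python
from typing import Iterable


def ascii_safe(raw_path: bytes) -> bytes:
    """Replace every non-ASCII byte with '_' followed by its hex value."""
    out = bytearray()
    for byte in raw_path:
        if byte < 0x80:
            out.append(byte)
        else:
            out += b"_%02X" % byte
    return bytes(out)


def build_filemap(raw_paths: Iterable[bytes]) -> bytes:
    """Build filemap content with one rename line per non-ASCII path."""
    lines = []
    for raw in raw_paths:
        safe = ascii_safe(raw)
        if safe != raw:
            # Quote both paths so that names containing spaces survive parsing.
            lines.append(b'rename "' + raw + b'" "' + safe + b'"')
    return b"\n".join(lines) + (b"\n" if lines else b"")


# Example: one ASCII path (left alone) and one cp1250-encoded path (renamed).
filemap = build_filemap([b"README", b"docs/\xbf\xf3\xb3w.txt"])
print(filemap)
```

The result would be written to a file in binary mode and passed to something like `hg convert --filemap filemap.txt old-repo new-repo`, with the convert extension enabled. Whether hg convert matches raw non-UTF-8 bytes in a filemap on a given platform is not confirmed here, so try it on a copy of the repository first.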

      Proposed solution

      Provide the ability to choose the file path encoding for Mercurial repositories, so that Fisheye can decode and encode paths properly.

      Note: if the option is set at the repository level, all Mercurial clients committing non-ASCII paths must be configured to use the same path encoding, for example all committers running Windows with the same code page.
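The following sketch illustrates why a single per-repository setting only works when every committer uses the same encoding. It is not Fisheye code; `decode_path` and the idea of passing the configured encoding as a parameter are assumptions standing in for the proposed option.

```python
def decode_path(raw: bytes, configured_encoding: str) -> str:
    """Decode raw path bytes using the encoding configured for the repository."""
    return raw.decode(configured_encoding)


# A path committed from a hypothetical Windows client using cp1250.
original = "żółw.txt"
raw = original.encode("cp1250")

# Repository configured with the committer's code page: the name round-trips.
print(decode_path(raw, "cp1250") == original)

# Repository configured with a different code page: decoding succeeds
# without any error, but the resulting name is wrong.
print(decode_path(raw, "cp1252") == original)
print(decode_path(raw, "cp1252"))
```

The second case is the risky one: single-byte code pages will often decode each other's bytes without raising an error, so a mismatch shows up as garbled names rather than as a failure.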

    People

    • Assignee: Unassigned
    • Reporter: Marek Tokarski (mtokarski@atlassian.com)
    • Votes: 4
    • Watchers: 4
