Bitbucket Data Center / BSERV-3179

Convert non-UTF8 files to UTF8 before generating the diff view


      If a file uses a character encoding other than UTF-8, the diff view shows invalid characters, because git diff assumes UTF-8.

      Non-UTF8 files need to be converted to UTF-8 before the diff view is generated.

      *NOTE: This fix was shipped as an opt-in feature. A repository admin must enable "Transcode diffs" for non-UTF8 files to display correctly. See this comment for details.*
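The symptom described above can be reproduced outside Bitbucket. The following is a minimal sketch, assuming a Shift-JIS file like the attached one; the sample string is hypothetical and is not the content of sjis.txt. It shows that reading Shift-JIS bytes as UTF-8 yields replacement characters, while decoding with the correct codec and re-encoding as UTF-8 preserves the text.

```python
# Hypothetical sample text; NOT the contents of the attached sjis.txt.
text = "日本語のコメント"

# Bytes as they would be stored in a Shift-JIS encoded file.
sjis_bytes = text.encode("shift_jis")

# Treating those bytes as UTF-8 fails; with errors="replace" every invalid
# sequence becomes U+FFFD, which is what a garbled diff view looks like.
garbled = sjis_bytes.decode("utf-8", errors="replace")

# Decoding with the real encoding and re-encoding as UTF-8 keeps the text intact.
utf8_bytes = sjis_bytes.decode("shift_jis").encode("utf-8")

print(repr(garbled))
print(utf8_bytes.decode("utf-8"))
```

The second step is the conversion this issue asks for: transcode first, then diff.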

        1. diffView.jpg (80 kB)
        2. sjis.txt (0.1 kB)
        3. sourceView.jpg (84 kB)


            Mai Nakagawa (Inactive) added a comment - edited

            It is still not fully fixed. The diff view of a non-UTF8 (Shift-JIS) file is garbled, while its source view displays correctly. I tested with the latest Stash, 3.1.1.

            Attached please find the Shift-JIS file, the captured diff view, and the captured source view.

            Correction: the diff view is not garbled once I enabled 'Transcode diffs' on the repository settings page, as bturner said above.


            Bryan Turner (Inactive) added a comment -

            All,

            Stash 3.1 will include support for transcoding diffs, allowing it to correctly display non-UTF-8 diffs. Any file that displays correctly in the Source view should also display correctly in commit diffs, pull request diffs and diff-to-previous.

            This feature is not without cost. git does not support encodings other than ASCII and UTF-8, so this feature is implemented using textconv, transcoding all other codepages to UTF-8 (this is the same work done by the Source view since Stash 1.0). git diff will pipe all files through Stash for transcoding prior to performing the diff. In general, this should not have a significant impact, but it's certainly not free. As a result, this feature is disabled by default. Repository administrators will need to enable "Transcode diffs" on the repository settings page. This must be done for each repository which includes non-UTF-8 files.

            There is browser-level caching on diff output to reduce server roundtrips on the commit diff view. That means if a garbled diff has been viewed and then transcoding is enabled that user will need to clear their browser cache before they will see the transcoded diff.

            Best regards,
            Bryan Turner
            Atlassian Stash
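The textconv approach described in this comment can be illustrated with a small filter. This is only a sketch under stated assumptions, not Stash's actual implementation: the function name `transcode_to_utf8`, the fixed candidate list, and the Latin-1 fallback are all hypothetical choices made for illustration, whereas Stash detects the encoding itself.

```python
import sys

# Assumption: a fixed list of encodings to try, in order. A real
# implementation would detect the encoding instead of guessing.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis")


def transcode_to_utf8(data: bytes, candidates=CANDIDATE_ENCODINGS) -> bytes:
    """Return `data` re-encoded as UTF-8, trying each candidate encoding in turn."""
    for encoding in candidates:
        try:
            return data.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    # Last resort (hypothetical choice): Latin-1 maps every byte to a code
    # point, so this never fails, though the result may not be meaningful.
    return data.decode("latin-1").encode("utf-8")


def main(argv):
    # A git textconv command receives the path of the file to convert as its
    # single argument and writes the converted text to stdout.
    with open(argv[1], "rb") as handle:
        sys.stdout.buffer.write(transcode_to_utf8(handle.read()))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Outside Stash, a filter like this could be wired into plain git by setting `diff.<driver>.textconv` to the script's command line and assigning that driver to the affected paths in `.gitattributes` (for example `*.txt diff=<driver>`). git then runs the filter on both sides of every diff for those paths, which is the per-file cost the comment refers to.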


            Felipe Brandão Nascimento added a comment -

            Does anyone know any workaround for this problem?

            Alexey Efimov added a comment -

            We are also affected by this one. Please fix it.

            Vladimir Muravlev added a comment -

            It's been 4 months since Roger Barnes posted a hopeful comment on this issue, and there has been no movement since then.
            SCM Manager had lots of issues with encoding, and they fixed all of them in two weeks!
            We are still evaluating both products.

            Roger Barnes (Inactive) added a comment -

            A brief update: we have looked into this issue and one of our developers has been working on a solution. There are still some complexities to be ironed out but we are hoping to get this finalised and shipped in the coming months.

            Vladimir Muravlev added a comment - edited

            This should clearly be considered a major bug.
            Non-UTF8 files containing program sources (cpp, pas, etc.) include comments in local languages in ANSI encoding. In the diff view all these comments are replaced with <question marks>. But these comments are needed most in the diff view, when analyzing changes made by other developers. This bug makes many cool features of Stash (for example, pull requests) completely unusable for non-English-speaking developers.

            For me, the state of this issue decides whether we buy Stash for our team. And it has been open for a year now.

            BTW, can someone post here the contents of the extranet page mentioned in the issue? I'm too curious, I know.

            JasonCao added a comment -

            It's not an Improvement, it's a Bug.

            askfor added a comment -

            Some of our projects can't use Stash because of this bug.

            herzog@t-systems.com_match added a comment -

            This should not be a feature request, as this is a "bug by concept/design", or... let's name it Atlassian-style: it's a fug!
            I can't imagine a scenario where it would be necessary NOT to convert non-UTF8 files to UTF-8 for the diff view...

            Also: the browse view (when viewing a non-UTF8 README.md, for example) is affected,
            as well as the edit view when using the stash-editor-plugin.

            So the conversion should be in the core.

              Unassigned
              klfoong Foong (Inactive)
              Votes: 36
              Watchers: 27