  1. Crucible
  2. CRUC-6858

Binary file detection by content type should allow some unexpected values

Details

    Description

      Fisheye/Crucible has an isText method that attempts to guess, as a last resort, whether a file is textual or binary.

      If the leading bytes don't identify the file as one of several known text types, it checks each of the first 20k bytes in the file, and if any value is outside the range 1 - 127, it decides the file must be binary.
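
As a rough illustration of the check described above (not the actual Fisheye/Crucible code), the strict rule might look like the sketch below. The class name, method name, and the exact sample size of 20 * 1024 bytes are assumptions made for the example; the issue only says "20k".

```java
// Hypothetical sketch of the strict check; names and sample size are assumed.
final class StrictTextCheck {
    // The issue says "20k"; 20 * 1024 is a guess at the exact figure.
    static final int SAMPLE_SIZE = 20 * 1024;

    /** Returns true only when every sampled byte is in the range 1-127. */
    static boolean looksLikeText(byte[] content) {
        int limit = Math.min(content.length, SAMPLE_SIZE);
        for (int i = 0; i < limit; i++) {
            // Java bytes are signed, so values 128-255 show up as negatives.
            // "<= 0" therefore rejects both NUL and anything above 127.
            if (content[i] <= 0) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a single Latin-1 byte such as 0xE9 anywhere in the sample is enough to classify the whole file as binary.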

      This occasionally causes problems with files in ASCII-like but not strictly ASCII formats, which contain the odd byte outside that range.

      Alternatively, we could delete our own attempts to detect text and use a library for this; some seem to exist.

      It would probably be better to be a bit more lenient and assume that if a file's bytes are mostly in a text-like range, it's still better to show it as text than as binary. If 5% of a file renders as boxes, I'm probably still happy to look at it. The percentage could be a system property.
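
A minimal sketch of the more lenient check, under stated assumptions: the property name `fisheye.text.detection.max.nontext.percent` is invented for illustration, the 5% default comes from the figure mentioned in this issue, and treating an empty file as text is a choice made for the example rather than anything the issue specifies.

```java
// Hypothetical sketch of a threshold-based check; all names are assumed.
final class LenientTextCheck {
    static final int SAMPLE_SIZE = 20 * 1024; // assumed reading of "20k"
    // Invented property name, not an existing Fisheye/Crucible setting.
    static final String PROPERTY = "fisheye.text.detection.max.nontext.percent";
    static final int DEFAULT_MAX_PERCENT = 5;

    /** Uses the system property if set, otherwise the 5% default. */
    static boolean looksLikeText(byte[] content) {
        return looksLikeText(content, Integer.getInteger(PROPERTY, DEFAULT_MAX_PERCENT));
    }

    /** Returns true when at most maxNonTextPercent of sampled bytes fall outside 1-127. */
    static boolean looksLikeText(byte[] content, int maxNonTextPercent) {
        int limit = Math.min(content.length, SAMPLE_SIZE);
        if (limit == 0) {
            return true; // assumption: an empty file is shown as text
        }
        int nonText = 0;
        for (int i = 0; i < limit; i++) {
            if (content[i] <= 0) { // signed bytes: NUL or anything above 127
                nonText++;
            }
        }
        // Compare nonText / limit against maxNonTextPercent / 100 without floats.
        return (long) nonText * 100 <= (long) maxNonTextPercent * limit;
    }
}
```

With a property like this, an administrator could raise the tolerance at startup, for example with `-Dfisheye.text.detection.max.nontext.percent=10`, and setting it to 0 would reproduce the current strict behaviour.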

            People

              Assignee: Unassigned
              Reporter: Don Willis (don.willis@atlassian.com)
              Votes: 1
              Watchers: 4