Binary file detection by content type should allow some unexpected values


      Fisheye/Crucible has an isTest method that attempts to guess, as a last resort, whether a file is textual or binary.

      If it cannot identify the file as one of several known text types from its leading bytes, it inspects the first 20k bytes of the file, and if any byte value falls outside the range 1 to 127, it decides the file must be binary.
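      For reference, the strict rule described above can be sketched roughly as follows. This is a hypothetical reconstruction, not Fisheye's actual code: the class and method names and the exact sample size (20 * 1024 bytes) are assumptions. It shows why a single out-of-range byte is enough to flip the verdict.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the strict check; names and sample size are assumptions. */
final class StrictTextCheck {
    // Number of leading bytes inspected ("the first 20k bytes"); 20 * 1024 is assumed.
    static final int SAMPLE_SIZE = 20 * 1024;

    /** Returns true only if every sampled byte lies in 1..127. */
    static boolean looksLikeText(byte[] data, int length) {
        for (int i = 0; i < length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned (0..255)
            if (b < 1 || b > 127) {
                return false; // one NUL or high byte is enough to call the file binary
            }
        }
        return true;
    }

    /** Reads up to SAMPLE_SIZE leading bytes of the file and applies the strict rule. */
    static boolean looksLikeText(Path file) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(buf, 0, SAMPLE_SIZE); // Java 9+
        }
        return looksLikeText(buf, n);
    }
}
```

      Under this rule, a Latin-1 file containing a single "é" (byte 0xE9) is classified as binary.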

      This occasionally misclassifies files in ASCII-like (but not strictly ASCII) formats that contain a few byte values outside that range.

      Alternatively, we could delete our own text-detection code and use a library instead; several seem to exist.

      It would probably be better to be more lenient: if most of a file's bytes fall in a text-like range, it is still better to show it as text than as binary. If 5% of a file renders as boxes, I'm probably still happy to look at it. The threshold percentage could be a system property.
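      A minimal sketch of the lenient approach, assuming the same 1 to 127 "text-like" range as the current check. The system property name `textdetect.maxNonTextPercent` and the 5% default are hypothetical placeholders; the issue only suggests that the percentage could be a system property.

```java
/** Hypothetical sketch: treat a sample as text if few enough bytes fall outside 1..127. */
final class LenientTextCheck {
    // Hypothetical property name; not an existing Fisheye/Crucible setting.
    static final String THRESHOLD_PROPERTY = "textdetect.maxNonTextPercent";
    static final double DEFAULT_MAX_NON_TEXT_PERCENT = 5.0;

    /** Reads the allowed percentage of non-text bytes, falling back to the default if unset or invalid. */
    static double maxNonTextPercent() {
        String value = System.getProperty(THRESHOLD_PROPERTY);
        if (value == null) {
            return DEFAULT_MAX_NON_TEXT_PERCENT;
        }
        try {
            return Double.parseDouble(value);
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_NON_TEXT_PERCENT;
        }
    }

    /** Counts bytes outside 1..127 and accepts the sample if their share is within the threshold. */
    static boolean looksLikeText(byte[] data, int length, double maxNonTextPercent) {
        if (length == 0) {
            return true; // an empty sample gives no evidence of binary content
        }
        int nonText = 0;
        for (int i = 0; i < length; i++) {
            int b = data[i] & 0xFF; // unsigned byte value
            if (b < 1 || b > 127) {
                nonText++;
            }
        }
        return nonText * 100.0 / length <= maxNonTextPercent;
    }

    /** Convenience overload that uses the configured (or default) threshold. */
    static boolean looksLikeText(byte[] data, int length) {
        return looksLikeText(data, length, maxNonTextPercent());
    }
}
```

      With the default of 5%, a 100-byte sample containing one 0xE9 byte is shown as text, while one with ten 0xFF bytes is still treated as binary. One possible refinement, not part of this proposal, would be to keep treating any NUL byte as a hard binary signal, since NULs are rare in text files but common in binary formats.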

      Assignee: Unassigned
      Reporter: Don Willis
      Votes: 1
      Watchers: 4
