DEV Community

Kevin Shu

Git and normalization of line-endings

A few months ago, I spent hours trying to decide on the best way to deal with line endings and how to switch a repo to using .gitattributes. I just found the comprehensive notes I took back then, and thought they'd be easier to find here than buried in my notes...

TL;DR

  • line-ending normalization is about converting LF <=> CR+LF, for cross-platform compatibility
  • .gitattributes file is the safest git mechanism to manage line-endings normalization
  • updating normalization settings is tricky because git may report changes on unmodified files, and it is totally not obvious what is happening
  • there are some tricks to help understand what is happening and to fix things

Line endings and Operating systems

When you press <Enter> in your text editor, the file is modified with invisible characters that represent the new line. This invisible thing is most commonly represented in two ways:

  • the ASCII character LF (aka Line Feed)
  • the two-character ASCII sequence CR+LF (aka Carriage Return + Line Feed)
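
To make the two representations concrete, here is a minimal Python sketch that classifies the line endings found in raw file content. The function name detect_eol and its return labels are illustrative choices, not part of git, and a lone CR (without LF) is deliberately ignored.

```python
def detect_eol(data: bytes) -> str:
    """Classify the line endings found in raw file content."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to count the bare LFs.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lf:
        return "lf"
    return "none"


print(detect_eol(b"one\ntwo\n"))      # lf
print(detect_eol(b"one\r\ntwo\r\n"))  # crlf
print(detect_eol(b"one\r\ntwo\n"))    # mixed
```

Reading the file as bytes matters here: opening it in text mode would let Python translate the line endings before you can look at them.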

Historically, CR+LF comes from teletype machines, which needed one code to return the carriage and another to advance the paper. Unix, following Multics, has used a single LF since the early 1970s, which is simpler and saves one byte per line.

In practice, Windows is the only mainstream modern operating system that still uses CRLF line endings.

When developing in a team, you will end up with people working on Windows and other operating systems, and you will need to manage this difference in your source control system.

Git's core.autocrlf

This setting is defined via git config. It applies globally or per repo. When enabled, it applies normalization to all files that git detects as text.

  1. Checkout Windows-style, commit Unix-style (core.autocrlf=true)

    • Git will convert LF to CRLF when checking out text files.
    • When committing text files, CRLF will be converted to LF.
    • This is the default value pushed by the installer on Windows systems
  2. Checkout as-is, commit Unix-style (core.autocrlf=input)

    • Git will not perform any conversion when checking out text files.
    • When committing text files, CRLF will be converted to LF.
    • Some people recommend using this when developing on Unix systems.
  3. Checkout as-is, commit as-is (core.autocrlf=false)

    • Git will not perform any conversions when checking out or committing text files.
    • This is the default value if the setting is not defined.
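
The three modes can be sketched as a pair of Python functions. This is a simplified model for illustration only: the names on_commit and on_checkout are made up, the content is assumed to be a file that git already considers text, and real git applies additional safeguards (binary detection, special handling of mixed line endings) that are not modelled here.

```python
import re


def on_commit(data: bytes, autocrlf: str) -> bytes:
    """Content as it would be stored in the index (simplified model)."""
    if autocrlf in ("true", "input"):
        # Both modes commit Unix-style: CRLF becomes LF.
        return data.replace(b"\r\n", b"\n")
    return data  # "false": committed as-is


def on_checkout(data: bytes, autocrlf: str) -> bytes:
    """Content as it would be written to the working tree (simplified model)."""
    if autocrlf == "true":
        # Add a CR in front of every LF that does not already have one.
        return re.sub(rb"(?<!\r)\n", b"\r\n", data)
    return data  # "input" and "false": checked out as-is


stored = on_commit(b"a\r\nb\r\n", "true")
print(stored)                        # b'a\nb\n'
print(on_checkout(stored, "true"))   # b'a\r\nb\r\n'
print(on_checkout(stored, "input"))  # b'a\nb\n'
```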

Some people consider that it is not git's responsibility to do line-ending normalization. It could be tempting to go "checkout as-is, commit as-is" in order to disable git's normalization. But this setting cannot be committed to a repo, so the result depends on each developer's workstation settings, which makes it fragile.

Regardless of what is defined in people's local core.autocrlf setting, individual repository maintainers can override the behavior via the .gitattributes file, which is the most robust way to go.

.gitattributes

.gitattributes assigns attributes to file types. The text and eol attributes are used to control the end-of-line normalization process

  • -text : disables normalization for this type of file. Should be used for any type of binary file.
  • text : marks this type of file as text. It is stored with lf in the repository and checked out using core.eol, which defaults to the OS-native line ending (core.eol should not be touched in normal situations)
  • eol=lf : forces lf in the workspace on all systems, regardless of OS
  • eol=crlf : forces crlf in the workspace on all systems, regardless of OS

  • global wildcards

    • * text=auto
      • lets git detect file type and apply normalization accordingly
      • similar to setting core.autocrlf=true
    • * -text
      • people have tried to use this to emulate "checkout as-is commit as is"
      • but they had various levels of success
    • you can always use these wildcards in addition to more specific overrides
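
To illustrate how a global wildcard combines with more specific overrides, here is a rough Python sketch of attribute resolution. The .gitattributes content and the helper names parse and resolve are hypothetical. The sketch assumes that a later matching line overrides an earlier one attribute by attribute, and it simplifies pattern matching to the file name only, whereas real git uses gitignore-like pattern rules and can read several .gitattributes files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical .gitattributes content, used only for illustration.
GITATTRIBUTES = """\
* text=auto
*.sh text eol=lf
*.bat text eol=crlf
*.png -text
"""


def parse(text):
    """Return a list of (pattern, {attribute: value}) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *tokens = line.split()
        attrs = {}
        for token in tokens:
            if token.startswith("-"):
                attrs[token[1:]] = False      # "-text": attribute unset
            elif "=" in token:
                name, value = token.split("=", 1)
                attrs[name] = value           # "eol=lf": attribute with a value
            else:
                attrs[token] = True           # "text": attribute set
        rules.append((pattern, attrs))
    return rules


def resolve(path, rules):
    """Merge the attributes of every matching rule; later rules win."""
    name = PurePosixPath(path).name  # simplification: match the file name only
    result = {}
    for pattern, attrs in rules:
        if fnmatchcase(name, pattern):
            result.update(attrs)
    return result


rules = parse(GITATTRIBUTES)
print(resolve("scripts/run.sh", rules))  # {'text': True, 'eol': 'lf'}
print(resolve("img/logo.png", rules))    # {'text': False}
print(resolve("README.md", rules))       # {'text': 'auto'}
```

For the real answer on a given file, git check-attr text eol -- <path> asks git itself.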

Beware, line-endings normalization must not be enabled on any binary file.

Updating the normalization settings

If you change the normalization settings (either core.autocrlf or .gitattributes), you will have some work to do on your local repository, on your remote repository, and on your colleagues' workstations.

You can also just leave it be, but you expose yourself to weird git behaviors (untouched files reported as changed, among others) or other issues.

Your first reflex would be to look at the line endings in your code editor and play around with the different git commands you'll find online, but it can very quickly become very confusing.

View the difference between Index and Workspace

You will find plenty of "solutions" and tutorials on Stack Overflow or other websites that tell you what to do, but they always miss some edge cases.

Git normalization does not happen in the workspace, but during the transition into or out of the index, so you need a way to view line-endings in both the index and the workspace before acting.

Here is the command to run to understand what is happening. Most tutorials don't talk about it:

git ls-files --eol

Which could result in this type of output:

i/lf    w/crlf  attr/                   Applications/K8S/versions.tf
i/lf    w/lf    attr/text eol=lf        .gitignore
i/-text w/-text attr/                   Services/SMB/hosts-2022-10-20.xlsx
i/lf    w/lf    attr/                   .gitattributes
i/crlf  w/crlf  attr/                   Applications/K8S/ci/backend.tfvars
i/lf    w/crlf  attr/text eol=lf        Legacy/Modules/Keyvault/.gitignore

  • i/ tells you how the file is saved in the index
  • w/ tells you how the file is presented in the workspace
  • attr/ tells you which text and eol attributes the .gitattributes file(s) assign to this file
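
If you want to process this output in a script, the three fields can be split out as in the following Python sketch. The names EolInfo and parse_eol_line are illustrative. The sketch assumes the path is separated from the rest of the line by a tab character, which is what the cut -f2 -d$'\t' command later in this post relies on; the sample output above shows that separator as spaces.

```python
import re
from typing import NamedTuple


class EolInfo(NamedTuple):
    index: str      # line endings in the index, e.g. "lf", "crlf", "-text"
    worktree: str   # line endings in the workspace
    attr: str       # text/eol attributes from .gitattributes, possibly empty
    path: str


_FIELDS = re.compile(r"i/(\S*)\s+w/(\S*)\s+attr/(.*)")


def parse_eol_line(line: str) -> EolInfo:
    """Parse one line of `git ls-files --eol` output."""
    # Assumption: a tab separates the eol information from the path.
    meta, path = line.rstrip("\n").split("\t", 1)
    match = _FIELDS.fullmatch(meta)
    if match is None:
        raise ValueError(f"unexpected line: {line!r}")
    index, worktree, attr = match.groups()
    return EolInfo(index, worktree, attr.strip(), path)


sample = "i/lf    w/crlf  attr/text eol=lf        \tLegacy/Modules/Keyvault/.gitignore"
info = parse_eol_line(sample)
print(info.index, info.worktree, info.attr)  # lf crlf text eol=lf
```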

A typical Windows developer using the core.autocrlf=true option (which is the default pushed by the git installation on Windows) should mostly see a mix of the following three types:

  • i/lf w/crlf attr/ : the file is normalized by git and uses Windows standard line-endings crlf
  • i/lf w/lf attr/text eol=lf : the file is normalized by git and enforced to use lf
  • i/-text w/-text attr/ : the file is autodetected as a binary and not normalized by git

If you see any other combination, it is probably because you or somebody else changed the .gitattributes file or the core.autocrlf option.
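
As a rough aid, the combinations covered in the repair sections that follow can be told apart with a small Python function. The name diagnose and the returned messages are made up for this sketch, and it only covers the cases discussed in this post.

```python
def diagnose(index: str, worktree: str, attr: str) -> str:
    """Suggest a repair for one i/ w/ attr/ combination (illustrative only)."""
    # Pull the value of an "eol=..." token out of the attribute field, if any.
    wanted = None
    for token in attr.split():
        if token.startswith("eol="):
            wanted = token[len("eol="):]

    if index == "crlf":
        return "index has crlf: renormalize"
    if wanted in ("lf", "crlf") and worktree in ("lf", "crlf") and worktree != wanted:
        return "workspace does not match eol attribute: check out again"
    return "ok"


print(diagnose("crlf", "crlf", ""))          # index has crlf: renormalize
print(diagnose("lf", "crlf", "text eol=lf")) # workspace does not match eol attribute: check out again
print(diagnose("lf", "crlf", ""))            # ok
```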

Repairing i/crlf w/crlf attr/

This file was most probably pushed by someone using core.autocrlf=false and working in Windows. This will typically make git complain about changes on untouched files.

Fix strategies:

  • in any case,

    • make sure you have a clean repo before acting
    • communicate on this change, because people will encounter real conflicts
  • option1: make a commit that fixes all the files in your repo with git add . --renormalize

    • problem: this commit touches every converted line, which makes git blame less useful
    • you can make blame more tolerant with the -w option
    • in practice, git GUIs will happily work around this
    • if necessary, you can also use the blame options --ignore-rev and --ignore-revs-file
  • option2: rewrite your history

    • problem: rewriting history is hard
    • you need to synchronize all committers
    • it is almost impossible in open-source projects

https://www.ofcodeandcolor.com/2013/08/29/normalizing-line-endings-in-git-repositories/
https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame

Repairing i/lf w/crlf attr/text eol=lf

In this case, the index is ok, but the workspace is "broken".

This file was probably checked out before the attribute eol=lf was specified.
Git will not bother you with this. But maybe your code editor or tool will bug you if it requires lf line-endings for some file types.

Examples:

  • Visual Studio may complain or introduce inconsistent line-endings if .csproj files have "wrong" line-endings
  • Terraform will complain if *.lock.hcl files have wrong line-endings

The fix: delete the local file and check it out again.

Bulk fixing: pipe the output of this command to xargs rm, then restore the deleted files with git reset --hard (this discards any uncommitted changes, so take all the precautions needed!)

git ls-files --eol | grep "i/lf    w/crlf  attr/t" | cut -f2 -d$'\t'

Repairing i/lf w/lf attr/text eol=crlf

In this case, the index is ok, but the workspace is "broken".

This time, it may be a problem if your tooling or IDE requires crlf line-endings.

The fix: same as the other "broken" workspace situation.

Top comments (4)

Martin Leduc

* -text

  • people have tried to use this to emulate "checkout as-is commit as is"
  • but they had various levels of success

Do you have examples of cases where * -text didn't work properly for some people?

Kevin Shu

Using * -text is the same as telling git to do no line-endings normalization at all. But there is a reason that git does line ending normalization in the first place. It may bite you if you just disable this feature.

The git book provides the following example:

For instance, Xcode projects on macOS contain a file that ends in .pbxproj, which is basically a JSON (plain-text JavaScript data format) dataset written out to disk by the IDE, which records your build settings and so on. Although it’s technically a text file (because it’s all UTF-8), you don’t want to treat it as such because it’s really a lightweight database – you can’t merge the contents if two people change it, and diffs generally aren’t helpful. The file is meant to be consumed by a machine. In essence, you want to treat it like a binary file.

git-scm.com/book/en/v2/Customizing...

While this specific file type is supposed to be treated as binary, you can imagine that some text editors don't know about this file type. Such an editor may then treat the file as text and convert all its line endings. The end-of-line normalization done by git will prevent this unwanted change from being committed to the repository.

Martin Leduc

From what I can tell, the example from the git book suggests using the binary attribute for .pbxproj so that the diff attribute will be unset, but it says nothing about the risks of using * -text. Am I missing something?

Niklas Matthies

Git will not perform any conversions when checking out or committing text files.

This is not entirely correct. Text files (files that Git identifies as text files) are ALWAYS converted from CRLF to LF upon check-in. The core.eol option and eol attribute ONLY concern checkout. This isn't properly documented, but I recently performed exhaustive tests and looked at the Git source code.

  • core.autocrlf=true/input causes text=auto as a default, equivalent to * text=auto in .gitattributes.
  • core.autocrlf=true effectively sets core.eol=crlf.
  • core.autocrlf=input effectively sets core.eol=lf.
  • The above are the only effects of core.autocrlf.
  • The effective core.eol serves as a default for the eol attribute for text files.
  • The effective eol attribute of a text file determines whether it is converted from LF to CRLF on checkout. The eol value makes no difference for check-in.

Further notes:

  • Text files with mixed LF/CRLF line endings are handled differently on check-in vs. checkout. As noted above, check-in always performs conversion to LF, which for text files with mixed line endings means that they are normalized to LF upon check-in. Checkout, on the other hand, always leaves text files with mixed line endings as-is, and does not convert them to CRLF even when it otherwise would for LF-only files. (The Git source code explicitly checks for the mixed line endings case.)
  • Since Git supports CRLF to LF conversion only on check-in (and does it unconditionally for text files) and LF to CRLF conversion only on checkout (and also doesn't support conversion of text files with mixed line endings for checkout, see above), the only workable repository line ending format for text files is LF.
  • After modifying .gitattributes, one should always perform git add --renormalize . to ensure correct behavior. In particular for auto-detected text files (text=auto), even committing modified files may otherwise not result in the desired line ending conversion. (This somehow relates to internal Git caching and is highly non-intuitive.)
  • It's likely advisable to set merge.renormalize=true, unless you use non-idempotent custom smudge/clean filters that would lead to unwanted merge results with this setting.