Git and normalization of line-endings
A few months ago, I spent hours trying to decide on the best way to deal with line endings and how to switch a repo to using .gitattributes. I just found these comprehensive notes, and thought they'd be easier to find here than buried in my notes...
TL;DR
- line-ending normalization is about converting LF <=> CR+LF, for cross-platform compatibility
- the .gitattributes file is the safest git mechanism to manage line-ending normalization
- updating normalization settings is tricky because git may report changes on unmodified files, and it is totally not obvious what is happening
- there are some tricks to help understand what is happening and to fix things
Line endings and Operating systems
When you press <Enter> in your text editor, the file is modified with invisible characters that represent the new line. This line break is most commonly encoded in one of two ways:
- the ASCII character LF (aka Line Feed)
- the ASCII character pair CR+LF (aka Carriage Return followed by Line Feed)
Historically, teletypes and many early systems used CR+LF, while Unix settled on a single LF character as its line terminator from its early days in the 1970s, which is simpler and saves one byte per line.
In practice, Windows is the only modern operating system that still uses CRLF line endings.
When developing in a team, you will end up with people working on Windows and other operating systems, and you will need to manage this difference in your source control system.
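To make the difference visible, here is a minimal sketch that writes the same two lines with each convention and dumps the raw bytes. The file names are hypothetical, and it assumes a POSIX shell with mktemp, od and wc available.

```shell
# Write the same two lines with Unix (LF) and Windows (CRLF) endings.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/unix.txt"
printf 'one\r\ntwo\r\n' > "$tmpdir/windows.txt"

# od -c prints each byte; CR shows up as \r and LF as \n.
od -c "$tmpdir/unix.txt"
od -c "$tmpdir/windows.txt"

# The CRLF file carries one extra byte per line.
unix_size=$(wc -c < "$tmpdir/unix.txt")
windows_size=$(wc -c < "$tmpdir/windows.txt")
echo "unix: $unix_size bytes, windows: $windows_size bytes"
```

Most editors hide this difference, which is why a byte-level dump is often the quickest way to know what is really in a file.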
Git's core.autocrlf
This setting is defined via git config. It applies globally or per repo. When enabled, it applies normalization to all files detected by git as text.
- Checkout Windows-style, commit Unix-style (core.autocrlf=true)
  - Git will convert LF to CRLF when checking out text files.
  - When committing text files, CRLF will be converted to LF.
  - This is the default value pushed by the installer on Windows systems.
- Checkout as-is, commit Unix-style (core.autocrlf=input)
  - Git will not perform any conversion when checking out text files.
  - When committing text files, CRLF will be converted to LF.
  - Some people recommend using this when developing on Unix systems.
- Checkout as-is, commit as-is (core.autocrlf=false)
  - Git will not perform any conversions when checking out or committing text files.
  - This is the default value if the setting is not defined.
Some people consider that line-ending normalization is not git's responsibility. It could be tempting to go "checkout as-is, commit as-is" in order to disable git's normalization. But core.autocrlf cannot be committed to a repo, so the result depends on each developer's workstation settings, which makes it fragile.
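As an illustration, this is roughly how the setting is written and read back. The sketch uses a throwaway repository created with mktemp so that no real configuration is touched; the value input is just one of the three options listed above.

```shell
# Work in a throwaway repository so no real configuration is modified.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Per-repository setting (stored in .git/config, never committed).
git config core.autocrlf input

# Read the effective value back.
autocrlf_value=$(git config --get core.autocrlf)
echo "core.autocrlf=$autocrlf_value"

# To set it for every repository of the current user instead:
#   git config --global core.autocrlf input
```

Because the value lives in .git/config or in the user's global config, a fresh clone on another machine does not inherit it.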
Regardless of what is defined in people's local core.autocrlf setting, individual repository maintainers can override the behavior via the .gitattributes file, which is the most robust way to go.
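The override can be checked with a small experiment. This is a sketch in a throwaway repository with hypothetical file names and a placeholder commit identity: the local config asks for CRLF on checkout, while the committed attributes disable normalization.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# The local setting asks for CRLF on checkout...
git config core.autocrlf true
# ...but the committed attributes disable normalization for every file.
printf '* -text\n' > .gitattributes
printf 'one\ntwo\n' > notes.txt
git add .gitattributes notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add files"

# Force a fresh checkout of the file.
rm notes.txt
git checkout -- notes.txt

# Count the lines containing a carriage return in the checked-out file.
cr_lines=$(grep -c "$(printf '\r')" notes.txt || true)
echo "lines with CR: $cr_lines"
```

If the count stays at zero, the attribute won over the local core.autocrlf value for that file.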
.gitattributes
.gitattributes assigns attributes to file types. The text and eol attributes are used to control the end-of-line normalization process:
- -text: disables normalization for this type of file. Should be used for any type of binary file.
- text: normalizes this type of file using core.eol, which defaults to OS native (core.eol should not be touched in normal situations)
- eol=lf: forces lf on all systems, regardless of OS
- eol=crlf: forces crlf on all systems, regardless of OS
- global wildcards
  - * text=auto
    - lets git detect file type and apply normalization accordingly
    - similar to setting core.autocrlf=true
  - * -text
    - people have tried to use this to emulate "checkout as-is commit as-is"
    - but they had various levels of success
  - you can always use these wildcards in addition to more specific overrides
Beware, line-endings normalization must not be enabled on any binary file.
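Putting the attributes above together, a .gitattributes file could look like the one in this sketch. The patterns and file names are hypothetical examples, not a recommendation for any particular project; git check-attr is then used to ask git which attributes it would apply to a given path.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical attribute set: auto-detect by default, then specific overrides.
cat > .gitattributes <<'EOF'
* text=auto
*.sh text eol=lf
*.bat text eol=crlf
*.png -text
EOF

# check-attr works on path names; the files do not need to exist.
git check-attr text eol -- README.md build.sh run.bat logo.png
```

Later lines override earlier ones for the paths they match, so the global wildcard goes first and the specific patterns after it.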
Updating the normalization settings
If you change the normalization settings (either core.autocrlf or .gitattributes), you will have some work to do on your local repository, on your remote repository, and on your colleagues' workstations.
You can also just leave it be, but you expose yourself to weird git behaviors (untouched files reported as changed, among others) or other issues.
Your first reflex would be to look at the line endings in your code editor and play around with the different git commands you'll find online, but it can very quickly become very confusing.
View the difference between Index and Workspace
You will find plenty of "solutions" and tutorials on Stack Overflow or other websites that tell you what to do, but they always miss some edge cases.
Git normalization does not happen in the workspace, but during the transition into or out of the index, so you need a way to view line-endings in both the index and the workspace before acting.
Here is the command to run to understand what is happening; most tutorials don't mention it:
git ls-files --eol
Which could result in this type of output:
i/lf w/crlf attr/ Applications/K8S/versions.tf
i/lf w/lf attr/text eol=lf .gitignore
i/-text w/-text attr/ Services/SMB/hosts-2022-10-20.xlsx
i/lf w/lf attr/ .gitattributes
i/crlf w/crlf attr/ Applications/K8S/ci/backend.tfvars
i/lf w/crlf attr/text eol=lf Legacy/Modules/Keyvault/.gitignore
- i/ tells you how the file is saved in the index
- w/ tells you how the file is presented in the workspace
- attr/ tells you how the .gitattributes file(s) is (are) hinting git to deal with this file
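To see where such a report comes from, this sketch stages one CRLF file and one LF file in a throwaway repository with conversion switched off. The file names are hypothetical; in the real output the columns are padded with spaces and the path follows a tab.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.autocrlf false   # commit as-is, so the CRLF file reaches the index unchanged

printf 'one\r\ntwo\r\n' > windows.txt
printf 'one\ntwo\n' > unix.txt
git add windows.txt unix.txt

# One line per tracked file: index state, workspace state, attributes, path.
eol_report=$(git ls-files --eol)
echo "$eol_report"
```

With no .gitattributes file in place, the attr/ column stays empty, as on several lines of the sample output above.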
For a typical Windows developer using the core.autocrlf=true option (which is the default pushed by the git installer on Windows), you should mostly get a mix of the first three types:
- i/lf w/crlf attr/: the file is normalized by git and uses the Windows standard line endings (crlf) in the workspace
- i/lf w/lf attr/text eol=lf: the file is normalized by git and forced to use lf
- i/-text w/-text attr/: the file is autodetected as binary and not normalized by git
If you end up with any other combination, it may be because you or somebody else made some changes in the .gitattributes file or the core.autocrlf option.
Repairing i/crlf w/crlf attr/
This file was most probably pushed by someone using core.autocrlf=false and working on Windows. This will typically make git complain about changes on untouched files.
Fix strategies:
- in any case,
  - make sure you have a clean repo before acting
  - communicate on this change, because people will encounter real conflicts
- option 1: make a commit that fixes all the files in your repo with git add --renormalize .
  - problem: it will impede your ability to do a blame
  - you could make blame ignore whitespace changes with the -w option
  - in practice, git GUIs will happily work around this
  - if necessary you could also use the blame options --ignore-rev and --ignore-revs-file
- option 2: rewrite your history
  - problem: rewriting history is hard
  - you need to synchronize all committers
  - almost impossible in open-source projects
https://www.ofcodeandcolor.com/2013/08/29/normalizing-line-endings-in-git-repositories/
https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame
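Option 1 can be rehearsed end to end in a throwaway repository before touching a real one. This is a sketch under assumptions: the file names and commit identity are placeholders, .git-blame-ignore-revs is only a common naming convention, and it needs a reasonably recent git (--renormalize appeared in 2.16, --ignore-revs-file in 2.23).

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.autocrlf false
git config user.name demo              # placeholder identity for the demo commits
git config user.email demo@example.com

# 1. Reproduce the problem: a CRLF file committed as-is.
printf 'one\r\ntwo\r\n' > app.txt
git add app.txt
git commit -q -m "add app.txt with CRLF"

# 2. Turn normalization on and rewrite the index in one dedicated commit.
printf '* text=auto\n' > .gitattributes
git add .gitattributes
git add --renormalize .
git commit -q -m "normalize line endings"

# 3. Record that commit so blame can skip it.
git rev-parse HEAD > .git-blame-ignore-revs
git blame --ignore-revs-file .git-blame-ignore-revs app.txt

after=$(git ls-files --eol app.txt)
echo "$after"
```

Keeping the normalization in its own commit, with nothing else in it, is what makes it practical to ignore in blame afterwards.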
Repairing i/lf w/crlf attr/text eol=lf
In this case, the index is ok, but the workspace is "broken".
This file was probably checked out before the attribute eol=lf was specified.
Git will not bother you with this. But your code editor or tooling may bug you if it requires lf line-endings for some file types, since the workspace copy still has crlf.
Examples of tools that are sensitive to line endings:
- Visual Studio may complain or introduce incoherent line-endings if csproj files have "wrong" line endings
- terraform will complain if *.lock.hcl files have wrong line-endings
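The fix: delete the local file, and check it out again.
Bulk fixing: pipe the output of this command to xargs rm, then restore the deleted files from the index, for example with git reset --hard (with all the precautions needed!!!). The columns of the report are padded with a variable number of spaces, hence the regular expression:
git ls-files --eol | grep -E "i/lf +w/crlf +attr/t" | cut -f2 -d$'\t'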
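The bulk fix can be tried safely first. The sketch below reproduces the "broken workspace" state in a throwaway repository (hypothetical file names, placeholder identity), then deletes and restores only the affected files. It restores each path with git checkout rather than resetting the whole tree, and it assumes paths without newlines or characters that git would quote in its output.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.autocrlf false
git config user.name demo
git config user.email demo@example.com

# Reproduce the state: LF in the index, eol=lf requested, CRLF in the workspace.
printf '*.txt text eol=lf\n' > .gitattributes
printf 'one\ntwo\n' > notes.txt
git add .gitattributes notes.txt
git commit -q -m "add notes.txt"
printf 'one\r\ntwo\r\n' > notes.txt      # simulate a stale CRLF checkout
git ls-files --eol notes.txt

# Delete every file in that state, then restore it from the index.
git ls-files --eol | grep -E '^i/lf +w/crlf +attr/text eol=lf' | cut -f2 |
while IFS= read -r path; do
    rm -- "$path"
    git checkout -- "$path"
done

fixed=$(git ls-files --eol notes.txt)
echo "$fixed"
```

On a real repository, run this only on a clean working tree: any uncommitted change in a matching file would be lost.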
Repairing i/lf w/lf attr/text eol=crlf
In this case, the index is ok, but the workspace is "broken".
This time, it may be a problem if your tooling or IDE requires crlf line-endings, since the workspace copy still has lf.
The fix: same as the other "broken" workspace situation.
Top comments (6)
Do you have examples of cases where * -text didn't work properly for some people?

Using * -text is the same as telling git to do no line-endings normalization at all. But there is a reason that git does line ending normalization in the first place. It may bite you if you just disable this feature.
The git book provides the following example:
git-scm.com/book/en/v2/Customizing...
While this specific file type is supposed to be treated as binary, you can imagine that some text editors don't know about this file type. They may then ignore it, treat it as text and convert all line endings. The end-of-line normalization done by git will then prevent this unwanted change from being committed to the repository.

From what I can tell, the example from the git book would suggest using the binary attribute for .pbxproj so that the diff attribute will be unset, but it says nothing about the risks of using * -text. Am I missing something?

This is not entirely correct. Text files (files that Git identifies as text files) are ALWAYS converted from CRLF to LF upon check-in. The core.eol option and eol attribute ONLY concern checkout. This isn't properly documented, but I recently performed exhaustive tests and looked at the Git source code.
* text=auto in .gitattributes.
Further notes:
git add --renormalize . to ensure correct behavior. In particular for auto-detected text files (text=auto), even committing modified files may otherwise not result in the desired line ending conversion. (This somehow relates to internal Git caching and is highly non-intuitive.)

Niklas, you say
Do you know:
- * -text in .gitattributes
- core.autocrlf=true in their config
which one wins when they check out files?

The entry in .gitattributes takes precedence, core.autocrlf=true only sets the default.