As I was looking for easy assignments for the Open Source Development Course, I found something very troubling that is also an opportunity for a lot of teaching and a lot of practice.
Some files don't need to be in git
Common sense dictates that we rarely need to include generated files in our git repository. There is no point in keeping them in version control as they can be generated again. (The exception might be if the generation takes a lot of time or can be done only during certain phases of the moon.)
Neither is there a need to store 3rd party libraries in our git repository. Instead, we store a list of our dependencies with the required versions and then download and install them. (Well, the rightfully paranoid might download and save a copy of every 3rd party library they use to ensure it can never disappear, but you'll see we are not talking about that.)
.gitignore
The way to make sure that neither we nor anyone else adds these files to the git repository by mistake is to create a file called .gitignore, include patterns that match the files we would like to exclude from git, and add the .gitignore file to our repository. git will ignore those files. They won't even show up when you run git status.
The format of the .gitignore file is described in the documentation of .gitignore.
In a nutshell:
/output.txt
Ignore the output.txt file in the root of the project.
output.txt
Ignore output.txt anywhere in the project (in the root or in any subdirectory).
*.txt
Ignore all files with the .txt extension.
venv
Ignore the venv folder (or a file with that name) anywhere in the project.
There are more. Check the documentation of .gitignore!
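To make the four patterns above concrete, here is a minimal Python sketch that approximates how they behave. It is not git's actual matcher: the function name is_ignored is made up for illustration, and it only understands a leading "/" and simple wildcards, not negation, "**", trailing slashes, or nested .gitignore files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, pattern: str) -> bool:
    """Approximate git's handling of the simple pattern forms listed above.

    Hypothetical helper for illustration only; real .gitignore matching
    has more rules than this sketch covers.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    if pattern.startswith("/"):
        # A leading slash anchors the pattern to the root of the project,
        # so only the first path component can match.
        return fnmatchcase(parts[0], pattern[1:])
    # Without a slash the pattern may match a file or folder name at any
    # depth; a matching folder hides everything inside it.
    return any(fnmatchcase(part, pattern) for part in parts)


if __name__ == "__main__":
    examples = [
        ("output.txt", "/output.txt"),
        ("logs/output.txt", "/output.txt"),
        ("logs/output.txt", "output.txt"),
        ("docs/notes.txt", "*.txt"),
        ("backend/venv/bin/python", "venv"),
    ]
    for path, pattern in examples:
        print(f"{pattern!r:15} {path!r:28} ignored={is_ignored(path, pattern)}")
```

When in doubt about a real repository, git check-ignore -v followed by a path asks git itself which rule, if any, matches that path.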
Not knowing about .gitignore
Apparently a lot of people using git and GitHub don't know about .gitignore.
The evidence:
Python developers use something called virtualenv to make it easy to use different dependencies in different projects. When they create a virtualenv they usually configure it to install all the 3rd party libraries in a folder called venv. We should not include this folder in git. And yet:
There are 452M hits for this search: venv
In a similar way, NodeJS developers install their dependencies in a folder called node_modules. There are 2B responses for this search: node_modules
Finally, if you use the Finder application on macOS and open a folder, it will create a file called .DS_Store that only holds Finder's display settings for that folder. This file is really not needed anywhere. And yet I saw many copies of it on GitHub. Unfortunately, so far I could not figure out how to search for them. The closest I found is this search.
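To check a single project for the same problem, a small script can scan the list of tracked files for these names. The sketch below is only an illustration: find_unwanted and UNWANTED_NAMES are made-up names, and the sample listing stands in for what git ls-files would print in a real repository.

```python
from pathlib import PurePosixPath

# Names taken from the examples above; extend the set for your own projects.
UNWANTED_NAMES = {"venv", "node_modules", ".DS_Store"}


def find_unwanted(tracked_paths):
    """Return the tracked paths that contain one of the unwanted names."""
    return [
        path
        for path in tracked_paths
        if UNWANTED_NAMES.intersection(PurePosixPath(path).parts)
    ]


if __name__ == "__main__":
    # Hypothetical listing, standing in for the output of `git ls-files`.
    tracked = [
        "README.md",
        "app/main.py",
        "venv/lib/python3.11/site-packages/six.py",
        "frontend/node_modules/left-pad/index.js",
        "docs/.DS_Store",
    ]
    for path in find_unwanted(tracked):
        print(path)
```

The check compares whole path components, so a file such as src/venv_tools.py is not flagged just because its name contains "venv".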
Misunderstanding .gitignore
There are also many people who misunderstand the way .gitignore works. I can understand that, as the wording of the usual explanation is a bit ambiguous. What we usually say is this:
If you'd like to make sure that git will ignore the __pycache__ folder then you need to put it in .gitignore.
A better way would be to say this:
If you'd like to make sure that git will ignore the __pycache__ folder then you need to put its name in the .gitignore file.
Without that, people might end up creating a folder called .gitignore and moving all the __pycache__ folders into this .gitignore folder. You can see it in this search.
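The difference is easy to show in code: .gitignore is a plain text file with one pattern per line, and "putting a folder in it" means writing the folder's name into that file. A minimal sketch, assuming a helper called add_ignore_pattern that is made up for this illustration (editing the file by hand in any text editor does the same thing):

```python
from pathlib import Path


def add_ignore_pattern(project_root, pattern):
    """Add `pattern` as a line of the .gitignore file in project_root.

    Returns True if the pattern was added, False if it was already listed.
    Hypothetical helper for illustration only.
    """
    gitignore = Path(project_root) / ".gitignore"  # a text file, not a folder
    text = gitignore.read_text() if gitignore.exists() else ""
    if pattern in text.splitlines():
        return False
    # Make sure the new pattern starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + separator + pattern + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as project:
        add_ignore_pattern(project, "__pycache__")
        print((Path(project) / ".gitignore").read_text())
```

Nothing gets moved anywhere: the __pycache__ folders stay where Python created them, and git simply stops looking at them.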
Help
Can you suggest other common cases of unnecessary files in git that should be ignored?
Can you help me create the search for .DS_Store on GitHub?
Updates
More cases, based on the comments:
- .o files, the result of compiling C and C++ code: .o
- .class files, the result of compiling Java code: .class
- .pyc files, which are compiled Python code, usually stored in the __pycache__ folder mentioned earlier: .pyc
How to create a .gitignore file?
A follow-up post:
Comments
Does GitHub really store duplicate files?
I don't know how GitHub stores the files, but I am primarily interested in the health of each individual project. Having these files stored and then probably updated later will cause misunderstandings and will make it harder to track changes.
Duplicate or not, git clone will create them. 😞
I am not sure I understand what you meant by this comment.
It doesn't matter if github stores it in duplicate or not, because git clone will create it unnecessarily on the client side.
Right
He demonstrates how to lighten your open source projects with the use of .gitignore. 👍🏼 At no time does he point at people and tell them that. Why do you think like that? 🤔
I am sorry, but it is unclear what you mean by that comment and what that image refers to. Could you elaborate, please?
Good explanation of .gitignore
Don’t forget those .env files as well!
GitHub’s extension search parameter doesn’t require the dot, so your .DS_Store search should work if you make that small change:
extension:DS_Store
https://github.com/search?q=extension%3ADS_Store&type=Code
Did you know the command git clean -Xfd will remove all files from your project that match the current contents of your .gitignore file? I love this trick.
Lifesaver!
I was just looking for this command. I wanted to remove some of the SQLite files from GitHub.
This won't remove the already committed files from GitHub. It removes the files from your local disk that should not be committed to git.
Be careful with this one. Some of my repos have bits and pieces that I expressly never commit and that are in .gitignore, but that I don't want to branch/stash either. Things like files with sensitive configuration information or credentials in them that exist for development/testing purposes but should never reach GitHub.
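One way to act on that caution is to preview first: git clean accepts -n (dry run), which lists what would be removed without deleting anything. A minimal Python sketch, assuming git is installed and on the PATH; the two function names are made up for illustration.

```python
import subprocess


def clean_ignored_command(dry_run=True):
    """Build the git clean command for ignored files.

    -X limits the clean to files matched by ignore rules, -d includes
    directories, -n only lists what would go, -f actually deletes.
    """
    return ["git", "clean", "-X", "-d", "-n" if dry_run else "-f"]


def clean_ignored(repo_dir, dry_run=True):
    """Run the command inside repo_dir and return whatever git prints."""
    result = subprocess.run(
        clean_ignored_command(dry_run),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling clean_ignored(".") inside a repository returns git's preview of what would be removed; only clean_ignored(".", dry_run=False) deletes anything, so ignored credential files can be spotted in the preview before they are lost.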
Maybe use environment variables in your IDE? Or if you're on Linux, you can set those values automatically when you enter the folder with cd. This is much safer in both situations: you will never commit this data and will never delete it.
For syncing it between devices, for example, use a password manager (like BitWarden).
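As a rough illustration of the environment-variable approach, the code reads the secret from the process environment, so no credentials file sits in the working tree for git clean to delete or for git add to pick up. DB_PASSWORD and load_db_password are made-up names for this sketch.

```python
import os


def load_db_password():
    """Read a secret from the environment instead of a file in the repo.

    DB_PASSWORD is a hypothetical variable name used for illustration.
    """
    password = os.environ.get("DB_PASSWORD")
    if password is None:
        raise RuntimeError("DB_PASSWORD is not set in the environment")
    return password
```

Failing loudly when the variable is missing is a deliberate choice here: a clear error at startup is easier to diagnose than a connection attempt with an empty password.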
The thing with repos is that git clean -Xfd should not be dangerous. If it is, then you have important information that should be stored elsewhere, NOT on the filesystem.
Please learn to use a proper pgpagent or something.
The filesystem should really be ephemeral.
Information has to be stored somewhere. And that means everything winds up stored in a file system somewhere.
Ignored files are usually build artifacts and machine-generated files that can be derived from your repository source or should otherwise not be committed. Some common examples are: dependency caches, such as the contents of /node_modules or /packages; compiled code, such as .o, .pyc, and .class files.
I've updated the article based on your suggestions. Thanks.
Game engine projects often have very large cache folders that contain auto-generated files which should not be checked into repositories. There are well-established .gitignore files to help keep these out of GitHub, but people all too often don't use them.
For Unity projects, "Library" off the root is a cache (hard to search for that one, it's too generic).
For Unreal, "DerivedDataCache" is another (search link)
There are also Visual Studio's debug symbol files with the extension .pdb. These can get pretty damn big and often show up in repos when they shouldn't: search link
Thanks! That actually gave me the idea to open the recommended gitignore files and use those as the criteria for searches.
See also gitignore generators like gitignore.io. For example, this generated .gitignore has some interesting ones like *.log and *.tmp.
Good explanation!
Could GitHub inform users about it?
It could, or there could be a bot doing that. However, there might be some legitimate cases when you want to store these files. (I don't have an example now, but I would not rule that out.)
Also, when you create a new repository, GitHub offers to add a .gitignore file based on the files here.

I somewhat disagree with this take. Checking in files like node_modules ensures that everyone is installing the same exact dependencies.
Tools like Yarn even encourage this, which is why they came up with Yarn PnP, where package archives are stored in your project directory.
Good point. However, I think that among those the only really important one is the last one: to ensure that an external decision (e.g. removal of a dependency from npm or removal of a specific version of a dependency from npm) won't impact you.
For that, IMHO, it would be a better idea to create a local copy of a subsection of npm - all the files the project or the company needs. However I don't know if there is an easy solution for that.
In addition, this advice might need to be different for libraries and for end-user projects.
Tools like npm, yarn and pnpm have lock files for this. They will ensure everyone uses the same exact versions of dependencies.
There is absolutely no need to ever commit your node_modules directory if you’re not paranoid about packages disappearing.
And if you really care about that last thing, there are better options than caching every file from every package.
I would not use the word "paranoid" here. I think it is a totally valid concern that we should not blindly assume that all the external packages and services will be always there when we need them.
Anyway I am not familiar with those better options. Would you have the time to point at some of those tools?
I don’t personally use them, but there are tools that analyze your code and only cache the pieces of your dependencies that you actually use. A sort of tree shaking for dependencies. There are also some monorepo caching strategies that enable using multiple versions of dependencies at the same time, which do the same thing.
But I don’t use any of these as package and lock files have always worked out for me. Deprecated/broken deps have even helped me find better packages and optimize/get rid of old code, so I actually like the changing nature of package files.
Some google searches might help you find out more.
Very much agreed here. Proper lock file management will ensure that everyone is installing the same dependencies across the board. If you run into a situation where a dependency has been deleted or is no longer available as a package, then the better solution is to find an alternative to it.
I was really shocked. I read your article 3 times and opened the node_modules search to believe this.
Wow, GitHub should start to alert these people!
github.com/community/community#mak...
Generate your standard gitignore from github.com/michaelliao/gitignore-o... and use it for every project. A good solution helps to understand things well.