DEV Community

Matthew


Getting Good at Git

Last week I worked on a project in which I was tasked with using movie data from different sources to gather insights on what factors create value in the film industry. I had written some code that used loops to make multiple requests to The Movie Database API. The function gathered nearly 10,000 movies from the discover endpoint, then looped over the unique movie IDs to request detailed movie information from a different endpoint and write it to a data frame.

So far, so good...

It was my intention to write this blog post on that process. After the long process of finishing the requests and gathering all of the data into a data frame, I exported it to a CSV file that I sent to a teammate. However, an incident with Git and GitHub deleted all of the files in my project repository, which wiped out all the code related to the process.

Angry Fed Up GIF [1]

This brings me to the topic of this post: getting git to work for you and not against you.

GitHub repositories are a powerful tool... but understand what you are doing

What happened in my case is that I forked a repository but cloned the original repository, so I ended up pushing all of my changes to the wrong location. We were able to get the files off of the repository and change my upstream, but in that process all the files were wiped from my local directory.
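A mix-up like this can usually be spotted, and fixed without re-cloning, by inspecting the remotes. The following is a minimal sketch, not the commands that were actually run in this incident: two local bare repositories (original.git and fork.git, both hypothetical names) stand in for the original project and the fork on GitHub.

```shell
set -eu

# Throwaway sandbox; original.git and fork.git are hypothetical stand-ins
# for the original project's URL and the fork's URL on GitHub.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q --bare "$work/original.git"
git init -q --bare "$work/fork.git"

# The mistake: cloning the original instead of the fork.
git clone -q "$work/original.git" "$work/project" 2>/dev/null
cd "$work/project"

# Show where fetch and push currently point.
git remote -v

# Keep the original around as "upstream" and make the fork "origin".
git remote rename origin upstream
git remote add origin "$work/fork.git"

# Both remotes are now listed, each with its own URL.
git remote -v
```

With real repositories, the two paths would be the HTTPS or SSH URLs shown on each repository's GitHub page.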

Platforms that host repositories are a great way to share versions of code and store files off of your working computer. But if you do not use git properly with GitHub or GitLab, it can hamper your data science or coding project. So, I will share some quick tips that I have gathered from my own recent struggles.

GIF [2]


Step 1: Fork it

If there is a repository on GitHub that you are interested in using, the important first step is to fork it. This will create a copy of the repository linked to your account. It may not seem vital if you are just trying to access some code or some files but it will save you heartbreak in the long run.

Step 2: Clone the repository

The next step is to clone the repository to your local machine using the git clone command in your terminal, but pay attention to the URL that you are pasting into your terminal: it should be the URL of your fork, not of the original repository. This matters because when you clone a repository into a local directory, git sets up the connection to the remote repository hosted on GitHub (named origin by default), and that is where your pushes will go. In practice, you navigate to the directory that you want to use on your local machine and then run git clone with the copied link to the forked repository.
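As a rough sketch of this step, the snippet below clones into a chosen directory and then confirms where origin points. It assumes a local bare repository (fork.git) as a stand-in for the URL copied from the fork's GitHub page; the directory names projects and movie-project are hypothetical.

```shell
set -eu

# Sandbox: fork.git is a hypothetical local stand-in for the URL
# copied from your fork's page on GitHub.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q --bare "$work/fork.git"
fork_url="$work/fork.git"

# Navigate to the parent directory where the project should live...
mkdir "$work/projects"
cd "$work/projects"

# ...then clone; git creates the movie-project directory inside it.
git clone -q "$fork_url" movie-project 2>/dev/null
cd movie-project

# Confirm that origin points at the fork before doing any work.
git remote get-url origin
```

If the printed URL belongs to the original repository rather than the fork, that is the moment to fix it, before any commits are pushed.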

Step 3: Work

Great. So now we are all set up with a directory that is connected to a remote repository using git. You can now start working with files in the local directory to your heart's content. Most issues arise in the following steps, when you are trying to upload files to the remote repository.

Step 4: .gitignore

Before you start committing changes and pushing files to the repository, you will want to set up your .gitignore. This is a file stored in your local project directory (usually at the top level). You can add file paths or patterns to the body of this file, and git will ignore those files when you start using git to move files to the remote location. It is recommended to put config files, API keys, and environment files in the .gitignore, but you can add any files that you do not want on your public repository. Note that .gitignore only affects files that git is not already tracking.
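Here is a small sketch of what that can look like, assuming a fresh repository and three hypothetical files (.env, config.yml, and fetch_movies.py). The git check-ignore command reports which paths match an ignore rule.

```shell
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/project"
cd "$work/project"

# Hypothetical files: one secret, one config, one script worth sharing.
echo "API_KEY=not-a-real-key" > .env
echo "tmdb_key: not-a-real-key" > config.yml
echo "print('fetch movies')" > fetch_movies.py

# One pattern per line; lines starting with # are comments.
cat > .gitignore <<'EOF'
# secrets and local settings
.env
config.yml
EOF

# check-ignore prints each given path that matches an ignore rule.
git check-ignore .env config.yml

# status lists only the files git is still willing to track.
git status --short
```

The .gitignore file itself is normally committed, so everyone who clones the repository gets the same rules.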

Step 5: Adding files to the staging area

Moving files from your local repository to the remote location on GitHub is a multi-stage process, and the stages have to be done in order. The first step is to add the files to the staging area. The command is git add filename. Additionally, you can use git add --all to stage all files and subdirectories. If your .gitignore is set up properly, git will still ignore the specified files and add everything else. You can also force git to add an ignored file using git add -f filename.
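The staging commands above might be used like this. It is a sketch in a throwaway repository, and all file names are hypothetical.

```shell
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/project"
cd "$work/project"

# Hypothetical project files; .env is listed in .gitignore.
echo ".env" > .gitignore
echo "API_KEY=not-a-real-key" > .env
echo "print('fetch movies')" > fetch_movies.py
mkdir data
echo "id,title" > data/movies.csv

# Stage a single file by name.
git add fetch_movies.py

# Stage everything else, including subdirectories; .env stays out.
git add --all

# List the files that are now in the staging area.
git ls-files

# Forcing an ignored file in is possible, so use -f with care:
# git add -f .env
```

The forced add is left commented out on purpose: once a secret is committed and pushed, removing it from the history is much harder than never adding it.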

Step 6: Commit

Once you have added files using git add, the next step is to commit changes. One thing that I like to do first (and in every stage of this process) is to use the command git status. This will tell you which files are in the staging area, which files have changed since the last commit, and which files are untracked.

Now you are ready to commit the changes in the files that you have added to the staging area. Again, this will only affect the files currently in the staging area. A commit message is required, and it is a good habit to make it meaningful because once you push to your public repository, the commit history will be visible. The command is git commit -m "Commit Message".

After you have completed this step, you will have committed changes using git. However, this is only in the local git repository in that directory. There is another step to move committed files to the remote repository hosted publicly on GitHub, as well as to pull the files that others have put on it.
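To illustrate how git status and git commit fit together, here is a minimal sketch in a throwaway repository. The file names, the commit message, and the "Demo User" identity are placeholders.

```shell
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/project"
cd "$work/project"

# A commit needs an author; this identity is a placeholder for the demo.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('fetch movies')" > fetch_movies.py
echo "scratch" > notes.txt

# Only fetch_movies.py is staged; notes.txt stays untracked.
git add fetch_movies.py
git status --short

# Commit what is staged, with a message that says what changed.
git commit -q -m "Add script that requests movie details"

# notes.txt is still reported as untracked after the commit.
git status --short

# The message is now part of the local history.
git log --oneline
```

Nothing has left the machine at this point; the commit exists only in the local repository until it is pushed.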

Push It GIF [3]

Step 7: Push it (and pull)

GitHub is a powerful platform because it allows multiple people to work in a shared repository, adding and removing files. It also lets you work on different branches of the repository. If you are working on a project with someone else, you will most likely have to add files to a shared repository and pull files your partners have added.

Once you have committed your changes, you are ready to push those changes to the remote repository. However, it is best to pull the changes from the remote repository first. Git will reject a push to a branch that has commits you do not have locally. Using the command git pull remote branch (for example, git pull origin master), you will fetch any changes from the repository and merge them into your local branch.

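Now you can push committed changes up to the repository. This is done using the command git push remote branch (for example, git push origin master). With no arguments, git pull and git push use the remote branch that your current branch tracks, which is origin master right after cloning a repository whose default branch is master, but you can specify otherwise. The final step is to check on the public repository that the changes have registered.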
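The pull-before-push routine can be sketched end to end with two local clones of one shared repository. Everything here is an assumption made for the demo: shared.git stands in for the GitHub repository, the branch is named main, the commit_file helper and the file names are hypothetical, and --no-rebase --no-edit is passed so the pull merges without prompting.

```shell
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# shared.git is a hypothetical local stand-in for the GitHub repository.
git init -q --bare "$work/shared.git"
git --git-dir="$work/shared.git" symbolic-ref HEAD refs/heads/main

# Helper for this demo only: commit one new file with a placeholder identity.
commit_file() {
    echo "$1" > "$1"
    git add "$1"
    git -c user.name="Demo User" -c user.email="demo@example.com" \
        commit -q -m "Add $1"
}

# A teammate publishes the first commit.
git clone -q "$work/shared.git" "$work/teammate" 2>/dev/null
cd "$work/teammate"
git symbolic-ref HEAD refs/heads/main
commit_file teammate_notes.txt
git push -q origin main

# You clone, and then both of you commit new work.
git clone -q "$work/shared.git" "$work/me"
cd "$work/teammate"
commit_file teammate_plots.txt
git push -q origin main
cd "$work/me"
commit_file my_analysis.txt

# Pull first: this merges the teammate's new commit into your branch.
git -c user.name="Demo User" -c user.email="demo@example.com" \
    pull -q --no-rebase --no-edit origin main

# Now the push is accepted, and the remote holds both people's work.
git push -q origin main
```

Because the two clones touched different files, the merge completes without conflicts. Edits to the same lines of the same file would instead stop the pull and ask for a manual resolution.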


Quick tips

Fork it
Clone it into a new directory and know where that is
Set up your .gitignore
Use git status often to see what has changed in your local repository and where you are in the git add, git commit, git push sequence
Make meaningful commit messages to track your commit history
Pull before you push
Use different filenames than the partners who share the repository. This prevents merge conflicts and the work of resolving them
Read the git docs for more tips about how to use git in your favor: https://git-scm.com/docs

That's All GIF [4]


  Banner image source: https://techcrunch.com/2019/01/07/github-free-users-now-get-unlimited-private-repositories/

  1. https://giphy.com/gifs/LE4FWkEdR7kC4 

  2. https://tenor.com/view/brooklyn99-andy-samberg-jake-peralta-do-not-blow-this-for-us-gif-12004248 

  3. https://giphy.com/gifs/pix-pepa-Maf9DN4Ftb0BO 

  4. https://giphy.com/gifs/PixelBandits-pixel-forest-YBJHgYmcNFRODvssAL 

Top comments (5)

Fernando B 🚀

Glad to hear that you were able to recover the code. I think there are a few things to point out. Before making changes to the code, make a new branch and work on it; pull requests are much harder to manage if you work directly on the master branch.

Secondly, all you needed here was to add a new remote. The command below shows your current remote URLs:

git remote -v

So you'd fork it properly, then add the fork as a new remote to the local repo that you originally cloned without forking. Usually your fork's remote is named "origin" and the original repo's is "upstream".

dev.to/dance2die/push-git-cloned-r...

Matthew

Thank you for the tip.

Fernando B 🚀

Btw, check out my git and commits gists; they might come in handy later on. gist.github.com/kodaman2

Matthew

Looks great. Thank you for the advice.

Fernando B 🚀

Anytime! 😁