DEV Community

Deckstar

How to "Git Copy": copying files while keeping git history

Have you ever wanted to break a long file into several smaller files, but worried about losing all the git blame history? Well, with the bash script in this post, you'll be able to split your file into as many files as you want, while still keeping the git history for every single line!

Table of Contents

Motivation
The script
Demo
Pros and cons of this method
Conclusion
Further reading

Motivation

Why would you ever want to copy a file like this? Basically, it's useful whenever you want to turn one file into multiple files but still want the git history to stay visible, so that people can track how the file evolved. It also ensures that you can blame, or rather git blame, the right culprits for... interesting coding decisions 😉

I can think of two types of situations that I've encountered often:

  1. Sometimes, a file grows too big to be easily legible and understandable. In those cases, it makes sense (and is even recommended) to split it up into smaller chunks.
    • For me, this is a pretty regular thing with React components. It often happens that a small, seemingly simple component grows more and more complex, until there's too much unrelated logic in one place.
      • Example: a "page" component may start out with a few "sections", and each section may have its own complicated logic that concerns only that section. This would be a prime candidate for splitting into multiple files.
  2. Other times, you might want to turn one file into several variants that share essentially the same functionality.
    • Example: you may start out developing with one "config" file, config.js. But as your project gets closer to production-ready, you may realise that you want some different settings for "production" mode, and opt to have two files: config.development.js and config.production.js. Copying the config file the regular way would make it look like the whole production config was written in a single commit, whereas "Git copying" the old config would (you guessed it) let other developers track the evolution of the file.

Two use cases:

  1. Splitting a large file into smaller files.
  2. Creating a copy of a file that will always remain similar to the original.

The Script

Without further ado, here's the bash script that you can copy-paste and then use at your own convenience:



#!/bin/bash

# HOW TO USE:
#
# PSEUDO-CODE TEMPLATE:
# `
# bash ./gitCopy.bash {fileToCopy} {...newFiles}
# `
#
# EXAMPLE:
# Let's say you have a file called Section.tsx. Your section has grown very big
# and you want to split it into three subsection files, while preserving
# the git history. Your plan is to copy the section file into three new files, in
# a new subfolder called "subsections".
#
# In that case, you would do something like this:
#
# `
# bash ./gitCopy.bash ./components/Section.tsx ./components/subsections/Section1.tsx ./components/subsections/Section2.tsx ./components/subsections/Section3.tsx
# `

GRAY='\033[1;30m'
GREEN='\033[0;32m'
LIGHT_BLUE='\033[1;34m'
RED='\033[0;31m'
NO_COLOR='\033[0m'


fail_and_quit () {
    echo -e "\n${RED}Failed to git copy.${NO_COLOR}"
    # Exit with a non-zero status so callers can detect the failure
    exit 1
}

if [ ! -f "$1" ] || [ $# -lt 2 ]; then
    echo -e "\n${RED}Invalid inputs${NO_COLOR}"

    # Plain <<EOF (rather than <<-EOF) so that the block works whether the
    # script was saved with tabs or with spaces.
    cat 1>&2 <<EOF
Usage: $0 ORIGINAL copy1 [... copyN]

Copy ORIGINAL, preserving history for git blame
New history will have N+3 commits

EOF
    exit 1
fi

ORIGINAL="$1";

# Drop ORIGINAL from the argument list; from here on, "$@" holds only the new files
shift;

# Messages
echo -e "\nWill copy ${GRAY}${ORIGINAL}${NO_COLOR} into ${LIGHT_BLUE}+$# files${NO_COLOR}:"

args=("$@")
for i in "${!args[@]}"; do
   # Number the list from 1 rather than from the array's 0-based index
   echo -e "    $((i + 1)). ${GRAY}${args[$i]}${NO_COLOR}"
done

echo -e "New history will have ${GREEN}+$(($# + 3)) commits ${NO_COLOR}\n"
# /Messages

NEWLINE=$'\n'
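# After the shift above, $1 is the first copy target. Make sure its directory
# exists, then reserve a temporary file name next to it where $ORIGINAL will
# be parked during the merge.
KEEP=$(mkdir -p "$(dirname "$1")" && mktemp ./"$1".XXXXXXXX)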
MESSAGE="Copied (with git history):$NEWLINE$NEWLINE$ORIGINAL$NEWLINE$NEWLINE — into: —$NEWLINE$NEWLINE$@"
SPLIT=""

# Remember current commit
ROOT=$(git rev-parse HEAD)

# Check for errors
if [ -z "$ORIGINAL" ]; then
    echo -e "\n${RED}ERROR:${NO_COLOR} Did not get ORIGINAL variable."
    fail_and_quit
elif [ -z "$KEEP" ]; then
    echo -e "\n${RED}ERROR:${NO_COLOR} Did not get KEEP variable."
    fail_and_quit
elif [ -z "$MESSAGE" ]; then
    echo -e "\n${RED}ERROR:${NO_COLOR} Did not get MESSAGE variable."
    fail_and_quit
fi


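# For every copy, start again from $ROOT and commit a rename of $ORIGINAL to
# that copy, so that the copy carries $ORIGINAL's history
for f in "$@"; do
    git reset --soft "$ROOT"
    git checkout "$ROOT" -- "$ORIGINAL"
    # git mv needs the destination directory to exist
    mkdir -p "$(dirname "$f")"
    git mv -f "$ORIGINAL" "$f"
    git commit -n -m "* (create $f)$NEWLINE$NEWLINE$MESSAGE"
    SPLIT="$(git rev-parse HEAD) $SPLIT"
done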

# Go back to $ROOT and move $ORIGINAL out of the way
git reset --hard HEAD^
git mv -f "$ORIGINAL" "$KEEP"
git commit -n -m "* (keep $ORIGINAL)$NEWLINE$NEWLINE$MESSAGE"

# Merge the copy commits back in. $SPLIT is deliberately unquoted so that each
# hash is passed to git merge as a separate argument.
git merge $SPLIT -m "* (merge)$NEWLINE$NEWLINE$MESSAGE"
git commit -a -n -m "* (merge)$NEWLINE$NEWLINE$MESSAGE"

# Move $ORIGINAL back where it was
git mv "$KEEP" "$ORIGINAL"
git commit -n -m "$MESSAGE"


# Report
echo -e "\nNew history: ${GRAY}$(git rev-parse --short $ROOT)..$(git rev-parse --short HEAD)${NO_COLOR}"
echo -e "\n${GREEN}Success!${NO_COLOR}\n"
exit 0




I usually title this thing gitCopy.bash. But in principle you could call it whatever you want.

Note as well that a lot of code in here is just UI fluff (in particular, most of the logic with the color- & text-related variables is completely unnecessary to achieve the core functionality). You don't really need all the colorful messages to show up in your terminal when running this script. But I just think that they're nice to have 😉

Demo

As you can see in the GIF below, this script should be pretty easy to use:

Demo of the "gitCopy" script.
Notice how both files show the same git blame information, using VS Code's GitLens extension.
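
If you don't use GitLens, the same check can be done from the terminal. Here's a minimal sketch: `blame_commits` is a hypothetical helper name (it is not part of the script above), and it assumes a repository with the default 40-character SHA-1 hashes. It lists the commits that `git blame` attributes a file's lines to, so you can compare the original with a copy.

```shell
#!/bin/bash

# Hypothetical helper (not part of gitCopy.bash): print the sorted,
# de-duplicated commit hashes that "git blame" attributes the lines of a file to.
# Assumes 40-character SHA-1 hashes.
blame_commits () {
    git blame --line-porcelain -- "$1" \
        | grep -E '^[0-9a-f]{40} ' \
        | cut -d' ' -f1 \
        | sort -u
}

# Example (the file names are placeholders). Right after a git copy, and before
# either file is edited, both files should be attributed to the same commits:
#
#   diff <(blame_commits config.js) <(blame_commits config.production.js) \
#       && echo "same blame history"
```

Once you start editing the new files, the two lists will naturally drift apart, so this comparison is only meaningful immediately after the copy.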

Pros and cons of this method

Advantages:

  • Your git history gets kept in every file;
    • You and others will be able to easily see the git blame records and track how the file changed and why
  • It's very easy to use — just copy a file path, then decide on one or more new ones, press enter and voilà!
  • If you change your mind, you can always revert the commits.
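
On that last point: the script prints the range of new history (`ROOT..HEAD`) when it finishes. Assuming the new commits have not been pushed or shared yet, one way to back out is to reset the branch to that ROOT commit instead of reverting each commit one by one. This is only a sketch; `undo_git_copy` is a hypothetical name, and note that `git reset --hard` also discards any uncommitted changes.

```shell
#!/bin/bash

# Hypothetical helper, not part of gitCopy.bash. $1 is the commit that was HEAD
# before the script ran (the left-hand side of the "New history" range).
# WARNING: "git reset --hard" also throws away uncommitted changes.
undo_git_copy () {
    local root="$1"

    # Refuse to continue if the argument does not name a commit
    git rev-parse --verify --quiet "${root}^{commit}" > /dev/null || {
        echo "Not a commit: ${root}" 1>&2
        return 1
    }

    git reset --hard "$root"
}

# Example, using an abbreviated hash as printed by the script (placeholder value):
#
#   undo_git_copy 1a2b3c4
```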

Disadvantages:

  • This method uses the "octopus-merge" strategy between the temporary branches, which means that your git history will no longer be a perfectly straight line.
  • This method creates a lot of commits, which can be a bit frustrating.
    • How many commits? Well, it's N + 3 for every file that you want to copy, where N is the number of copies that you want to make.
      • So, for example, let's say that you want to copy two different files, with 1 copy of the first file and 3 copies of the second file. For instance, maybe you want to copy config.js into config.production.js, and you want to copy Component.tsx into Subcomponent1.tsx, Subcomponent2.tsx and Subcomponent3.tsx. That would mean 1 + 3 = 4 commits for the config file, and 3 + 3 = 6 commits for the Component. Plus at least 1 more commit if you want to actually edit your new files, bringing the total up to at least 11. That's a lot of commits!
    • The general formula here is:
      $$\sum C_f = (N_1 + 3) + (N_2 + 3) + \dots + (N_f + 3)$$

      where:

      • $\sum C_f$ is the sum total of the number of commits that you will create by copying all the $f$ files, and
      • $N_i$ is the number of copies that you want to create of each file $i$.
    • We can see that the minimum number of commits here is 4 (when copying one file one time).
    • This large number of commits may start to feel overwhelming when reading commit histories, or when reading through pull requests.
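
The arithmetic above can be sketched as a small bash helper. `total_commits` is a hypothetical name, not part of the script: each argument is the number of copies you want of one file, and the function applies the N + 3 rule to each and adds them up.

```shell
#!/bin/bash

# Hypothetical helper (not part of gitCopy.bash): each argument is the number
# of copies N_i made of one file; every run of the script adds N_i + 3 commits.
total_commits () {
    local total=0
    local n
    for n in "$@"; do
        total=$(( total + n + 3 ))
    done
    echo "$total"
}

# config.js copied once, Component.tsx copied three times:
total_commits 1 3    # prints 10

# The minimum case, one file copied once:
total_commits 1      # prints 4
```

Add at least one more commit on top of that if you then edit the new files, which is where the 11 in the example above comes from.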

Conclusion

So that's "git copying" in a nutshell.

We've seen that this method is an easy and flexible way to keep git history while copying files. However, we've also seen that it produces a lot of commits, and requires an "octopus merge" strategy, which may not be ideal in some teams.

As always with coding tools, you will have to decide for yourself when and whether to use this new trick that I have just presented to you 🙂

Good luck and happy coding!


Further reading

  • Raymond Chen's blog post (2019) about splitting files while keeping git history, using a sequence of new file names and new git branches.
  • David Sherman's GitLab snippet (2019) documenting a bash script that automatically copies a file to as many new files as needed, which served as the main inspiration for the script in this post.

Top comments (9)

Milnor •

Great article. Thank you!! A co-worker and I were discussing this very problem and he referred me to your blog post. And the further reading looks helpful too. Cheers ;)

Deckstar •

Thanks! Glad I could help! 🙂 Happy copying!

Christophe Colombier •

Good article, thanks

arne-henningsen •

It sounds like a very useful tool. I copied the script to a text editor, saved it as gitCopy.bash, and made it executable. Although I tried out different things, I couldn't get it to work (Ubuntu 22.04 LTS):

bash ~/bin/gitCopy.bash ./folderA/fileA ./folderB/fileB

/home/abc/bin/gitCopy.bash: line 110: warning: here-document at line 37 delimited by end-of-file (wanted `EOF')

/home/abc/bin/gitCopy.bash: line 111: syntax error: unexpected end of file

Any ideas?

Deckstar •

I'm afraid all I can say is "it works on my machine" ¯_(ツ)_/¯ (M1 Mac on Ventura 13.2.1)

I think it might have something to do with tabs vs spaces. See this StackOverflow question.

I copy pasted the script into my own text-editor and compared it with git. I noticed that the copy-pasted version used spaces instead of tabs in lines 38-43 (From Usage: [...] to EOF). All of those lines should start with one tab, instead of spaces (or at least the final "EOF" line, I presume).

Can you try that and see if it helps?
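
For anyone who wants to see the tabs-vs-spaces behaviour in isolation, here is a minimal sketch. It builds two throwaway one-liner scripts (the variable names are just for illustration): with `<<-`, bash strips leading tabs from the here-document, including from the closing `EOF` line, but it does not strip leading spaces.

```shell
#!/bin/bash

# Two tiny scripts held in variables: one whose here-document is indented with
# tabs, one indented with four spaces. $'...' turns \t and \n into real
# tab and newline characters.
tabs_script=$'cat <<-EOF\n\thello\n\tEOF\n'
spaces_script=$'cat <<-EOF\n    hello\n    EOF\n'

# "<<-" strips the leading tabs, so the tab-indented EOF closes the block.
bash -c "$tabs_script"            # prints "hello"

# Leading spaces are not stripped, so "    EOF" is treated as ordinary text:
# bash warns that the here-document ran to the end of the input, and the
# indented EOF line is printed as part of the content.
bash -c "$spaces_script" 2>&1
```

In a longer script, that unterminated here-document also swallows everything after it, which would explain the follow-up "unexpected end of file" syntax error.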

arne-henningsen •

Yes, replacing spaces by tabs worked :-) Thanks a lot! Perhaps, you can post a link for downloading the script (e.g., from GitHub) so that others won't have the same problem that I had.

Deckstar •

Yeah, that would probably be a good idea, but I'm a bit lazy... 😄 Hopefully people will be able to figure it out by just reading these comments instead... 😇

Milnor •

I took the liberty of pasting it into a new repo and replacing tabs around the HERE document with spaces: github.com/Milnor/Git-Tricks/blob/...

It credits you (of course) and the README has a link back to your blog.

But if you'd like me to take it down, just let me know. It's such a great script I wanted easy git clone access to it.

Deckstar •

Hey, great job! Looks awesome! Thanks a lot! 🙂