DEV Community

sanjay kumar

I built a CLI tool in Go to extract folders or files from GitHub repos with a single command: GitSlice

🚀 Solving GitHub Folder Extraction with Go: A CLI for Real-World Git Edge Cases

Cloning a full GitHub repo when you only need one folder or file is overkill, especially with monorepos.

I wanted a simple CLI tool that lets me extract just a folder from any GitHub repo. Sounds simple, right?

It's not. GitHub's URL structure, especially with branches that have slashes in their names, makes this problem deceptively tricky. Here's how I solved it, and why the hard part wasn't syntax or boilerplate but reasoning through ambiguity.


🧠 The Problem

Let’s say you have this URL:


https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins

  • You want just the src/http/plugins folder.
  • But fix/pgboss-on-error-callback is the branch name, and it has slashes in it.
  • So how do you distinguish the branch name from the folder path?

This is where traditional methods like git sparse-checkout break down: they only work if you already know the exact branch and folder.
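To make the ambiguity concrete, here is a minimal Go sketch that lists every way the segments after /tree/ could be divided into a branch and a path. The candidate type and the candidateSplits function are hypothetical names used for illustration; they are not taken from GitSlice.

```go
package main

import (
	"fmt"
	"strings"
)

// candidate is one possible reading of the URL segments that follow "/tree/".
// (Hypothetical type, for illustration only.)
type candidate struct {
	Branch string
	Path   string
}

// candidateSplits lists every way to divide the segments into a non-empty
// branch name followed by a non-empty path.
func candidateSplits(segments []string) []candidate {
	var out []candidate
	for i := 1; i < len(segments); i++ {
		out = append(out, candidate{
			Branch: strings.Join(segments[:i], "/"),
			Path:   strings.Join(segments[i:], "/"),
		})
	}
	return out
}

func main() {
	segments := strings.Split("fix/pgboss-on-error-callback/src/http/plugins", "/")
	for _, c := range candidateSplits(segments) {
		fmt.Printf("branch=%q path=%q\n", c.Branch, c.Path)
	}
}
```

Five segments give four candidates, and nothing in the URL itself says which one is right.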


🔧 My Solution: GitSlice CLI

I built GitSlice, a Go CLI tool that:

  • Accepts a GitHub folder or file URL
  • Parses and resolves the correct branch and path
  • Clones only the required data
  • Extracts it into your current directory

gitslice https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins

🧪 The Hardest Part: Branch vs Path Resolution

GitHub doesn't give you the branch and folder explicitly, just a single URL. So I had to build a resolver that intelligently distinguishes between them.

Here's how I approached it:

🧠 Step-by-Step Path Resolution Logic

1. Clone the Repository (Sparse Mode):

GitSlice starts by cloning the repo into a temporary folder (e.g., cloneTemp). The clone uses --filter=blob:none, so file contents are fetched on demand instead of all up front, which keeps things lightweight.

2. Parse the GitHub URL into Parts:

postTree = ['fix', 'pgboss-on-error-callback', 'src', 'http', 'plugins']

3. Inspect the Local Folder After Clone:

After cloning, we list the folder structure at the root of the repo. This gives us the actual directories present at that commit.

4. Smart Matching to Infer the Real Path:

Now we iterate through postTree, trying different splits:

  • First, assume the first n parts of postTree could be the branch name.

  • The remaining parts are then assumed to be the path within the repo.

At each step, we check if the path exists on disk. If it does, we lock that in as the correct branch/path combo.

5. Example Matching:

github_url = github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins

GitSlice tries:

  • Branch = fix, Path = pgboss-on-error-callback/src/http/plugins → Not found
  • Branch = fix/pgboss-on-error-callback, Path = src/http/plugins → ✅ Match

6. Handle Nested Branch Names Intelligently:

This means GitSlice doesn't rely on hardcoded assumptions or magic string slicing. Instead, it lets the filesystem be the truth.

🧩 Step-by-Step Logic

1. Parse the post-tree part of the URL

From a URL like:

https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins

Split it into:

postTree = [fix, pgboss-on-error-callback, src, http, plugins]

2. Clone the repo to a temp dir

git clone --filter=blob:none --no-checkout https://github.com/user/repo cloneTemp

3. Compare folder structure to detect real path

I implemented this logic in resolveBranchAndPath() in the internal/clone/clone.go file:

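// resolveBranchAndPath tries each split of postTree into branch + path and
// returns the first split whose path exists as a directory under clonePath.
// Needs imports: fmt, os, path/filepath, strings.
func resolveBranchAndPath(clonePath string, postTree []string) (string, string, error) {
    // i is the number of leading segments treated as the branch name.
    // Stopping before len(postTree) guarantees the path is never empty.
    for i := 1; i < len(postTree); i++ {
        branchCandidate := postTree[:i]
        pathCandidate := postTree[i:]

        // Check the candidate path against what is actually on disk.
        fullPath := filepath.Join(clonePath, filepath.Join(pathCandidate...))
        if info, err := os.Stat(fullPath); err == nil && info.IsDir() {
            return strings.Join(branchCandidate, "/"), strings.Join(pathCandidate, "/"), nil
        }
    }
    return "", "", fmt.Errorf("❌ Could not resolve branch/path. Check if URL or folders exist")
}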

This method tries all possible splits of the post-tree part and validates whether the resulting folder exists in the cloned repo. It worked across many real-world repos with complex branch names.
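To see the split-and-stat idea run without cloning anything, here is a small self-contained sketch that builds a fake checkout in a temp directory and resolves against it. resolve is a simplified stand-in for the function above and demo is a hypothetical helper; neither is GitSlice's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolve is a simplified stand-in for the resolver: it returns the first
// split of postTree whose path part exists as a directory under root.
func resolve(root string, postTree []string) (branch, path string, err error) {
	for i := 1; i < len(postTree); i++ {
		candidate := filepath.Join(append([]string{root}, postTree[i:]...)...)
		if info, statErr := os.Stat(candidate); statErr == nil && info.IsDir() {
			return strings.Join(postTree[:i], "/"), strings.Join(postTree[i:], "/"), nil
		}
	}
	return "", "", fmt.Errorf("no split of %v matches a directory under %s", postTree, root)
}

// demo creates a temp directory containing src/http/plugins, then resolves
// the example segments against it. The temp directory is removed afterwards.
func demo() (string, string, error) {
	root, err := os.MkdirTemp("", "gitslice-demo-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)

	if err := os.MkdirAll(filepath.Join(root, "src", "http", "plugins"), 0o755); err != nil {
		return "", "", err
	}
	return resolve(root, []string{"fix", "pgboss-on-error-callback", "src", "http", "plugins"})
}

func main() {
	branch, path, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("branch=%s path=%s\n", branch, path)
}
```

One case worth testing with a setup like this: because the earliest matching split wins, a branch segment that also happens to name a real top-level folder could be read as the start of the path.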


🐍 Pseudocode (Pythonic Version)

Just for clarity, here's the rough equivalent logic in pseudocode:

github_url = "github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins"
local_after_clone = 'storage/src/http/plugins'

# Split the URL and detect where 'tree' starts
result = github_url.split('/')
for i, x in enumerate(result):
    if x == 'tree':
        base_url = result[:i]
        post_tree = result[i+1:]
        break

# Now try matching folder structure
for j in range(len(post_tree)):
    if post_tree[j] == local_after_clone.split('/')[0]:
        branch_candidate = post_tree[:j]
        path_candidate = post_tree[j:]
        break

🧠 Why I Used AI (And Why That’s Fine)

Yes, I used AI to assist with some parts β€” like refreshing Go syntax or generating boilerplate for CLI args. But the real problem-solving β€” like figuring out branch-path ambiguity, recursive folder matching, and edge case debugging β€” was 100% me.

AI didn’t know how to solve this. It only accelerated parts of the process β€” like a calculator helps with math, but doesn’t understand the problem.


📦 Installation

With Go:

go install github.com/05sanjaykumar/gitslice@latest

Or clone and build:

git clone https://github.com/05sanjaykumar/gitslice
cd gitslice
go build -o gitslice main.go

🛠️ Usage

gitslice <github-folder-or-file-url>

Examples:

gitslice https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/auth
gitslice https://github.com/vercel/next.js/tree/canary/packages/next
gitslice https://github.com/05sanjaykumar/Flappy-Bird-OpenCV/blob/main/assets/background-day.png
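The third example is a blob (file) URL, while the resolver shown earlier only accepts directories (info.IsDir()). Below is a minimal sketch of how the same on-disk check could accept regular files for blob URLs. Whether GitSlice handles files this way is an assumption; pathMatchesKind and demoKinds are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pathMatchesKind reports whether fullPath exists and has the type the URL
// implies: a directory for "tree" URLs, a regular file for "blob" URLs.
// The function name and the kind strings are hypothetical.
func pathMatchesKind(fullPath, kind string) bool {
	info, err := os.Stat(fullPath)
	if err != nil {
		return false
	}
	if kind == "blob" {
		return info.Mode().IsRegular()
	}
	return info.IsDir()
}

// demoKinds builds a tiny fake checkout and checks a folder as "tree",
// a file as "blob", and the same file as "tree" (which should not match).
func demoKinds() (bool, bool, bool, error) {
	root, err := os.MkdirTemp("", "gitslice-kind-")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(root)

	assets := filepath.Join(root, "assets")
	if err := os.MkdirAll(assets, 0o755); err != nil {
		return false, false, false, err
	}
	file := filepath.Join(assets, "background-day.png")
	if err := os.WriteFile(file, []byte("stub"), 0o644); err != nil {
		return false, false, false, err
	}
	return pathMatchesKind(assets, "tree"), pathMatchesKind(file, "blob"), pathMatchesKind(file, "tree"), nil
}

func main() {
	dirOK, fileOK, mismatch, err := demoKinds()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dirOK, fileOK, mismatch)
}
```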

🆚 GitSlice vs Manual Git Steps

With GitSlice:

gitslice https://github.com/user/repo/tree/branch/folder

Manually:

git clone --filter=blob:none --sparse https://github.com/user/repo
cd repo
git sparse-checkout init --cone
git sparse-checkout set folder
git checkout branch
cp -r folder ../
cd ..
rm -rf repo

GitSlice reduces these eight commands to one and eliminates the guesswork.


💬 Final Thoughts

This project taught me how even basic-sounding ideas hide a surprising amount of technical complexity. Let me know your thoughts or how you'd approach it differently!

🔗 GitSlice GitHub Repo: https://github.com/05sanjaykumar/gitslice

Made By Sanjay Kumar
