Solving GitHub Folder Extraction with Go: A CLI for Real-World Git Edge Cases
Cloning a full GitHub repo when you only need one folder or file is overkill, especially with monorepos.
I wanted a simple CLI tool that lets me extract just a folder from any GitHub repo. Sounds simple, right?
It's not. GitHub's structure, especially with branches that have slashes in their names, makes this problem deceptively tricky. Here's how I solved it, and why the hard part wasn't syntax or boilerplate, but reasoning through ambiguity.
The Problem
Let's say you have this URL:
[https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins](https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins)
- You want just the src/http/plugins folder.
- But fix/pgboss-on-error-callback is the branch name, and it has slashes in it.
- So how do you distinguish the branch name from the folder path?
This is where traditional methods like git sparse-checkout break down unless you already know the exact branch and folder.
My Solution: GitSlice CLI
I built GitSlice, a Go CLI tool that:
- Accepts a GitHub folder or file URL
- Parses and resolves the correct branch and path
- Clones only the required data
- Extracts it into your current directory
gitslice https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins
The Hardest Part: Branch vs Path Resolution
GitHub doesn't give you the branch and folder explicitly, just a single URL. So I had to build a resolver that intelligently distinguishes between them.
Here's how I approached it:
Step-by-Step Path Resolution Logic
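1. Clone the Repository (Partial Clone):
GitSlice starts by cloning the repo into a temporary folder (e.g., cloneTemp) without pulling every file's contents up front. The command shown later uses --filter=blob:none, which is a partial (blobless) clone rather than a shallow one: the commit history is fetched, but file contents are downloaded only when needed. This keeps things lightweight.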
2. Parse the GitHub URL into Parts:
postTree = ['fix', 'pgboss-on-error-callback', 'src', 'http', 'plugins']
3. Inspect the Local Folder After Clone:
After cloning, we list the folder structure at the root of the repo. This gives us the actual directories present at that commit.
4. Smart Matching to Infer the Real Path:
Now we iterate through postTree, trying different splits:
First, assume the first n parts of postTree could be the branch name.
The remaining parts are then assumed to be the path within the repo.
At each step, we check if the path exists on disk. If it does, we lock that in as the correct branch/path combo.
5. Example Matching:
github_url = github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins
GitSlice tries:
- Branch = fix, Path = pgboss-on-error-callback/src/http/plugins → Not found
- Branch = fix/pgboss-on-error-callback, Path = src/http/plugins → Match
6. Handle Nested Branch Names Intelligently:
This means GitSlice doesn't rely on hardcoded assumptions or magic string slicing. Instead, it lets the filesystem be the truth.
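To make steps 4 and 5 concrete, here is a minimal sketch that only enumerates the candidate splits, without touching the filesystem. The `split` type and the `candidateSplits` name are made up for this illustration and are not taken from the GitSlice source.

```go
package main

import (
	"fmt"
	"strings"
)

// split is a hypothetical type for this sketch: one way of dividing the
// post-tree segments into a branch name and a path inside the repo.
type split struct {
	Branch string
	Path   string
}

// candidateSplits lists every division that leaves both a non-empty branch
// and a non-empty path, shortest branch first.
func candidateSplits(postTree []string) []split {
	var out []split
	for i := 1; i < len(postTree); i++ {
		out = append(out, split{
			Branch: strings.Join(postTree[:i], "/"),
			Path:   strings.Join(postTree[i:], "/"),
		})
	}
	return out
}

func main() {
	postTree := []string{"fix", "pgboss-on-error-callback", "src", "http", "plugins"}
	for _, s := range candidateSplits(postTree) {
		fmt.Printf("branch=%s path=%s\n", s.Branch, s.Path)
	}
}
```

A resolver then only has to walk this list in order and stop at the first candidate whose path actually exists in the clone.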
Step-by-Step Logic
1. Parse the post-tree part of the URL
From a URL like:
https://github.com/user/repo/tree/branch-name/src/plugins
Split it into:
postTree = [branch-name, src, plugins]
2. Clone the repo to a temp dir
git clone --filter=blob:none --no-checkout https://github.com/user/repo cloneTemp
3. Compare folder structure to detect real path
I implemented this logic in resolveBranchAndPath() in the internal/clone/clone.go file:
import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveBranchAndPath tries every split of postTree into a branch prefix and
// a path suffix, and returns the first split whose path is a directory under clonePath.
func resolveBranchAndPath(clonePath string, postTree []string) (string, string, error) {
	// i stops before len(postTree), so the path candidate is never empty.
	for i := 1; i < len(postTree); i++ {
		branchCandidate := postTree[:i]
		pathCandidate := postTree[i:]
		fullPath := filepath.Join(clonePath, filepath.Join(pathCandidate...))
		if info, err := os.Stat(fullPath); err == nil && info.IsDir() {
			return strings.Join(branchCandidate, "/"), strings.Join(pathCandidate, "/"), nil
		}
	}
	return "", "", fmt.Errorf("could not resolve branch/path: check that the URL and folders exist")
}
This method tries all possible splits of the post-tree part and validates whether the resulting folder exists in the cloned repo. It worked across many real-world repos with complex branch names.
Pseudocode (Pythonic Version)
Just for clarity, here's the rough equivalent logic in pseudocode:
github_url = "github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/http/plugins"
local_after_clone = "storage/src/http/plugins"  # repo folder, then the path found on disk

# Split the URL and detect where 'tree' starts
result = github_url.split('/')
for i, x in enumerate(result):
    if x == 'tree':
        base_url = result[:i]
        post_tree = result[i+1:]
        break

# The first directory inside the repo comes after the repo folder name
first_local_dir = local_after_clone.split('/')[1]

# Now try matching folder structure
for j in range(len(post_tree)):
    if post_tree[j] == first_local_dir:
        branch_candidate = post_tree[:j]
        path_candidate = post_tree[j:]
        break
Why I Used AI (And Why That's Fine)
Yes, I used AI to assist with some parts, like refreshing Go syntax or generating boilerplate for CLI args. But the real problem-solving (figuring out branch-path ambiguity, recursive folder matching, and edge case debugging) was 100% me.
AI didn't know how to solve this. It only accelerated parts of the process, the way a calculator helps with math but doesn't understand the problem.
Installation
With Go:
go install github.com/05sanjaykumar/gitslice@latest
Or clone and build:
git clone https://github.com/05sanjaykumar/gitslice
cd gitslice
go build -o gitslice main.go
Usage
gitslice <github-folder-or-file-url>
Examples:
gitslice https://github.com/supabase/storage/tree/fix/pgboss-on-error-callback/src/auth
gitslice https://github.com/vercel/next.js/tree/canary/packages/next
gitslice https://github.com/05sanjaykumar/Flappy-Bird-OpenCV/blob/main/assets/background-day.png
GitSlice vs Manual Git Steps
With GitSlice:
gitslice https://github.com/user/repo/tree/branch/folder
Manually:
git clone --filter=blob:none --sparse https://github.com/user/repo
cd repo
git sparse-checkout init --cone
git sparse-checkout set folder
git checkout branch
cp -r folder ../
cd ..
rm -rf repo
GitSlice reduces these 8 steps to 1 and eliminates the guesswork.
Final Thoughts
This project taught me how even basic-sounding ideas hide a surprising amount of technical complexity. Let me know your thoughts or how you'd approach it differently!
GitSlice GitHub Repo: https://github.com/05sanjaykumar/gitslice