You find a scraper on GitHub that does exactly what you need. Almost. It scrapes product prices but doesn't handle pagination. You know how to add pagination. You could copy the code, modify it, and use your own version.
But that's wasteful. The original developer maintains the project. Fixes bugs. Adds features. If you copy it, you're on your own. You have to manually copy their updates into your version forever.
Better idea: add pagination to their project. Contribute it back. Now everyone benefits. They get pagination. You get their future updates automatically. Win-win.
Or maybe this happens. You're working with a teammate on the same scraper. You both edit scraper.py. You finish first. Commit. Push. Done.
Your teammate finishes. Tries to push. Git rejects it. "Your branch is behind. Pull first." They pull. Merge conflict. Your changes clash with theirs. Same function. Different implementations.
Now you're both stuck. Whose code wins? How do you merge without breaking both versions? Do you text each other? Email? Video call to figure out whose change stays?
This is where collaboration workflow saves you.
Git and GitHub have built-in tools for working with other people. Forking projects. Proposing changes. Reviewing code. Resolving conflicts. Tracking issues. The tools exist. You just need to learn the pattern.
Let me show you how to collaborate without chaos.
The Two Collaboration Scenarios
You'll encounter two main situations.
Scenario 1: Contributing to Someone Else's Project
You don't have permission to push to their repo. You can't just commit and push like it's yours.
The workflow:
- Fork (make your own copy on GitHub)
- Clone your fork
- Create a branch
- Make changes
- Push to your fork
- Create pull request to original repo
- Wait for maintainer to review and merge
This is how open source works.
Scenario 2: Team Project (Shared Access)
Everyone has permission to push. But you need rules so people don't overwrite each other.
The workflow:
- Clone the shared repo
- Always pull before starting work
- Create a branch for your feature
- Make changes
- Push your branch
- Create pull request
- Teammate reviews
- Merge after approval
- Everyone pulls the updated main
This is how teams work.
Both scenarios use the same Git features (branches, pull requests). The difference is permissions.
Forking: Making Your Own Copy
Forking creates a copy of someone's repo under your GitHub account.
When to Fork
- Contributing to open source projects
- Customizing a project for your needs
- Experimenting with changes before proposing them
- Learning from a project by modifying it
How to Fork
Let's fork a real project. We'll use a simple scraper template.
Step 1: Find a Project
Go to: https://github.com/example-user/simple-scraper
(For this example, find any small scraping project you want to contribute to)
Step 2: Click Fork
- Top-right corner: "Fork" button
- Choose your account (where to fork it)
- Optional: Change the name
- Click "Create fork"
You now have a copy: https://github.com/YOUR-USERNAME/simple-scraper
This is yours. You can push to it. Modify it. Break it. The original is untouched.
Cloning Your Fork
# Clone YOUR fork, not the original
git clone https://github.com/YOUR-USERNAME/simple-scraper.git
cd simple-scraper
# Check the remote
git remote -v
Output:
origin https://github.com/YOUR-USERNAME/simple-scraper.git (fetch)
origin https://github.com/YOUR-USERNAME/simple-scraper.git (push)
origin points to your fork.
Adding the Original as Upstream
You'll want to pull updates from the original project.
# Add the original repo as 'upstream'
git remote add upstream https://github.com/example-user/simple-scraper.git
# Verify
git remote -v
Output:
origin https://github.com/YOUR-USERNAME/simple-scraper.git (fetch)
origin https://github.com/YOUR-USERNAME/simple-scraper.git (push)
upstream https://github.com/example-user/simple-scraper.git (fetch)
upstream https://github.com/example-user/simple-scraper.git (push)
Now you have two remotes:
- origin - your fork (you can push here)
- upstream - the original repo (you pull updates from here, but can't push)
Contributing to Open Source
Let's add a feature to the project we forked.
Step 1: Create a Feature Branch
# Make sure you're on main
git checkout main
# Pull latest from original project
git pull upstream main
# Create feature branch
git checkout -b add-pagination
Step 2: Make Your Changes
# Edit scraper.py
# Add pagination functionality
def scrape_all_pages(base_url):
    """Scrape all pages with pagination"""
    all_products = []
    page = 1
    while True:
        url = f"{base_url}?page={page}"
        products = scrape_products(url)
        if not products:
            break
        all_products.extend(products)
        page += 1
    return all_products
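One defensive tweak worth considering: some sites serve page 1 again for out-of-range page numbers, which would make the loop above run forever. A hedged variant with a hard page cap (the scraper function is passed in as a parameter just to keep this sketch self-contained; the cap of 100 is an arbitrary placeholder):

```python
def scrape_all_pages(base_url, scrape_products, max_pages=100):
    """Scrape pages until one comes back empty, or until max_pages."""
    all_products = []
    for page in range(1, max_pages + 1):
        products = scrape_products(f"{base_url}?page={page}")
        if not products:
            break
        all_products.extend(products)
    return all_products

# Quick check with a fake scraper: three pages of products, then an empty page
pages = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e', 'f']}

def fake_scrape(url):
    return pages.get(int(url.rsplit('=', 1)[1]), [])

print(len(scrape_all_pages('https://example.com/products', fake_scrape)))  # 6
```

The cap turns "infinite loop on a weird site" into "incomplete data plus something to investigate", which is the better failure mode for a scraper.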
Step 3: Test Your Changes
# Run the scraper
python scraper.py
# Make sure it works
# Test edge cases
Step 4: Commit Your Changes
git add scraper.py
git commit -m "Add pagination support for multi-page scraping"
Good commit message format:
- First line: Brief summary (50 chars or less)
- Optional: Blank line + detailed explanation
git commit -m "Add pagination support for multi-page scraping
- Implemented scrape_all_pages() function
- Handles pagination automatically
- Stops when no more products found
- Tested with 10+ page product listings"
Step 5: Push to Your Fork
# Push to YOUR fork (origin), not the original (upstream)
git push -u origin add-pagination
Step 6: Create a Pull Request
On GitHub:
- Go to your fork: https://github.com/YOUR-USERNAME/simple-scraper
- You'll see "add-pagination had recent pushes" with a "Compare & pull request" button
- Click "Compare & pull request"
Pull Request Form:
- Base repository: example-user/simple-scraper (original)
- Base branch: main
- Head repository: YOUR-USERNAME/simple-scraper (your fork)
- Compare branch: add-pagination
Title: "Add pagination support"
Description:
## Summary
Adds pagination support for scraping multi-page product listings.
## Changes
- New function `scrape_all_pages()` that handles pagination
- Automatically iterates through pages until no products found
- Backward compatible (existing code still works)
## Testing
- Tested with 10+ page listings
- Tested with single page (no regression)
- Tested with invalid URLs (proper error handling)
## Example Usage

```python
products = scrape_all_pages('https://example.com/products')
print(f'Scraped {len(products)} products across multiple pages')
```

Fixes #15

The `Fixes #15` line references an issue number. If there's an open issue requesting this feature, link it.
Click "Create pull request"
Step 7: Wait for Review
The maintainer will:
- Review your code
- Test it
- Ask questions or request changes
- Approve and merge, or decline
If they request changes:
# Make the requested changes
# Edit files
# Commit
git add .
git commit -m "Address review feedback: improve error handling"
# Push to same branch
git push
The pull request updates automatically. No need to create a new one.
If they merge:
Your code is now in the original project! Congratulations, you contributed to open source.
Cleanup:
# Switch back to main
git checkout main
# Pull the merged changes from original
git pull upstream main
# Delete your feature branch
git branch -d add-pagination
# Update your fork's main on GitHub
git push origin main
Team Collaboration Workflow
You're working with teammates on a shared repo. Everyone has push access.
The Golden Rules
- Never push directly to main (use branches + pull requests)
- Always pull before starting work
- Create a branch for every feature/fix
- Get approval before merging (code review)
- Pull main before merging your branch
Daily Team Workflow
Morning routine:
# Update your local main
git checkout main
git pull origin main
# Create today's feature branch
git checkout -b feature/user-login
During the day:
# Make changes
# (edit files)
# Commit regularly
git add .
git commit -m "Implement login validation"
# Push to GitHub (backup + share with team)
git push -u origin feature/user-login
When feature is done:
# Make sure main hasn't changed
git checkout main
git pull origin main
# Merge main into your feature branch (get latest changes)
git checkout feature/user-login
git merge main
# Resolve any conflicts
# (if main changed while you worked)
# Push updated branch
git push
# Create pull request on GitHub
# Wait for teammate review
# Merge after approval
After your PR is merged:
# Update local main
git checkout main
git pull origin main
# Delete feature branch
git branch -d feature/user-login
git push origin --delete feature/user-login
Handling Simultaneous Work
Scenario: You and a teammate both work on different features simultaneously.
You:
git checkout -b feature/add-export
# (work on export feature)
git commit -m "Add CSV export"
git push -u origin feature/add-export
# (create PR, get approved, merge)
Teammate (at the same time):
git checkout -b feature/add-filters
# (work on filters)
git commit -m "Add product filters"
git push -u origin feature/add-filters
# (create PR)
Your teammate's PR now needs updating:
# Your export feature merged first
# Teammate needs to update their branch
git checkout feature/add-filters
git pull origin main # Get your merged export feature
# (resolve any conflicts if both touched same files)
git push
# (PR updates, gets approved, merges)
This is why "pull before merge" matters.
Code Review Best Practices
Pull requests aren't just for merging. They're for reviewing code.
As the Author (Creating PR)
Write a clear description:
## What this PR does
Adds ability to export scraped data to CSV format.
## Why we need it
Users requested CSV export for importing to Excel.
## How to test
1. Run scraper: python scraper.py
2. Check data/export.csv exists
3. Open in Excel, verify formatting
## Screenshots
(if UI changes)
## Checklist
- [x] Code works locally
- [x] Added tests
- [x] Updated README
- [x] No merge conflicts
Keep PRs small:
- One feature per PR
- Easier to review
- Faster to merge
- Less likely to have conflicts
Respond to feedback graciously:
- "Good catch, I'll fix that"
- "I didn't consider that case, adding now"
- "Great suggestion, implemented"
As the Reviewer
What to look for:
- Does it work? (test the code locally)
- Does it break anything? (run existing tests)
- Is it readable? (can you understand it?)
- Is it secure? (no passwords, proper input validation)
- Is it efficient? (no obvious performance issues)
How to comment:
Good feedback:
Line 23: Consider adding error handling here for when the URL is invalid.
Suggestion:
try:
    response = requests.get(url)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    logger.error(f"Failed to fetch {url}: {e}")
    return None
Bad feedback:
this is bad
Be specific. Be constructive. Suggest solutions.
Approving or requesting changes:
On GitHub:
- "Comment" - add feedback without blocking
- "Approve" - looks good, ready to merge
- "Request changes" - needs fixes before merging
Reviewing Your Own Code
Even solo, review your own PRs:
- Create PR
- Look at the diff (what changed)
- Read it like you didn't write it
- Catch mistakes you missed
- Merge when satisfied
This habit catches bugs before production.
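You can do the same review locally before even opening the PR. A throwaway sandbox (hypothetical branch and file names; assumes Git 2.28+ for `git init -b`) showing the three-dot diff, which is what GitHub's PR "Files changed" tab computes:

```shell
# Throwaway repo to demonstrate reviewing your own branch
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.email you@example.com && git config user.name You

echo "print('v1')" > scraper.py
git add scraper.py && git commit -qm "Initial scraper"

git checkout -qb feature/self-review
echo "print('v2')" > scraper.py
git commit -qam "Tweak output"

# Three-dot diff: your branch vs. the point where it forked from main
git diff main...HEAD

# The commits your PR will contain
git log --oneline main..HEAD
```

Reading `git diff main...HEAD` with fresh eyes is the command-line version of "read it like you didn't write it".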
Handling Merge Conflicts in Teams
Two people edited the same line. Git can't auto-merge.
Example Conflict
Person A edits scraper.py:
def scrape_product(url):
    response = requests.get(url, timeout=30)
Person B edits the same line:
def scrape_product(url):
    response = requests.get(url, timeout=60)
Both push to different branches. Person A merges first. Person B tries to merge.
Conflict!
Resolving the Conflict
Person B:
# Pull latest main
git checkout main
git pull origin main
# Try to merge into feature branch
git checkout feature/increase-timeout
git merge main
Output:
Auto-merging scraper.py
CONFLICT (content): Merge conflict in scraper.py
Automatic merge failed; fix conflicts and then commit the result.
Open scraper.py:
def scrape_product(url):
<<<<<<< HEAD
    response = requests.get(url, timeout=60)
=======
    response = requests.get(url, timeout=30)
>>>>>>> main
Person B's choices:
- Keep their change (60): Maybe they tested and 60 is better
- Keep Person A's change (30): Maybe 30 is the agreed standard
- Choose different value: Maybe 45 is a compromise
- Ask Person A: "Hey, I increased timeout to 60. You set it to 30. Which should we use?"
After deciding (let's say keep 60):
def scrape_product(url):
    response = requests.get(url, timeout=60)
Remove the markers. Save file.
# Mark conflict as resolved
git add scraper.py
# Complete the merge
git commit -m "Merge main, kept 60s timeout after discussion with Person A"
# Push updated branch
git push
The PR updates. Reviewer sees the conflict was resolved. Merges when satisfied.
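Side note: when you want one side of a conflict wholesale, Git can pick it for you with `git checkout --ours` (keep your branch's version) or `--theirs` (keep the incoming version). A runnable sandbox (throwaway repo, hypothetical timeout values; assumes Git 2.28+ for `git init -b`) that manufactures the conflict above and keeps the feature branch's side:

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.email you@example.com && git config user.name You

echo "timeout=10" > scraper.py
git add scraper.py && git commit -qm "Initial"

git checkout -qb feature/increase-timeout
echo "timeout=60" > scraper.py
git commit -qam "Raise timeout to 60"

git checkout -q main
echo "timeout=30" > scraper.py
git commit -qam "Set timeout to 30"

git checkout -q feature/increase-timeout
git merge main || true          # CONFLICT (content): Merge conflict in scraper.py
git checkout --ours scraper.py  # keep this branch's version (60)
git add scraper.py
git commit -qm "Merge main, kept 60s timeout"
cat scraper.py                  # timeout=60
```

Use this only when you're sure one side is entirely right; for anything subtler, edit the markers by hand as shown above.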
.gitignore: What NOT to Commit
Some files shouldn't be in Git.
Why .gitignore Matters
Bad things that happen when you commit the wrong files:
- API keys get exposed (security breach)
- Repo becomes huge (data files, databases)
- Merge conflicts on generated files (compiled code, caches)
- Teammate's environment breaks (OS-specific files)
Creating .gitignore
# In your project root
touch .gitignore
Essential patterns for Python projects:
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Environment
.env
.venv
config.py
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Data files
*.csv
*.json
*.xlsx
data/
scraped_data/
# Logs
*.log
logs/
# Database
*.db
*.sqlite3
Commit .gitignore:
git add .gitignore
git commit -m "Add .gitignore for Python project"
Common Patterns
Ignore all files of a type:
*.csv
*.log
*.db
Ignore a directory:
data/
logs/
Ignore all except one file:
# Ignore all .env files
.env*
# Except the example
!.env.example
Ignore files in root only:
/config.py # Only root config.py, not src/config.py
Testing .gitignore
# Check what Git sees
git status
# If ignored files still show up:
# They were already tracked before .gitignore
# Remove from Git (keep on disk)
git rm --cached filename.csv
# Commit the removal
git commit -m "Stop tracking data files"
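When you're not sure why a file is (or isn't) being ignored, `git check-ignore -v` tells you exactly which pattern matched, and from which line of which .gitignore. A quick sandbox (throwaway repo and hypothetical filenames; assumes Git 2.28+ for `git init -b`):

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
printf '*.csv\ndata/\n' > .gitignore
mkdir data
touch output.csv data/dump.json notes.txt

# -v shows source-file:line:pattern for each match
git check-ignore -v output.csv data/dump.json
# .gitignore:1:*.csv	output.csv
# .gitignore:2:data/	data/dump.json

# Exit status 1 and no output means the file is NOT ignored
git check-ignore notes.txt || echo "notes.txt is not ignored"
```

This is much faster than staring at a long .gitignore trying to work out precedence by eye.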
.gitignore for Scrapers
# Virtual environment
venv/
env/
# Environment variables (API keys!)
.env
config.py
# Scraped data
data/
*.csv
*.json
*.xlsx
# Logs
*.log
logs/
# Database
*.db
*.sqlite3
# Temporary files
temp/
cache/
# Screenshots (if using Selenium)
screenshots/
# Keep example config
!config.example.py
GitHub Issues: Tracking Work
Issues are like to-do lists for your project.
Creating an Issue
On GitHub → Issues → New issue
Title: "Add pagination support"
Description:
## Problem
Scraper only gets first page of results. Multi-page listings are incomplete.
## Proposed Solution
Add pagination support to scrape_products() function.
## Example
Site: https://example.com/products
Current: Gets 20 products (page 1)
Desired: Gets all 200 products (10 pages)
## Acceptance Criteria
- [ ] Scrapes all pages automatically
- [ ] Stops when no more products
- [ ] Handles pagination URLs (?page=N)
- [ ] Works with existing code
Labels: enhancement, good first issue
Assignee: Yourself or teammate
Click "Submit new issue"
Working on an Issue
# Reference issue in branch name
git checkout -b fix-pagination-15
# Reference issue in commits
git commit -m "Add pagination support (fixes #15)"
# Reference issue in PR
# (GitHub auto-links when you mention #15)
When the PR merges, the issue closes automatically.
Using Issues for Bug Reports
Title: "Scraper crashes on products without prices"
Description:
## Bug Description
Scraper crashes with `AttributeError: NoneType` when product has no price.
## Steps to Reproduce
1. Run: `python scraper.py`
2. URL: https://example.com/products/out-of-stock
3. Error occurs on line 45
## Expected Behavior
Should skip products without prices or set price to 0.
## Actual Behavior
AttributeError: 'NoneType' object has no attribute 'text'
File "scraper.py", line 45
price = price_elem.text
## Environment
- Python 3.9
- Ubuntu 20.04
- Package versions from requirements.txt
Labels: bug, high priority
This gives teammates everything they need to reproduce and fix.
GitHub Projects: Organizing Work
Projects are kanban boards for issues and PRs.
Creating a Project
On GitHub → Projects → New project
- Name: "Scraper Roadmap"
- Template: "Board"
- Create
Default columns:
- Todo
- In Progress
- Done
Add custom columns:
- Backlog (ideas for later)
- Review (PRs waiting for review)
- Testing (needs testing before deploy)
Adding Issues to Project
Drag issues between columns:
- Backlog: "Add authentication support"
- Todo: "Add pagination" (next to work on)
- In Progress: "Fix price parsing bug" (actively working)
- Review: "Add CSV export" (PR created, waiting review)
- Done: "Initial scraper setup" (merged and deployed)
Team Workflow with Projects
Weekly planning:
- Review Backlog
- Move important issues to Todo
- Assign to team members
Daily:
- Move your issue to In Progress when you start
- Create PR when done
- Move to Review
- Teammate reviews
- Merge and move to Done
Visual progress tracking. Everyone sees what's being worked on.
Forking vs Branching: When to Use Which
Use Forking When:
- Contributing to projects you don't have push access to
- Open source contributions
- Experimenting with major changes to someone else's project
- Creating your own version of a project
Pattern: Fork → Clone fork → Branch → PR to original
Use Branching When:
- Working on a project you have push access to
- Team projects
- Your own projects
- Any project where you're a collaborator
Pattern: Clone → Branch → PR to same repo
Same Repo, Collaborative:
# Everyone clones the same repo
git clone https://github.com/team/project.git
# Everyone creates branches
git checkout -b feature/my-feature
# Everyone creates PRs to main branch
# (same repo, different branches)
Fork-Based, External Contribution:
# Fork on GitHub
# Clone your fork
git clone https://github.com/YOU/project.git
# Create branch
git checkout -b feature/my-contribution
# PR to original repo
# (from your fork to their main)
Real Example: Contributing to a Scrapy Project
Let's contribute to a real open source project.
Step 1: Find a Project
Search GitHub: "scrapy spider" or "web scraper python"
Let's say you find: scrapy-examples/ecommerce-spider
Step 2: Fork It
Click "Fork" on GitHub.
Step 3: Clone Your Fork
git clone https://github.com/YOUR-USERNAME/ecommerce-spider.git
cd ecommerce-spider
# Add upstream
git remote add upstream https://github.com/scrapy-examples/ecommerce-spider.git
Step 4: Find Something to Contribute
Check Issues:
- Look for the good first issue label
- Or help wanted
Let's say Issue #42: "Add support for product reviews"
Step 5: Create Feature Branch
git checkout -b add-product-reviews
Step 6: Implement the Feature
# spiders/products.py
import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'

    def parse(self, response):
        # Existing product parsing
        yield {
            'name': response.css('.product-name::text').get(),
            'price': response.css('.price::text').get(),
        }

        # New: follow review links
        review_url = response.css('.reviews-link::attr(href)').get()
        if review_url:
            yield response.follow(review_url, self.parse_reviews)

    def parse_reviews(self, response):
        """Parse product reviews"""
        for review in response.css('.review'):
            yield {
                'product_url': response.url,
                'rating': review.css('.rating::attr(data-rating)').get(),
                'text': review.css('.review-text::text').get(),
                'author': review.css('.author::text').get(),
                'date': review.css('.date::text').get(),
            }
Step 7: Test Thoroughly
# Run the spider
scrapy crawl products -o output.json
# Check the output
cat output.json
# Make sure reviews are included
Step 8: Update Documentation
# README.md
## Features
- Scrapes product names and prices
- Follows pagination automatically
- **NEW:** Scrapes product reviews with ratings
## Output
The spider outputs JSON with product data and reviews:
```json
[
  {
    "name": "Laptop Pro 15",
    "price": "$1299.99"
  },
  {
    "product_url": "https://example.com/laptop-pro-15",
    "rating": "5",
    "text": "Excellent laptop!",
    "author": "John D.",
    "date": "2024-03-15"
  }
]
```
Step 9: Commit and Push
git add .
git commit -m "Add product review scraping support (fixes #42)
- Implemented parse_reviews() method
- Follows review links from product pages
- Extracts rating, text, author, date
- Tested with 50+ products
- Updated README with review output format"
git push -u origin add-product-reviews
Step 10: Create Pull Request
On GitHub:
Title: "Add product review scraping support"
Description:
Closes #42
## Summary
Adds ability to scrape product reviews in addition to product data.
## Changes
- New `parse_reviews()` method
- Follows review links from product pages
- Extracts rating, review text, author, and date
- Updated README with examples
## Testing
- Tested on 50+ products with reviews
- Handles products without reviews (gracefully skips)
- Verified output format
## Output Example
```json
{
  "product_url": "https://example.com/laptop",
  "rating": "5",
  "text": "Great product",
  "author": "Jane S.",
  "date": "2024-03-15"
}
```
## Checklist
- [x] Code tested locally
- [x] Documentation updated
- [x] No breaking changes
- [x] Follows project coding style
Step 11: Respond to Review
Maintainer comments: "Can you add error handling for missing review dates?"
# Make changes
# Edit the code
git add .
git commit -m "Add error handling for missing review dates"
git push
PR updates automatically.
Step 12: PR Gets Merged
Congratulations! You contributed to open source.
Cleanup:
git checkout main
git pull upstream main
git branch -d add-product-reviews
git push origin main
Your contribution is now part of the project forever.
Team Communication
Git tracks code. Communication tracks decisions.
Good Commit Messages
Bad:
updated stuff
fixed bug
changes
Good:
Add pagination support for multi-page scraping
- Implemented scrape_all_pages() function
- Automatically detects last page
- Tested with 20+ page listings
Fixes #15
Format:
Short summary (50 chars or less)
Detailed explanation (wrap at 72 chars):
- What you changed
- Why you changed it
- How to test it
Fixes #issue-number
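You don't need an editor to write a multi-paragraph message: pass `-m` more than once and Git joins the values as separate paragraphs. A sandbox demo (throwaway repo, hypothetical file; assumes Git 2.28+ for `git init -b`):

```shell
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.email you@example.com && git config user.name You
echo "data" > scraper.py && git add scraper.py

# First -m becomes the summary line, second becomes the body paragraph
git commit -qm "Add pagination support for multi-page scraping" \
           -m "- Implemented scrape_all_pages() function
- Automatically detects last page

Fixes #15"

# Show the full message to confirm the paragraph break
git log -1 --format=%B
```

For anything longer, `git commit` with no `-m` drops you into your editor, which is easier for wrapping body text at 72 characters.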
Pull Request Etiquette
Do:
- Explain what and why
- Link related issues
- Respond to feedback promptly
- Test your code
- Update documentation
Don't:
- Create massive PRs (500+ line changes)
- Ignore review feedback
- Merge without approval
- Force push after review started
- Get defensive about criticism
Code Review Comments
As reviewer, be kind:
❌ "This is wrong"
✅ "Consider handling the case where URL is None"
❌ "Bad code"
✅ "This could be optimized by caching the results"
❌ "Didn't you read the docs?"
✅ "The library actually has a built-in method for this: requests.get(url, timeout=30)"
As author, be receptive:
❌ "My code is fine"
✅ "Good point, I'll add that check"
❌ "That's not important"
✅ "I see your concern. How about this approach instead?"
Advanced Team Workflows
Protected Branches
Prevent direct pushes to main.
On GitHub → Settings → Branches → Add rule:
- Branch name: main
- ✅ Require pull request before merging
- ✅ Require approvals: 1
- ✅ Require status checks to pass
- Save
Now nobody can push directly to main. All changes go through PRs.
Required Reviews
Settings → Branches → main:
- ✅ Require approvals: 2 (for teams)
- ✅ Dismiss stale reviews when new commits pushed
Forces code review before merging.
Auto-Delete Branches
Settings → General:
- ✅ Automatically delete head branches
When PR merges, branch auto-deletes. Keeps repo clean.
Branch Naming Conventions
Team standard:
- feature/description - new features
- fix/description - bug fixes
- docs/description - documentation
- refactor/description - code cleanup
- test/description - adding tests
Everyone follows the same pattern. Easy to understand at a glance.
Handling Large Teams
Code Owners
Create .github/CODEOWNERS:
# Every PR needs approval from these owners
# Default owners for everything
* @team-lead @senior-dev
# Specific owners for parts
/scrapers/ @scraping-team
/database/ @backend-team
/docs/ @documentation-team
GitHub auto-assigns reviewers based on files changed.
Draft Pull Requests
Create PR that isn't ready yet:
- Create PR
- Select "Draft pull request"
- Keep working
- Mark "Ready for review" when done
Shows progress without spamming reviewers.
PR Templates
Create .github/pull_request_template.md:
## Summary
<!-- What does this PR do? -->
## Changes
<!-- List main changes -->
## Testing
<!-- How was this tested? -->
## Checklist
- [ ] Code works locally
- [ ] Tests pass
- [ ] Documentation updated
- [ ] No merge conflicts
Every new PR pre-fills with this template.
Summary
Collaboration is about process, not just tools.
Contributing to open source:
- Fork the repo
- Clone your fork
- Create feature branch
- Make changes
- Push to your fork
- Create PR to original
- Respond to feedback
- Celebrate when merged
Working in teams:
- Clone shared repo
- Always pull before starting
- Create feature branch
- Make changes
- Push branch
- Create PR
- Get review approval
- Merge
- Delete branch
- Pull updated main
Key tools:
- Forks (your copy of someone's project)
- Branches (parallel development)
- Pull requests (propose changes)
- Code review (quality control)
- Issues (track work)
- Projects (organize work)
- .gitignore (exclude files)
Best practices:
- Small, focused PRs
- Clear commit messages
- Responsive to feedback
- Test before pushing
- Review your own code
- Pull before merging
- Communicate decisions
Collaboration rules:
- Never push directly to main
- Always use branches
- Always get review approval
- Pull before starting work
- Keep PRs small and focused
Git and GitHub handle the mechanics. You handle the communication. Both matter equally.
Next up: Blog 5 - "Advanced Git: The Commands That Save Your Career"
The final blog in the series. We'll cover the scary stuff: recovering deleted commits, rewriting history (carefully), handling committed secrets, git reset vs revert vs rebase, stashing changes, cherry-picking commits, and the advanced commands that fix seemingly impossible situations.
Resources:
- GitHub Skills: https://skills.github.com
- Open source guide: https://opensource.guide/how-to-contribute/
- Conventional commits: https://www.conventionalcommits.org/
- Code review best practices: https://google.github.io/eng-practices/review/