A deep dive into .gitignore for Python projects — the secrets pattern, the template exception, what belongs in version control and what doesn't, and how one missing line can cost you real money.
.gitignore Done Right
A developer at a startup pushed a .env file to a public GitHub repository by mistake.
Within 4 minutes, automated bots had scraped it.
Within 6 minutes, they were making API calls on the developer's account.
The bill at the end of the month: $340.
One missing line in .gitignore caused this.
This article covers everything about .gitignore for Python projects — what to ignore, why each category exists, and the pattern every production codebase uses.
What .gitignore Actually Does
.gitignore tells Git: "these files and folders exist on my machine — never track them, never commit them, never include them in diffs or pull requests."
Two reasons you need it:
Reason 1 — Security
Your .env file contains:
- Database passwords
- API keys (OpenAI, Groq, Anthropic)
- JWT signing secrets
- AES encryption keys
If any of these reach a public GitHub repository, expect them to be found: automated bots scan public repositories continuously, harvest keys, abuse them, and you receive the bill. This is not theoretical. It happens every day.
Reason 2 — Repository hygiene
Some files are auto-generated on your machine and serve no purpose in the repository:
- __pycache__/: Python bytecode, tied to the interpreter version, regenerates automatically
- .venv/: your virtual environment, thousands of files each developer creates themselves
- .DS_Store: macOS filesystem metadata, meaningless to other developers
- .idea/: IDE configuration, personal and machine-specific
These files change constantly, cause merge conflicts, and bloat the repository for zero benefit.
The Most Important Pattern — Secrets
# ============================================================
# SECRETS — NEVER commit these
# ============================================================
.env
.env.*
!.env.example
Let's break down each line:
.env — Your real secrets file. Contains actual API keys, passwords, database URLs. Lives only on your machine. Never committed.
.env.* — Covers all environment variants: .env.local, .env.development, .env.production, .env.staging. One pattern ignores all of them.
!.env.example — The ! prefix means exception. Despite the previous rules, this file IS tracked by git.
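To see how the three rules interact, here is a minimal sketch of Git's last-match-wins evaluation, written with Python's fnmatch module. It is a simplification: it only handles plain file-name patterns (no directories, no **), and the function name is_ignored is hypothetical.

```python
from fnmatch import fnmatchcase

# The three rules from the secrets section, in file order.
RULES = [".env", ".env.*", "!.env.example"]


def is_ignored(filename: str, rules: list[str] = RULES) -> bool:
    """Simplified model: the last rule that matches decides the outcome."""
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatchcase(filename, pattern):
            ignored = not negated
    return ignored


for name in [".env", ".env.production", ".env.example", "settings.py"]:
    print(f"{name:18} ignored={is_ignored(name)}")
```

Order matters in this model just as it does in Git: if !.env.example were moved above .env.*, the later .env.* rule would match last and the template would stay ignored.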
Why commit .env.example?
.env.example is the template. It has the same variable names as .env but with placeholder values — no real secrets.
# .env.example — committed to git
DATABASE_URL=postgresql+asyncpg://docqa:docqa_password@localhost:5432/docqa
SECRET_KEY=replace-with-64-char-hex-generate-with-openssl-rand-hex-32
GROQ_API_KEY=your-groq-api-key-here
# Optional: leave empty if not using
OPENAI_API_KEY=
# .env — never committed — has real values
# Special characters in the password must be URL-encoded (@ becomes %40)
DATABASE_URL=postgresql+asyncpg://docqa:myRe4lP%40ss@prod.db.internal:5432/docqa
SECRET_KEY=a3f8c2e1d4b7f9a2c5e8d1b4f7a2c5e8d3f6b9a2c5e8d1b4f7a2c5e8d3f6b9a
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
When someone clones the repository:
cp .env.example .env
# fill in real values
Zero guessing about what variables are needed. Zero Slack messages asking "what env vars do I need?" Zero setup friction.
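The template also makes drift easy to detect. As a rough sketch (assuming simple KEY=value lines, with no export prefixes or multi-line values), the following compares the variable names in .env.example with those in a local .env. The function names env_keys and missing_keys are hypothetical, and the sample contents are shortened placeholders.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Names declared in the template but absent from the real file."""
    return sorted(env_keys(example_text) - env_keys(env_text))


example = """\
# .env.example
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/docqa
SECRET_KEY=replace-me
GROQ_API_KEY=your-groq-api-key-here
"""
local = """\
DATABASE_URL=postgresql+asyncpg://docqa:secret@localhost:5432/docqa
SECRET_KEY=not-a-real-key
"""
print(missing_keys(example, local))  # ['GROQ_API_KEY']
```

In a real project the two strings would come from reading .env.example and .env from disk, for example with pathlib.Path.read_text().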
A professional .env.example includes comments:
# JWT signing key — if stolen, anyone can forge login tokens for any user
# Generate with: openssl rand -hex 32
# Must be at least 32 characters — app refuses to start otherwise
SECRET_KEY=replace-with-64-char-hex-string
# AES-256-GCM encryption key for LLM API keys stored in DB
# MUST be different from SECRET_KEY
# Generate with: openssl rand -hex 32
ENCRYPTION_KEY=replace-with-another-64-char-hex-string
# Default LLM provider — completely free, no credit card needed
# Get your key at: https://console.groq.com
GROQ_API_KEY=your-groq-api-key-here
Each comment tells: what the variable does, why it matters, how to generate it, where to get it. Future developers (including future you) will thank present you.
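The rules written in those comments can also be checked in code. Below is a minimal stdlib-only sketch: secrets.token_hex(32) produces the same kind of value as openssl rand -hex 32 (32 random bytes, hex-encoded), and validate_keys enforces the two documented constraints. The function names and the MIN_SECRET_LENGTH constant are assumptions for illustration, not the project's actual code.

```python
import secrets

MIN_SECRET_LENGTH = 32  # mirrors the "at least 32 characters" rule in the template


def generate_hex_key(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)


def validate_keys(secret_key: str, encryption_key: str) -> None:
    """Raise ValueError if the keys break the rules documented in .env.example."""
    if len(secret_key) < MIN_SECRET_LENGTH:
        raise ValueError("SECRET_KEY must be at least 32 characters")
    if secret_key == encryption_key:
        raise ValueError("ENCRYPTION_KEY must be different from SECRET_KEY")


secret_key = generate_hex_key()
encryption_key = generate_hex_key()
validate_keys(secret_key, encryption_key)  # passes silently
print(len(secret_key))  # 64
```

Calling a check like this at startup is one way to get the "app refuses to start" behavior the template comment describes.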
Python-Specific Patterns
Bytecode
__pycache__/
*.py[cod]
*.pyo
.Python
When Python imports a module, it compiles the source to bytecode and caches the result in __pycache__/. The cache is rebuilt automatically whenever the source file changes.
Why ignore it?
- Interpreter-specific: each .pyc file is tied to the Python version that produced it (the file name carries a tag such as cpython-312), so it is useless to anyone on a different version
- Regenerates automatically, so there is no value in tracking it
- Changes constantly, which causes meaningless diffs and merge conflicts
The *.py[cod] pattern covers .pyc (compiled), .pyo (optimized), and .pyd (Windows DLL) in one rule.
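Both points can be checked from a Python prompt. The sketch below asks the interpreter where it would cache bytecode for a hypothetical app.py, then tests a few file names against the *.py[cod] pattern using fnmatch, whose character-class syntax behaves like Git's for this simple case. The exact cache path printed depends on your interpreter version and configuration.

```python
import importlib.util
import sys
from fnmatch import fnmatchcase

# Where the interpreter would cache bytecode for a module named app.py.
# The tag embeds the implementation and version, for example "cpython-312".
cache_path = importlib.util.cache_from_source("app.py")
print(sys.implementation.cache_tag)
print(cache_path)

# The character class [cod] matches exactly one of c, o, or d.
for name in ["app.pyc", "app.pyo", "app.pyd", "app.py", "app.pyx"]:
    print(name, fnmatchcase(name, "*.py[cod]"))
```

The source file app.py itself does not match, because the pattern requires one extra character after .py.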
Virtual Environment
venv/
.venv/
env/
ENV/
.env/
Your virtual environment contains thousands of files — every package you've installed. Each developer creates their own virtual environment. The requirements.txt file is what gets committed — that's the contract. The actual installed packages stay local.
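Several folder names are covered because tools and conventions differ: python -m venv and virtualenv create whatever directory name you pass them (commonly venv/, .venv/, or env/), and Poetry uses .venv/ when it is configured to create the environment inside the project.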
Testing and Coverage
.pytest_cache/
.coverage
coverage.xml
htmlcov/
.tox/
Test runners generate cache and coverage files. .pytest_cache/ speeds up test runs locally but is meaningless in the repository. Coverage reports are generated artifacts — they should be generated fresh rather than committed.
Type Checking and Linting
.mypy_cache/
.ruff_cache/
mypy (a type checker) and Ruff (a linter and formatter) cache their analysis for performance. Like bytecode, these caches are local artifacts that regenerate automatically.
IDE and Editor Files
.vscode/
.idea/
*.swp
*.swo
.DS_Store
Thumbs.db
IDE configuration is personal. Your VS Code settings for font size, color theme, and key bindings are not relevant to other developers. Your IntelliJ project structure is machine-specific.
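Exception: Some teams commit shared extension recommendations in .vscode/extensions.json. Git cannot re-include a file whose parent directory is excluded, so if you do this deliberately, replace the .vscode/ rule with .vscode/* (ignore the contents, not the directory) and follow it with !.vscode/extensions.json to re-include just that file.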
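The difference between ignoring a directory and ignoring its contents is easy to get wrong, because Git does not descend into an excluded directory and therefore never evaluates negation rules for files inside it. The following is a simplified Python model of that behavior, not Git's actual matcher: it supports only the pattern forms used here, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def _last_match(path: str, rules: list[str], is_dir: bool) -> Optional[bool]:
    """Return True/False from the last matching rule, or None if nothing matches."""
    verdict = None
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        dir_only = pattern.endswith("/")
        pattern = pattern.rstrip("/")
        if dir_only and not is_dir:
            continue
        # A pattern containing "/" is matched against the whole path,
        # otherwise against the final path component only.
        target = path if "/" in pattern else path.rsplit("/", 1)[-1]
        if fnmatchcase(target, pattern):
            verdict = not negated
    return verdict


def is_ignored(path: str, rules: list[str]) -> bool:
    """Simplified model: an excluded parent directory hides everything below it."""
    parts = path.split("/")
    for depth in range(1, len(parts)):
        if _last_match("/".join(parts[:depth]), rules, is_dir=True):
            return True  # nothing inside is examined, so "!" rules cannot apply
    return bool(_last_match(path, rules, is_dir=False))


whole_dir = [".vscode/", "!.vscode/extensions.json"]
contents = [".vscode/*", "!.vscode/extensions.json"]

print(is_ignored(".vscode/extensions.json", whole_dir))  # True: still ignored
print(is_ignored(".vscode/extensions.json", contents))   # False: re-included
print(is_ignored(".vscode/settings.json", contents))     # True
```

With the .vscode/* form, the directory itself is not excluded, so the later negation rule gets a chance to apply to extensions.json.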
*.swp and *.swo are temporary files created by Vim.
.DS_Store is macOS metadata about folder display preferences.
Thumbs.db is Windows thumbnail cache.
None of these belong in a repository.
Uploads and User Data
uploads/
*.pdf
*.docx
*.pptx
*.csv
*.xlsx
User-uploaded documents are not source code. In production, they go to cloud object storage (Cloudflare R2, AWS S3, Google Cloud Storage) — not the server disk and definitely not git.
Including these patterns prevents accidentally committing a test PDF during development. It also protects against committing user data that might contain sensitive information.
Logs
*.log
logs/
Logs are runtime output. They change constantly, are often large, and contain information (user IDs, IP addresses, request data) that shouldn't be in version control.
Build Artifacts
dist/
build/
*.egg-info/
These are generated when you package your Python application for distribution. Like bytecode, they're build outputs — not source code.
Docker
.docker/
Some Docker tools create a .docker/ directory with local state. This stays local.
The Complete .gitignore for Python AI Projects
Here's the full file with comments explaining each section:
# ============================================================
# SECRETS — NEVER commit these
# ============================================================
.env
.env.*
!.env.example
# ============================================================
# Python bytecode
# ============================================================
__pycache__/
*.py[cod]
*.pyo
.Python
# ============================================================
# Virtual environment
# ============================================================
venv/
.venv/
env/
ENV/
.env/
# ============================================================
# Testing and coverage
# ============================================================
.pytest_cache/
.coverage
coverage.xml
htmlcov/
.tox/
# ============================================================
# Type checking and linting cache
# ============================================================
.mypy_cache/
.ruff_cache/
# ============================================================
# IDE and editor files
# ============================================================
.vscode/
.idea/
*.swp
*.swo
.DS_Store
Thumbs.db
# ============================================================
# Docker
# ============================================================
.docker/
# ============================================================
# Uploads and user data
# Never commit user files — they go to cloud storage
# ============================================================
uploads/
*.pdf
*.docx
*.pptx
*.csv
*.xlsx
# ============================================================
# Logs — runtime output, not source code
# ============================================================
*.log
logs/
# ============================================================
# Build artifacts
# ============================================================
dist/
build/
*.egg-info/
# ============================================================
# Node (for future React frontend)
# ============================================================
node_modules/
.next/
frontend/dist/
frontend/build/
What Happens If You Commit a Secret Accidentally
If you commit a .env file or any file containing secrets to a public repository — even briefly — assume the secret is compromised.
Step 1 — Remove from git tracking:
# Remove the file from git tracking (keeps it on disk)
git rm --cached .env
# Add to .gitignore and stage that change as well
echo ".env" >> .gitignore
git add .gitignore
# Commit the removal
git commit -m "remove .env from git tracking"
# Push
git push
Step 2 — Rotate every secret immediately.
Do not wait. Do not hope "nobody saw it". Bots scan constantly.
For API keys — go to the provider dashboard and regenerate.
For database passwords — change the password.
For JWT secrets — rotate the key (this invalidates all existing tokens — users must log in again).
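Why rotating the signing key logs everyone out can be shown with a bare HMAC signature. This is a simplified stand-in for a JWT rather than a real one (an HS256 token signs the base64url-encoded header and payload), and the key values and function names are hypothetical.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """HMAC-SHA256 signature as hex; a stand-in for the signature part of a JWT."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign(payload, key), signature)


old_key = b"leaked-signing-key"    # hypothetical value
new_key = b"freshly-rotated-key"   # hypothetical value

payload = b'{"sub": "user-42"}'
token_signature = sign(payload, old_key)  # issued before the leak was noticed

print(verify(payload, token_signature, old_key))  # True
print(verify(payload, token_signature, new_key))  # False
```

Once the server verifies with the new key, every token signed with the old one fails, including any an attacker forged with the leaked key.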
Step 3 — Clean git history (if the repository is public):
The secret is still in git history even after removing the file. To fully remove it:
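# Use git-filter-repo (the tool the Git project recommends over git filter-branch; BFG Repo-Cleaner is an alternative)
# Run it in a fresh clone: it refuses to rewrite anything else unless you pass --force
pip install git-filter-repo
git filter-repo --path .env --invert-paths
# filter-repo removes the origin remote as a safety measure, so add it back first
git remote add origin <your-repository-url>
git push --force --all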
⚠️ Force-pushing rewrites history. Coordinate with your team before doing this.
Checking Your Current Status
To verify your .gitignore is working:
# See what git is currently tracking
git ls-files
# Check if a specific file would be ignored
git check-ignore -v .env
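# example output (the line number depends on your file): .gitignore:2:.env .env
# means: the rule on line 2 of .gitignore matches .env, so it's ignored ✅
# (no output means no rule matches, or the file is already tracked)
# Check whether the file is already tracked (ignore rules never apply to tracked files)
git ls-files .env
# If .env is printed, it is tracked: remove it with git rm --cached .env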
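The same check can be automated, for instance in CI or a pre-commit hook. The sketch below is one possible approach, not an established tool: find_tracked_secrets flags tracked paths whose file name looks like a real env file, and git_tracked_files wraps git ls-files -z. Both function names and the sample paths are hypothetical.

```python
import subprocess
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SECRET_PATTERNS = [".env", ".env.*"]
ALLOWED = {".env.example"}


def find_tracked_secrets(paths: list[str]) -> list[str]:
    """Return tracked paths whose file name looks like a real env file."""
    hits = []
    for path in paths:
        name = PurePosixPath(path).name
        if name in ALLOWED:
            continue
        if any(fnmatchcase(name, pattern) for pattern in SECRET_PATTERNS):
            hits.append(path)
    return hits


def git_tracked_files() -> list[str]:
    """List tracked files via `git ls-files -z` (NUL-separated output)."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], capture_output=True, text=True, check=True
    ).stdout
    return [p for p in out.split("\0") if p]


sample = ["README.md", ".env.example", "backend/.env", ".env.production", "app/main.py"]
print(find_tracked_secrets(sample))  # ['backend/.env', '.env.production']
```

Inside a repository, find_tracked_secrets(git_tracked_files()) returns an empty list when nothing secret-looking is tracked. A non-empty result is a reason to fail the build.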
The Rule to Remember
.env → real secrets — NEVER committed
.env.* → all variants — NEVER committed
!.env.example → the exception — ALWAYS committed
The ! exception is the pattern every production team uses.
The .env.example template is what makes onboarding painless.
The comments in .env.example are what make maintenance painless.
Next Article
Article 3: Managing Environment Variables in Python — the full journey from os.getenv() scattered across files, to Pydantic Settings v2 with typed validation, fail-fast startup, and @lru_cache for immutable configuration.
This article is part of a series on Building a Production AI Platform From Scratch. Full code at: [Once repo is cleaned] Soon