Originally published at kunalganglani.com — read it there for inline code, hero image, and live links.
I rebuilt every Python AI project I maintain this past year. All of them. And the single biggest improvement wasn't a new model or a better framework. It was setting up Python for professional AI development properly: replacing the duct-taped mess of pip, venv, and scattered config files with a stack that actually works when you have more than three dependencies and two contributors.
Most tutorials still teach venv and requirements.txt. That approach was fine for a Flask app in 2019. For an AI agent with PyTorch, transformers, LangChain, and a dozen other heavy dependencies? It falls apart fast. Here's the complete environment setup I now use for every serious AI project, and why each piece earned its spot.
The 30-Second Version
Your Python AI project probably still uses pip, requirements.txt, and a scattered collection of config files. That stack was fine in 2020; it's painful in 2026. A single Rust-powered tool called uv now replaces seven separate Python tools, resolves dependencies up to 30x faster than Poetry on a warm cache, and pairs with pyproject.toml to give you one config file for everything. This guide walks through the professional stack that actually scales: uv for package management, pyproject.toml for configuration, Ruff for linting, type checking for LLM code, and CI/CD that doesn't break every sprint.
Why Most Python AI Development Setups Break at Scale
Here's the thing nobody's saying about Python environment management: the tooling fragmentation is the root cause of most onboarding pain in AI teams.
Think about what a typical AI project required in 2024. You needed pip, virtualenv (or venv), pyenv for version management, pip-tools for lockfiles, Black for formatting, Flake8 for linting, isort for imports, and maybe Poetry if someone on the team cared enough. That's seven or eight separate tools. Each with its own config file. Each with its own update cycle. Each with its own way of breaking at the worst possible time.
I've watched this pattern destroy team productivity firsthand. A new engineer joins, spends two days getting their environment working, hits a dependency conflict between PyTorch and some tokenizer library, and by day three they're questioning their career choices. In my experience building AI agents and RAG pipelines, environment setup is the single biggest source of wasted engineering hours on AI teams. Not model tuning. Not data quality. Environment setup.
The old approach — python -m venv .venv, pip install -r requirements.txt, hope for the best — fundamentally doesn't work for AI workloads. AI dependencies are massive (PyTorch alone is 2GB+). Version conflicts are constant (CUDA versions, transformer library versions, tokenizer binaries). And reproducing the same environment across macOS, Linux, and CI/CD runners is nearly impossible without a proper lockfile.
The stack I'm about to walk through solves all of this. I've shipped it across multiple projects, and it's built on tools that the teams behind projects like Hugging Face, FastAPI, and Apache Airflow are already running.
How to Set Up Python for AI Development With uv: The Tool That Replaced Everything
If you haven't heard of uv yet, here's the short version: it's a single Rust-powered binary that replaces pip, pip-tools, pipx, Poetry, pyenv, twine, and virtualenv. Seven tools. One binary. And it's not a convenience wrapper — it's genuinely, measurably faster by an absurd margin.
Charlie Marsh, Founder and CEO of Astral, published benchmarks showing uv is 8-10x faster than pip without caching and 80-115x faster with a warm cache. For AI projects with their heavy dependency trees, the numbers get wilder. Resolving the Transformers project with all optional dependencies takes 7.48 seconds with uv versus 47.91 seconds with Poetry and 91.91 seconds with PDM on a cold cache. With a warm cache? uv resolves it in 0.14 seconds. Poetry: 4.32 seconds. PDM: 58.61 seconds.
Those aren't typos. In these benchmarks uv is roughly 6x faster than Poetry on a cold cache and roughly 30x faster on a warm one for real-world AI dependency resolution.
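The ratios behind these claims are easy to check from the timings quoted above. Here is a minimal sketch in Python; the dictionary layout and the `speedup` helper are my own naming, and only the numbers come from the benchmark figures cited in this post:

```python
# Resolution times in seconds for the Transformers project, as quoted above.
timings = {
    "cold": {"uv": 7.48, "Poetry": 47.91, "PDM": 91.91},
    "warm": {"uv": 0.14, "Poetry": 4.32, "PDM": 58.61},
}


def speedup(cache: str, tool: str) -> float:
    """Return how many times faster uv is than `tool` for the given cache state."""
    return timings[cache][tool] / timings[cache]["uv"]


for cache in ("cold", "warm"):
    for tool in ("Poetry", "PDM"):
        print(f"{cache} cache: uv is {speedup(cache, tool):.1f}x faster than {tool}")
```

The 30x figure is the warm-cache comparison against Poetry; on a cold cache the same comparison comes out closer to 6x.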
With 87,000+ GitHub stars since its February 2024 launch, uv is one of the fastest-growing developer tools in Python history. Here's how I set up every new AI project with it:
- Install uv with a single curl command, no Rust or Python required: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Install Python: `uv python install 3.12` (uv manages Python versions directly, so you can ditch pyenv)
- Initialize your project: `uv init my-ai-project` (creates pyproject.toml, .python-version, and a starter script; pass `--lib` or `--package` if you want a src layout)
- Add dependencies: `uv add torch transformers langchain` (resolves, locks, and installs in one step)
- Run your code: `uv run python train.py` (automatically uses the project's virtual environment)
- Lock for reproducibility: `uv lock` (generates a cross-platform lockfile)
- Sync on another machine: `uv sync` (installs exact locked versions)
The key insight: uv treats the virtual environment as a disposable artifact of the lockfile. You never activate it manually. You never think about it. uv run handles everything. This is how it should have always worked.
Tech With Tim's viral tutorial on Python virtual environments covers the fundamentals well, but the professional stack goes well beyond venv basics.
uv vs Poetry for AI Projects: When Each Still Makes Sense
I get asked this constantly: should I switch from Poetry to uv? Short answer: yes, for most AI projects. But let me be specific about why, and where Poetry still holds up.
| Feature | uv | Poetry 2.0 |
|---|---|---|
| Cold resolve (Transformers) | 7.48s | 47.91s |
| Warm resolve (Transformers) | 0.14s | 4.32s |
| Python version management | Built-in | Requires pyenv |
| Lockfile format | Cross-platform universal | Poetry-specific |
| Workspace support | Cargo-style monorepo | Limited |
| PEP 621 compliance | Native | Added in 2.0 |
| Plugin ecosystem | Growing | Mature |
| Language | Rust (single binary) | Python |
| Tool replacement scope | 7 tools | 3-4 tools |
Poetry 2.0 made real progress. It shifted to PEP 621-compliant project.dependencies in pyproject.toml, aligning with the broader ecosystem. As Philipp Acsany documented in his thorough Real Python tutorial, Poetry remains strong for library authors who need a mature plugin ecosystem and established publishing workflows.
But for AI application development — building agents, RAG pipelines, training jobs, inference services — uv wins decisively. The speed difference alone changes how you work. When uv sync takes under a second instead of 30 seconds, you stop batching dependency changes and start iterating freely. That matters when you're swapping between different LLM providers and testing model configurations.
uv also supports Cargo-style workspaces. This is a real differentiator for AI monorepos. If you have separate packages for your training pipeline, your serving API, and your data ingestion layer — all sharing a common core library — uv manages them with a single root-level lockfile. Having worked with teams building multi-component AI systems, I can tell you this alone saves hours of dependency hell per sprint.
pyproject.toml: One File to Rule Your Entire AI Project
pyproject.toml is now the official PyPA-endorsed standard for Python project configuration, superseding setup.py and setup.cfg. If you're still maintaining a setup.py, a requirements.txt, a .flake8, a .isort.cfg, a mypy.ini, and a pytest.ini — that's six config files that should be one. I've inherited projects with even more. It's miserable.
Here's what a well-structured pyproject.toml looks like for an AI project:
Your [project] section declares your metadata and runtime dependencies — torch, transformers, langchain, whatever your AI stack requires. Your [project.optional-dependencies] section groups dev tools separately: dev = ["pytest", "ruff", "mypy"]. Your [tool.ruff] section configures linting and formatting. Your [tool.mypy] or [tool.pyright] section handles type checking. Your [tool.pytest.ini_options] section configures tests. One file. Version-controlled. Readable by every tool in the ecosystem.
The specific practices I follow for AI projects:
- Cap AI libraries at the next major version and let the lockfile pin exact versions. `torch >= 2.4, < 3.0` lets you get security patches without jumping to a major release that could break CUDA compatibility.
- Use dependency groups (PEP 735) to separate training dependencies from inference dependencies. Your production serving container doesn't need `tensorboard`, `wandb`, and `jupyter`.
- Declare your Python version constraint explicitly: `requires-python = ">= 3.11"`. AI libraries drop old Python versions aggressively, and you want to catch this in resolution, not at runtime.
- Put all tool configuration in pyproject.toml. Ruff, mypy, pytest, everything. Zero standalone config files.
This pairs perfectly with uv. When you run uv add torch, it updates pyproject.toml and regenerates the lockfile in one atomic operation. No manual editing. No forgetting to update the lockfile and then wondering why CI is broken.
Ruff: The Linter and Formatter Your AI Codebase Needs
If uv is the most important new tool for Python packaging, Ruff is the most important new tool for Python code quality. Both come from Astral, the same team, and they're designed to work together.
Ruff replaces Flake8, Black, isort, pydocstyle, pyupgrade, and autoflake. All in one Rust-powered binary. Nick Schrock, founder of Elementl and co-creator of GraphQL, measured Ruff scanning a 250,000-line codebase (Dagster) in 0.4 seconds. pylint took approximately 2.5 minutes on the same code across four CPU cores. That's a wall-clock difference of several hundred times (about 150 seconds versus 0.4).
For AI codebases, this speed matters more than you'd think. AI code tends to be messy. You're prototyping in notebooks, converting to scripts, dealing with sprawling data processing functions, and juggling model configuration files. Having a linter that runs in under a second means you can run it on every save without breaking your flow.
Ruff has been adopted by Apache Airflow, FastAPI, Hugging Face, Pandas, and SciPy, which covers much of the Python AI and data ecosystem. Sebastián Ramírez, creator of FastAPI, has endorsed it, and the tool has become the de facto standard for Python linting in production codebases.
My Ruff config in pyproject.toml for AI projects is minimal but opinionated: enable the E, F, I, UP, B, and SIM rule sets. Set line length to 100 (AI code has long variable names — tokenized_input_embeddings doesn't fit in 79 characters, and I'm tired of pretending it does). Enable auto-fix for import sorting. Done.
The Astral toolchain — uv for packaging, Ruff for code quality — is converging into something that feels like what Python should have shipped with from the start. If you're building production AI systems, adopting both is the single highest-leverage tooling decision you can make right now.
Type Checking for LLM Code: Why mypy and Pyright Are Non-Negotiable
This might be my most controversial opinion in this whole post: if you're building AI agents or LLM applications without type checking, you're writing bugs faster than you're writing features.
LLM code is especially prone to type-related bugs because the data flowing through it is inherently loosely structured. API responses from OpenAI, Anthropic, or local models come back as nested dictionaries. Prompt engineering templates mix strings with structured data. Function calling schemas need to match your Python function signatures exactly. RAG pipelines pass around chunks, embeddings, and metadata that might be lists, dicts, or custom objects depending on which library you're using.
I've shipped enough features to know that type checking catches entire categories of bugs that unit tests miss. Especially around None handling, incorrect dictionary key access, and mismatched function signatures between your agent's tools and the LLM's expected schema. These are the bugs that only show up at 2am when a user sends an input you didn't anticipate.
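A minimal sketch of the None and dictionary-key problem: the payload below is a hypothetical response shape loosely modeled on chat-completion APIs, not any provider's actual schema, and `ChatReply` and `parse_reply` are names invented for illustration. It uses a plain dataclass to stay dependency-free; a Pydantic model would play the same role.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ChatReply:
    """Typed view of the fields the rest of the code is allowed to rely on."""

    text: str
    finish_reason: str
    total_tokens: int


def parse_reply(raw: dict[str, Any]) -> ChatReply:
    """Validate a raw response dict once, at the edge of the system."""
    choices = raw.get("choices") or []
    if not choices:
        raise ValueError("response contained no choices")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:  # e.g. the model returned a tool call instead of text
        raise ValueError("response message had no text content")
    usage = raw.get("usage") or {}
    return ChatReply(
        text=content,
        finish_reason=choices[0].get("finish_reason", "unknown"),
        total_tokens=int(usage.get("total_tokens", 0)),
    )


# Hypothetical payload for illustration only.
raw = {
    "choices": [{"message": {"content": "Hello!"}, "finish_reason": "stop"}],
    "usage": {"total_tokens": 12},
}
reply = parse_reply(raw)
print(reply.text, reply.total_tokens)
```

Parsing at the boundary means everything downstream works with `ChatReply`, so the type checker can flag a misspelled field or an unhandled None before the code runs.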
You have two solid choices: mypy (the established standard) and Pyright (Microsoft's faster alternative, used by VS Code's Pylance). Both configure through pyproject.toml. For AI projects, I lean toward Pyright for day-to-day development because it's faster and integrates natively with VS Code, but I run mypy in CI because it has broader ecosystem support.
Key type-checking practices for AI code:
- Type your LLM response handlers. Don't pass around raw `dict[str, Any]`. Create Pydantic models or TypedDicts for every API response shape.
- Use `Protocol` classes for tool interfaces. When your agent can call multiple tools, define the tool interface as a Protocol so the type checker validates every implementation.
- Type your prompt templates. If a function builds a prompt, its arguments should be typed, not `**kwargs`.
- Set `strict = true` in your type checker config. Painful for the first week. Saves you from production bugs for the next year.
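Two of the practices above, Protocol-based tool interfaces and typed prompt builders, can be sketched with the standard library alone. `Tool`, `WordCount`, `dispatch`, and `build_prompt` are hypothetical names for this example, not part of any agent framework:

```python
from typing import Protocol


class Tool(Protocol):
    """Interface every agent tool must satisfy (checked structurally)."""

    name: str

    def run(self, query: str) -> str: ...


class WordCount:
    """A toy tool; it satisfies Tool without inheriting from it."""

    name = "word_count"

    def run(self, query: str) -> str:
        return str(len(query.split()))


def dispatch(tools: list[Tool], tool_name: str, query: str) -> str:
    """Look up a tool by name and run it on the query."""
    for tool in tools:
        if tool.name == tool_name:
            return tool.run(query)
    raise LookupError(f"unknown tool: {tool_name}")


def build_prompt(question: str, context_chunks: list[str], max_chunks: int = 3) -> str:
    """Typed prompt builder: explicit arguments instead of **kwargs."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks[:max_chunks])
    return f"Context:\n{context}\n\nQuestion: {question}"


print(dispatch([WordCount()], "word_count", "how many words here"))
print(build_prompt("What is uv?", ["uv is a package manager."]))
```

If a tool's `run` method had the wrong signature, mypy or Pyright would reject the `dispatch([...])` call at check time instead of the agent failing mid-conversation.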
After shipping several agent-based systems, I can tell you that teams running strict type checking have dramatically fewer runtime errors in production than those relying on tests alone. It's not even close.
Jupyter Notebooks vs Python Scripts: Use the Right Tool
The Jupyter vs. scripts debate is exhausting because people treat it as an either/or. Both are tools. Use the right one for the job.
Use Jupyter notebooks for:
- Exploratory data analysis and dataset inspection
- Prototyping prompt chains and evaluating LLM outputs interactively
- Visualizing training metrics and model performance
- Documenting research experiments with inline charts
- Quick API testing against new model providers
Use Python scripts and modules for:
- Anything that runs in production — AI agents, serving endpoints, data pipelines
- Anything that gets tested — unit tests, integration tests, CI/CD validation
- Anything that multiple people edit — notebooks create merge conflicts that are genuinely unsolvable
- Training pipelines that run on remote GPUs
- Agent orchestration and multi-agent systems — these need proper module structure
The pattern I follow: prototype in a notebook, then extract the working code into typed Python modules. The notebook becomes documentation. The modules become the product.
uv makes this workflow smooth. With Jupyter declared as a project dependency, you can run `uv run jupyter lab` to launch it within your project's managed environment (or use `uv run --with jupyter jupyter lab` to pull it in for a single session), with no separate kernel installation and no manual ipykernel setup. And because uv now integrates with Jupyter and marimo, your notebook automatically has access to the exact same locked dependency set as your scripts.
One thing I've learned the hard way: never put secrets, API keys, or model weights paths in notebooks. They end up in Git history. They end up on conference talk slides. I've seen it happen. Use environment variables loaded through a .env file, add .env to your .gitignore, and strip notebook output cells with a pre-commit hook (nbstripout is one option) before they reach a commit.
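A minimal sketch of the environment-variable side of this, using only the standard library. `require_env` and the variable name `LLM_API_KEY` are hypothetical; the standard library does not read .env files itself, so this assumes your shell or a dotenv loader has already exported the variables:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable (for example in a local .env file)"
        )
    return value


# Demo only: provide a placeholder so the example runs without a real key.
os.environ.setdefault("LLM_API_KEY", "dummy-key-for-demo")

api_key = require_env("LLM_API_KEY")
print(f"Loaded a key of length {len(api_key)}")  # never print the key itself
```

Failing at startup with a named variable is much easier to debug than an authentication error buried in the first API call.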
CI/CD for Python AI Projects: Making It Actually Work
Most CI/CD pipelines for Python projects are slow, flaky, and expensive. AI projects make this worse because the dependencies are enormous (a full PyTorch + transformers install can take 5-10 minutes with pip) and the test suites often need GPU access or API keys.
uv changes the CI equation dramatically. The official astral-sh/setup-uv GitHub Action installs uv, manages Python version matrices, and persists dependency caches — all without requiring a separate Python installation step. Here's what a professional CI pipeline looks like:
- Install uv via `astral-sh/setup-uv` (pin to a specific version like v8.1.0)
- Cache dependencies: uv's cache is persistent and cross-platform, so subsequent runs install in seconds
- Run linting: `uv run ruff check .` and `uv run ruff format --check .`
- Run type checking: `uv run mypy src/` or `uv run pyright`
- Run tests: `uv run pytest` with appropriate markers to skip GPU-dependent tests in CI
- Build and publish: `uv build` and `uv publish` for library projects
The speed difference is real. A CI pipeline that took 8 minutes with pip + Poetry now takes under 2 minutes with uv. When your team is pushing 20+ PRs a day on an active AI project, that saves over two hours of cumulative CI wait time daily. I've seen engineers start running CI more frequently just because it stopped being annoying.
For AI-specific CI concerns:
- Separate your test tiers. Unit tests (no API calls, no GPU) run on every push. Integration tests (real API calls to LLM providers) run on merge to main. Training validation tests run on a schedule.
- Mock LLM responses in unit tests. Don't burn API credits in CI. Record real responses once, replay them in tests.
- Use dependency groups to install only what CI needs. Your linting job doesn't need PyTorch. Your type checking job doesn't need test fixtures.
- Pin your Python version in `.python-version` and reference it in CI. uv respects this file automatically.
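The record-and-replay idea from the list above can be sketched without any test framework or provider SDK. `ReplayLLM`, `summarize`, and the recorded text are all invented for this example; in a real project the recordings would be captured once from the provider and stored as JSON next to the tests:

```python
import hashlib
import json


class ReplayLLM:
    """Stand-in for an LLM client that replays recorded responses by prompt."""

    def __init__(self, recordings: dict[str, str]) -> None:
        self._recordings = recordings

    @staticmethod
    def key(prompt: str) -> str:
        """Stable lookup key: a SHA-256 hash of the exact prompt text."""
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        try:
            return self._recordings[self.key(prompt)]
        except KeyError:
            raise LookupError(f"no recording for prompt: {prompt!r}") from None


def summarize(llm: ReplayLLM, text: str) -> str:
    """Code under test: it only needs an object with a complete() method."""
    return llm.complete(f"Summarize in one line: {text}").strip()


# Inlined recording; the JSON round trip mimics loading it from a file.
prompt = "Summarize in one line: uv replaces several Python tools."
recordings = json.loads(json.dumps({ReplayLLM.key(prompt): " uv consolidates tooling. "}))


def test_summarize_uses_recorded_response() -> None:
    llm = ReplayLLM(recordings)
    assert summarize(llm, "uv replaces several Python tools.") == "uv consolidates tooling."


test_summarize_uses_recorded_response()
print("replayed without any network call")
```

Because an unrecorded prompt raises immediately, a changed prompt template shows up as a failing unit test rather than as a silent live API call in CI.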
I've written about how vibe coding tools accelerate development, but without solid CI/CD, that speed just means you ship bugs faster. The environment stack I've laid out here — uv, Ruff, type checking, structured tests — is the safety net that makes rapid AI development sustainable.
The Professional Python AI Development Stack: Putting It All Together
Let me be concrete about the complete stack and how the pieces connect. This is what I install and configure on day one of every new AI project:
| Layer | Tool | What It Replaces |
|---|---|---|
| Package management | uv | pip, pip-tools, pipx, virtualenv, pyenv, Poetry |
| Project config | pyproject.toml | setup.py, setup.cfg, requirements.txt, MANIFEST.in |
| Linting + formatting | Ruff | Flake8, Black, isort, pyupgrade, autoflake |
| Type checking | Pyright (dev) + mypy (CI) | N/A (previously skipped) |
| Testing | pytest | unittest |
| CI/CD | GitHub Actions + setup-uv | Manual pip install scripts |
| Prototyping | Jupyter Lab (via uv run) | Standalone Jupyter install |
| Python versions | uv python install | pyenv, conda, system Python |
Total number of standalone config files: one (pyproject.toml, plus .python-version which is a single line). Compare that to the 6-8 config files the old stack demanded.
Total number of tools to install: one (uv). Everything else is a project dependency managed through pyproject.toml.
This is the setup I use for building everything from WhatsApp AI agents to local LLM inference pipelines, and it's what I recommend to every team I work with. The consistency eliminates an entire class of "works on my machine" problems. The speed of uv means environment management disappears as a friction point entirely.
If you're building AI applications with frameworks like LangChain or Pydantic AI, using coding tools like Claude Code or Cursor, or deploying to cloud platforms — this stack works everywhere. This is one of those things where the boring answer is actually the right one.
What This Stack Means for What Comes Next
Here's what caught me off guard: Python AI tooling consolidated way faster than I expected. Before its February 2024 launch, uv didn't exist. By mid-2026, it has 87,000+ GitHub stars and the Astral toolchain (uv + Ruff) is the de facto standard for serious Python work. The fragmented era of pip + virtualenv + pyenv + Black + Flake8 + isort is over for greenfield projects.
My prediction: within 12 months, uv init will be how the majority of new Python AI projects start. The requirements.txt file will join setup.py in the "legacy compatibility" bucket — still supported everywhere, used by choice almost nowhere.
If you're still setting up Python AI projects the old way, stop. Seriously. The migration cost is an afternoon. The productivity gain compounds every single day. Install uv, create a pyproject.toml, configure Ruff, enable type checking, wire up CI. That's your afternoon. Then get back to building the AI system that actually matters.
The tools are finally good enough. Your environment shouldn't be the hard part anymore.
Frequently Asked Questions
Is uv stable enough for production Python AI projects?
Yes. uv has been stable since mid-2025 and is used by major projects including FastAPI, Hugging Face, and Apache Airflow. With 87,000+ GitHub stars and backing from Astral (the same team behind Ruff), it has more active development and community support than most alternatives. Pin to a specific version in CI for extra safety.
Can I migrate from Poetry to uv without breaking my existing project?
Usually, yes. If your project already declares its dependencies in the standard [project] table (the PEP 621 format that Poetry 2.0 supports), migration is typically a one-command process: run uv lock in your project directory and uv will resolve everything from your existing pyproject.toml and generate its own lockfile. Dependencies declared only in Poetry-specific [tool.poetry] fields need to be moved into [project] first, but once that's done the core dependencies carry over cleanly.
Do I still need virtual environments when using uv?
uv creates and manages virtual environments automatically — you just never interact with them directly. When you run uv run python script.py, uv ensures the correct environment is active with the correct dependencies. You don't activate, deactivate, or think about virtual environments. They're an implementation detail, not a workflow step.
Should I use mypy or Pyright for type checking Python AI code?
Both are solid choices. Pyright is faster and integrates natively with VS Code through Pylance, making it excellent for real-time feedback during development. mypy has broader ecosystem support and more established community conventions. Many teams use Pyright locally and mypy in CI to get the best of both worlds.
What Python version should I use for AI development in 2026?
Python 3.12 is the current sweet spot — it has the best balance of library compatibility, performance improvements, and modern language features. Python 3.13 works for most use cases but some AI libraries lag on support. Avoid Python 3.10 or older for new projects; major AI frameworks are dropping support for them.
How does uv handle large AI dependencies like PyTorch?
uv uses a global module cache with Copy-on-Write and hardlinks, so PyTorch's 2GB+ installation is stored once on disk regardless of how many projects use it. Combined with its Rust-powered resolver, uv installs PyTorch significantly faster than pip — especially on subsequent installs where the cache is warm and installation can complete in under a second.
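To see why hardlinks save disk space, here is a small illustration of the mechanism using only the standard library. This is not uv's code, the paths are made up, and it assumes a filesystem that supports hard links (most Linux and macOS setups do):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # One real copy of the data, standing in for a file in a global cache.
    cache_file = os.path.join(root, "cache", "big_wheel.bin")
    os.makedirs(os.path.dirname(cache_file))
    with open(cache_file, "wb") as f:
        f.write(b"\0" * 1024 * 1024)  # 1 MiB placeholder for a large package file

    linked = []
    for project in ("project_a", "project_b"):
        target = os.path.join(root, project, "site-packages", "big_wheel.bin")
        os.makedirs(os.path.dirname(target))
        os.link(cache_file, target)  # hard link: a new name for the same data
        linked.append(target)

    # All three paths share one inode, so the bytes exist on disk only once.
    same_inode = all(os.stat(p).st_ino == os.stat(cache_file).st_ino for p in linked)
    link_count = os.stat(cache_file).st_nlink
    print(f"same underlying file: {same_inode}, names pointing at it: {link_count}")
```

Each project sees what looks like its own copy, but creating a link is a metadata operation, which is why a warm-cache install of even a very large package can be close to instant.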